1
Lavan N, Sutherland CAM. Idiosyncratic and shared contributions shape impressions from voices and faces. Cognition 2024; 251:105881. [PMID: 39029363] [DOI: 10.1016/j.cognition.2024.105881]
Abstract
Voices elicit rich first impressions of what the person we are hearing might be like. Research stresses that these impressions from voices are shared across different listeners, such that people on average agree which voices sound trustworthy or old and which do not. However, can impressions from voices also be shaped by the 'ear of the beholder'? We investigated whether - and how - listeners' idiosyncratic, personal preferences contribute to first impressions from voices. In two studies (993 participants, 156 voices), we find evidence for substantial idiosyncratic contributions to voice impressions using a variance partitioning approach. Overall, idiosyncratic contributions were as important as shared contributions to impressions from voices for inferred person characteristics (e.g., trustworthiness, friendliness). Shared contributions were only more influential for impressions of more directly apparent person characteristics (e.g., gender, age). Both idiosyncratic and shared contributions were reduced when stimuli were limited in their (perceived) variability, suggesting that natural variation in voices is key to understanding this impression formation. When comparing voice impressions to face impressions, we found that idiosyncratic and shared contributions to impressions were similar across modalities when stimulus properties were closely matched - although voice impressions were overall less consistent than face impressions. We thus reconceptualise impressions from voices as being formed not only based on shared but also idiosyncratic contributions. We use this new framing to suggest future directions of research, including understanding idiosyncratic mechanisms, development, and malleability of voice impression formation.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, United Kingdom.
- Clare A M Sutherland
- School of Psychology, King's College, University of Aberdeen, United Kingdom; School of Psychological Science, University of Western Australia, Australia
2
Lavan N, Rinke P, Scharinger M. The time course of person perception from voices in the brain. Proc Natl Acad Sci U S A 2024; 121:e2318361121. [PMID: 38889147] [PMCID: PMC11214051] [DOI: 10.1073/pnas.2318361121]
Abstract
When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we can find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Paula Rinke
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany
- Mathias Scharinger
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany
- Research Center “Deutscher Sprachatlas”, Philipps-University Marburg, Marburg 35037, Germany
- Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Marburg 35032, Germany
3
Michel L, Ricou C, Bonnet-Brilhault F, Houy-Durand E, Latinus M. Sounds Pleasantness Ratings in Autism: Interaction Between Social Information and Acoustical Noise Level. J Autism Dev Disord 2024; 54:2148-2157. [PMID: 37118645] [DOI: 10.1007/s10803-023-05989-6]
Abstract
A lack of response to voices and a great interest in music are among the behavioral expressions commonly (self-)reported in Autism Spectrum Disorder (ASD). These atypical interests in vocal and musical sounds could be attributable to different levels of acoustical noise, quantified by the harmonic-to-noise ratio (HNR). No previous study has investigated explicit auditory pleasantness in ASD by comparing vocal and non-vocal sounds in relation to acoustic noise level. The aim of this study was to objectively evaluate auditory pleasantness. Sixteen adults on the autism spectrum and 16 matched neuro-typical (NT) adults rated the likeability of vocal and non-vocal sounds with varying harmonic-to-noise ratio levels. A group by category interaction in pleasantness judgements revealed that participants on the autism spectrum judged vocal sounds as less pleasant than non-vocal sounds, an effect not found for NT participants. A category by HNR level interaction revealed that participants in both groups rated non-vocal sounds with a high HNR as more pleasant. A significant group by HNR interaction revealed that people on the autism spectrum tended to judge sounds with high HNR as less pleasant, and sounds with low HNR as more pleasant, than NT participants did. The acoustical noise level of sounds alone therefore does not appear to explain the atypical interest in voices and the greater interest in music in ASD.
Affiliation(s)
- Lisa Michel
- UMR 1253, iBrain, Université de Tours, INSERM, 37000, Tours, France.
- Camille Ricou
- UMR 1253, iBrain, Université de Tours, INSERM, 37000, Tours, France
- Frédérique Bonnet-Brilhault
- UMR 1253, iBrain, Université de Tours, INSERM, 37000, Tours, France
- EXAC.T, Centre Universitaire de Pédopsychiatrie, CHRU de Tours, Tours, France
- Emmanuelle Houy-Durand
- UMR 1253, iBrain, Université de Tours, INSERM, 37000, Tours, France
- EXAC.T, Centre Universitaire de Pédopsychiatrie, CHRU de Tours, Tours, France
- Marianne Latinus
- UMR 1253, iBrain, Université de Tours, INSERM, 37000, Tours, France
- Centro de Estudios en Neurociencia Humana y Neuropsicología, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
4
Ostrega J, Shiramizu V, Lee AJ, Jones BC, Feinberg DR. No evidence that averaging voices influences attractiveness. Sci Rep 2024; 14:10488. [PMID: 38714709] [PMCID: PMC11076608] [DOI: 10.1038/s41598-024-61064-9]
Abstract
Vocal attractiveness influences important social outcomes. While most research on the acoustic parameters that influence vocal attractiveness has focused on the possible roles of sexually dimorphic characteristics of voices, such as fundamental frequency (i.e., pitch) and formant frequencies (i.e., a correlate of body size), other work has reported that increasing vocal averageness increases attractiveness. Here we investigated the roles these three characteristics play in judgments of the attractiveness of male and female voices. In Study 1, we found that increasing vocal averageness significantly decreased distinctiveness ratings, demonstrating that participants could detect manipulations of vocal averageness in this stimulus set and using this testing paradigm. However, in Study 2, we found no evidence that increasing averageness significantly increased attractiveness ratings of voices. In Study 3, we found that fundamental frequency was negatively correlated with male vocal attractiveness and positively correlated with female vocal attractiveness. By contrast with these results for fundamental frequency, vocal attractiveness and formant frequencies were not significantly correlated. Collectively, our results suggest that averageness may not necessarily significantly increase attractiveness judgments of voices and are consistent with previous work reporting significant associations between attractiveness and voice pitch.
Affiliation(s)
- Jessica Ostrega
- Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Canada
- Victor Shiramizu
- Department of Psychological Sciences and Health, University of Strathclyde, Glasgow, UK
- Anthony J Lee
- Division of Psychology, University of Stirling, Stirling, UK
- Benedict C Jones
- Department of Psychological Sciences and Health, University of Strathclyde, Glasgow, UK
- David R Feinberg
- Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Canada
5
Pounder Z, Eardley AF, Loveday C, Evans S. No clear evidence of a difference between individuals who self-report an absence of auditory imagery and typical imagers on auditory imagery tasks. PLoS One 2024; 19:e0300219. [PMID: 38568916] [PMCID: PMC10990234] [DOI: 10.1371/journal.pone.0300219]
Abstract
Aphantasia is characterised by the inability to create mental images in one's mind. Studies investigating impairments in imagery typically focus on the visual domain. However, it is possible to generate many different forms of imagery, including imagined auditory, kinesthetic, tactile, motor, taste and other experiences. Recent studies show that individuals with aphantasia report a lack of imagery in modalities other than vision, including audition. However, to date, no research has examined whether these reductions in self-reported auditory imagery are associated with decrements in tasks that require auditory imagery. Understanding the extent to which visual and auditory imagery deficits co-occur can help to better characterise the core deficits of aphantasia and provide an alternative perspective on theoretical debates about the extent to which imagery draws on modality-specific or modality-general processes. In the current study, individuals who self-identified as aphantasic and matched control participants with typical imagery performed two tasks: a musical pitch-based imagery task and a voice-based categorisation task. The majority of participants with aphantasia self-reported significant deficits in both auditory and visual imagery. However, we did not find a concomitant decrease in performance on tasks that require auditory imagery, either in the full sample or when considering only those participants who reported significant deficits in both domains. These findings are discussed in relation to the mechanisms that might obscure the observation of imagery deficits in auditory imagery tasks in people who report reduced auditory imagery.
Affiliation(s)
- Zoë Pounder
- Department of Psychology, School of Social Sciences, University of Westminster, London, United Kingdom
- Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
- Alison F. Eardley
- Department of Psychology, School of Social Sciences, University of Westminster, London, United Kingdom
- Catherine Loveday
- Department of Psychology, School of Social Sciences, University of Westminster, London, United Kingdom
- Samuel Evans
- Department of Psychology, School of Social Sciences, University of Westminster, London, United Kingdom
- Neuroimaging, King’s College London, London, United Kingdom
6
Dureux A, Zanini A, Everling S. Mapping of facial and vocal processing in common marmosets with ultra-high field fMRI. Commun Biol 2024; 7:317. [PMID: 38480875] [PMCID: PMC10937914] [DOI: 10.1038/s42003-024-06002-1]
Abstract
Primate communication relies on multimodal cues, such as vision and audition, to facilitate the exchange of intentions, enable social interactions, avoid predators, and foster group cohesion during daily activities. Understanding the integration of facial and vocal signals is pivotal to comprehend social interaction. In this study, we acquire whole-brain ultra-high field (9.4 T) fMRI data from awake marmosets (Callithrix jacchus) to explore brain responses to unimodal and combined facial and vocal stimuli. Our findings reveal that the multisensory condition not only intensifies activations in the occipito-temporal face patches and auditory voice patches but also engages a more extensive network that includes additional parietal, prefrontal and cingulate areas, compared to the summed responses of the unimodal conditions. By uncovering the neural network underlying multisensory audiovisual integration in marmosets, this study highlights the efficiency and adaptability of the marmoset brain in processing facial and vocal social signals, providing significant insights into primate social communication.
Affiliation(s)
- Audrey Dureux
- Centre for Functional and Metabolic Mapping, Robarts Research Institute, University of Western Ontario, London, ON, N6A 5K8, Canada.
- Alessandro Zanini
- Centre for Functional and Metabolic Mapping, Robarts Research Institute, University of Western Ontario, London, ON, N6A 5K8, Canada
- Stefan Everling
- Centre for Functional and Metabolic Mapping, Robarts Research Institute, University of Western Ontario, London, ON, N6A 5K8, Canada
- Department of Physiology and Pharmacology, University of Western Ontario, London, ON, N6A 5K8, Canada
7
Lavan N. Left-handed voices? Examining the perceptual learning of novel person characteristics from the voice. Q J Exp Psychol (Hove) 2024:17470218241228849. [PMID: 38229446] [DOI: 10.1177/17470218241228849]
Abstract
We regularly form impressions of who a person is from their voice, such that we can readily categorise people as being female or male, child or adult, trustworthy or not, and can furthermore recognise who specifically is speaking. How we establish mental representations for such categories of person characteristics has, however, only been explored in detail for voice identity learning. In a series of experiments, we therefore set out to examine whether and how listeners can learn to recognise a novel person characteristic. We specifically asked how diagnostic acoustic properties underpinning category distinctions inform perceptual judgements. We manipulated recordings of voices to create acoustic signatures for a person's handedness (left-handed vs. right-handed) in their voice. After training, we found that listeners were able to successfully learn to recognise handedness from voices with above-chance accuracy, although no significant differences in accuracy between the different types of manipulation emerged. Listeners were, furthermore, sensitive to the specific distributions of acoustic properties that underpinned the category distinctions. However, we also found evidence for perceptual biases that may reflect long-term prior exposure to how voices vary in naturalistic settings. These biases shape how listeners use acoustic information in the voices when forming representations for distinguishing handedness from voices. This study is thus a first step to examine how representations for novel person characteristics are established, outside of voice identity perception. We discuss our findings in light of theoretical accounts of voice perception and speculate about potential mechanisms that may underpin our results.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
8
Tompkinson J, Mileva M, Watt D, Mike Burton A. Perception of threat and intent to harm from vocal and facial cues. Q J Exp Psychol (Hove) 2024; 77:326-342. [PMID: 37020335] [PMCID: PMC10798027] [DOI: 10.1177/17470218231169952]
Abstract
What constitutes a "threatening tone of voice"? There is currently little research exploring how listeners infer threat, or the intention to cause harm, from speakers' voices. Here, we investigated the influence of key linguistic variables on these evaluations (Study 1). Results showed a trend for voices perceived to be lower in pitch, particularly those of male speakers, to be evaluated as sounding more threatening and conveying greater intent to harm. We next investigated the evaluation of multimodal stimuli comprising voices and faces varying in perceived dominance (Study 2). Visual information about the speaker's face had a significant effect on threat and intent ratings. In both experiments, we observed a relatively low level of agreement among individual listeners' evaluations, emphasising idiosyncrasy in the ways in which threat and intent-to-harm are perceived. This research provides a basis for the perceptual experience of a "threatening tone of voice," along with an exploration of vocal and facial cue integration in social evaluation.
Affiliation(s)
- James Tompkinson
- Aston Institute for Forensic Linguistics, College of Business and Social Sciences, Aston University, Birmingham, UK
- Mila Mileva
- School of Psychology, University of Plymouth, Plymouth, UK
- Dominic Watt
- Department of Language and Linguistic Science, University of York, York, UK
- A Mike Burton
- Department of Psychology, University of York, York, UK
9
Stevenage SV, Edey R, Keay R, Morrison R, Robertson DJ. Familiarity Is Key: Exploring the Effect of Familiarity on the Face-Voice Correlation. Brain Sci 2024; 14:112. [PMID: 38391687] [PMCID: PMC10887171] [DOI: 10.3390/brainsci14020112]
Abstract
Recent research has examined the extent to which face and voice processing are associated by virtue of the fact that both tap into a common person perception system. However, existing findings do not yet fully clarify the role of familiarity in this association. Given this, two experiments are presented that examine face-voice correlations for unfamiliar stimuli (Experiment 1) and for familiar stimuli (Experiment 2). With care being taken to use tasks that avoid floor and ceiling effects and that use realistic speech-based voice clips, the results suggested a significant positive but small-sized correlation between face and voice processing when recognizing unfamiliar individuals. In contrast, the correlation when matching familiar individuals was significant and positive, but much larger. The results supported the existing literature suggesting that face and voice processing are aligned as constituents of an overarching person perception system. However, the difference in magnitude of their association here reinforced the view that familiar and unfamiliar stimuli are processed in different ways. This likely reflects the importance of a pre-existing mental representation and cross-talk within the neural architectures when processing familiar faces and voices, and yet the reliance on more superficial stimulus-based and modality-specific analysis when processing unfamiliar faces and voices.
Affiliation(s)
- Sarah V Stevenage
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Edey
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Keay
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Morrison
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- David J Robertson
- Department of Psychological Sciences and Health, University of Strathclyde, Glasgow G1 1QE, UK
10
Gandolfo M, Abassi E, Balgova E, Downing PE, Papeo L, Koldewyn K. Converging evidence that left extrastriate body area supports visual sensitivity to social interactions. Curr Biol 2024; 34:343-351.e5. [PMID: 38181794] [DOI: 10.1016/j.cub.2023.12.009]
Abstract
Navigating our complex social world requires processing the interactions we observe. Recent psychophysical and neuroimaging studies provide parallel evidence that the human visual system may be attuned to efficiently perceive dyadic interactions. This work implies, but has not yet demonstrated, that activity in body-selective cortical regions causally supports efficient visual perception of interactions. We adopt a multi-method approach to close this important gap. First, using a large fMRI dataset (n = 92), we found that the left hemisphere extrastriate body area (EBA) responds more to face-to-face than non-facing dyads. Second, we replicated a behavioral marker of visual sensitivity to interactions: categorization of facing dyads is more impaired by inversion than non-facing dyads. Third, in a pre-registered experiment, we used fMRI-guided transcranial magnetic stimulation to show that online stimulation of the left EBA, but not a nearby control region, abolishes this selective inversion effect. Activity in left EBA, thus, causally supports the efficient perception of social interactions.
Affiliation(s)
- Marco Gandolfo
- Donders Institute, Radboud University, Nijmegen 6525GD, the Netherlands; Department of Psychology, Bangor University, Bangor LL57 2AS, Gwynedd, UK
- Etienne Abassi
- Institut des Sciences Cognitives Marc Jeannerod, Lyon 69500, France
- Eva Balgova
- Department of Psychology, Bangor University, Bangor LL57 2AS, Gwynedd, UK; Department of Psychology, Aberystwyth University, Aberystwyth SY23 3UX, Ceredigion, UK
- Paul E Downing
- Department of Psychology, Bangor University, Bangor LL57 2AS, Gwynedd, UK
- Liuba Papeo
- Institut des Sciences Cognitives Marc Jeannerod, Lyon 69500, France
- Kami Koldewyn
- Department of Psychology, Bangor University, Bangor LL57 2AS, Gwynedd, UK
11
Harris I, Niven EC, Griffin A, Scott SK. Is song processing distinct and special in the auditory cortex? Nat Rev Neurosci 2023; 24:711-722. [PMID: 37783820] [DOI: 10.1038/s41583-023-00743-4]
Abstract
Is the singing voice processed distinctively in the human brain? In this Perspective, we discuss what might distinguish song processing from speech processing in light of recent work suggesting that some cortical neuronal populations respond selectively to song, and we outline the implications for our understanding of auditory processing. We review the literature regarding the neural and physiological mechanisms of song production and perception and show that this provides evidence for key differences between song and speech processing. We conclude by discussing the significance of the notion that song processing is special in terms of how this might contribute to theories of the neurobiological origins of vocal communication and to our understanding of the neural circuitry underlying sound processing in the human cortex.
Affiliation(s)
- Ilana Harris
- Institute of Cognitive Neuroscience, University College London, London, UK
- Efe C Niven
- Institute of Cognitive Neuroscience, University College London, London, UK
- Alex Griffin
- Department of Psychology, University of Cambridge, Cambridge, UK
- Sophie K Scott
- Institute of Cognitive Neuroscience, University College London, London, UK
12
Lavan N, McGettigan C. A model for person perception from familiar and unfamiliar voices. Commun Psychol 2023; 1:1. [PMID: 38665246] [PMCID: PMC11041786] [DOI: 10.1038/s44271-023-00001-4]
Abstract
When hearing a voice, listeners can form a detailed impression of the person behind the voice. Existing models of voice processing focus primarily on one aspect of person perception - identity recognition from familiar voices - but do not account for the perception of other person characteristics (e.g., sex, age, personality traits). Here, we present a broader perspective, proposing that listeners have a common perceptual goal of perceiving who they are hearing, whether the voice is familiar or unfamiliar. We outline and discuss a model - the Person Perception from Voices (PPV) model - that achieves this goal via a common mechanism of recognising a familiar person, persona, or set of speaker characteristics. Our PPV model aims to provide a more comprehensive account of how listeners perceive the person they are listening to, using an approach that incorporates and builds on aspects of the hierarchical frameworks and prototype-based mechanisms proposed within existing models of voice identity recognition.
Affiliation(s)
- Nadine Lavan
- Department of Experimental and Biological Psychology, Queen Mary University of London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing, and Phonetic Sciences, University College London, London, UK
13
Vogt C, Floegel M, Kasper J, Gispert-Sánchez S, Kell CA. Oxytocinergic modulation of speech production-a double-blind placebo-controlled fMRI study. Soc Cogn Affect Neurosci 2023; 18:nsad035. [PMID: 37384576] [PMCID: PMC10348401] [DOI: 10.1093/scan/nsad035]
Abstract
Many socio-affective behaviors, such as speech, are modulated by oxytocin. While oxytocin modulates speech perception, it is not known whether it also affects speech production. Here, we investigated effects of oxytocin administration and interactions with the functional rs53576 oxytocin receptor (OXTR) polymorphism on produced speech and its underlying brain activity. During functional magnetic resonance imaging, 52 healthy male participants read sentences out loud with either neutral or happy intonation; a covert reading condition served as a common baseline. Participants were studied once under the influence of intranasal oxytocin and in another session under placebo. Oxytocin administration increased the second formant of produced vowels. This acoustic feature has previously been associated with speech valence; however, the acoustic differences were not perceptually distinguishable in our experimental setting. When preparing to speak, oxytocin enhanced brain activity in sensorimotor cortices and regions of both dorsal and right ventral speech processing streams, as well as subcortical and cortical limbic and executive control regions. In some of these regions, the rs53576 OXTR polymorphism modulated oxytocin administration-related brain activity. Oxytocin also gated cortical-basal ganglia circuits involved in the generation of happy prosody. Our findings suggest that several neural processes underlying speech production are modulated by oxytocin, including control of not only affective intonation but also sensorimotor aspects during emotionally neutral speech.
Affiliation(s)
- Charlotte Vogt
- Department of Neurology and Brain Imaging Center Frankfurt, Goethe University Frankfurt, Schleusenweg 2-16, Frankfurt am Main 60528, Germany
- Mareike Floegel
- Department of Neurology and Brain Imaging Center Frankfurt, Goethe University Frankfurt, Schleusenweg 2-16, Frankfurt am Main 60528, Germany
- Johannes Kasper
- Department of Neurology and Brain Imaging Center Frankfurt, Goethe University Frankfurt, Schleusenweg 2-16, Frankfurt am Main 60528, Germany
- Suzana Gispert-Sánchez
- Department of Neurology and Brain Imaging Center Frankfurt, Goethe University Frankfurt, Schleusenweg 2-16, Frankfurt am Main 60528, Germany
- Experimental Neurology, Department of Neurology, Goethe University Frankfurt, Frankfurt am Main 60528, Germany
- Christian A Kell
- Department of Neurology and Brain Imaging Center Frankfurt, Goethe University Frankfurt, Schleusenweg 2-16, Frankfurt am Main 60528, Germany
14
Whitling S, Botzum HM, van Mersbergen MR. Degree of Breathiness in a Synthesized Voice Signal as it Differentiates Masculine versus Feminine Voices. J Voice 2023:S0892-1997(23)00150-9. [PMID: 37280147] [DOI: 10.1016/j.jvoice.2023.04.022]
Abstract
INTRODUCTION: Most studies determining speakers' perceived gender as binarily female or male rely on F0 perception, although other vocal parameters may also contribute to the perception of gender. The current study focused on the impact of breathiness on the perception of speakers' gender as a biological variable (feminine or masculine). METHODS: Thirty-one normal-hearing, native English speakers (18 female, 13 male; mean age 23 years, SD = 3.54) were auditorily and visually trained and then took part in a categorical perception task. A continuum of nine samples of the word "hello" was created in an airway modulation model of speech and voice production. Resting vocal fold length, resting vocal fold thickness, F0, and vocal tract length were fixed. Glottal width at the vocal process, posterior glottal gap, and bronchial pressure were continually modified across all stimuli. Each stimulus was randomly presented 30 times within each of the five blocks (150 presentations in total). Participants rated stimuli as binarily female or male. RESULTS: Ratings showed a sigmoidal shift along the continuum between perceived feminine and masculine voicing. This shift was evident at stimuli four and five, indicating a nonlinear, discrete perception of breathiness among participants. Response times were also significantly slower for these two stimuli, suggesting a categorical perception of breathiness. CONCLUSION: Breathiness created by a change in glottal width of at least 0.21 cm may influence a speaker's perceived gender.
Affiliation(s)
- Susanna Whitling
- Department of Logopedics, Phoniatrics and Audiology, Lund University, Lund, Sweden.
15
Nussbaum C, Pöhlmann M, Kreysa H, Schweinberger SR. Perceived naturalness of emotional voice morphs. Cogn Emot 2023; 37:731-747. PMID: 37104118. DOI: 10.1080/02699931.2023.2200920.
Abstract
Research into voice perception benefits from manipulation software that gives experimental control over the acoustic expression of social signals such as vocal emotions. Today, parameter-specific voice morphing allows precise control of the emotional quality expressed by single vocal parameters, such as fundamental frequency (F0) and timbre. However, potential side effects, in particular reduced naturalness, could limit the ecological validity of speech stimuli. To address this for the domain of emotion perception, we collected ratings of perceived naturalness and emotionality on voice morphs expressing different emotions through either F0 or timbre only. In two experiments, we compared two different morphing approaches, using either neutral voices or emotional averages as emotionally non-informative reference stimuli. As expected, parameter-specific voice morphing reduced perceived naturalness. However, the perceived naturalness of F0 and timbre morphs was comparable when averaged emotions served as the reference, potentially making this approach more suitable for future research. Crucially, there was no relationship between ratings of emotionality and naturalness, suggesting that the perception of emotion was not substantially affected by a reduction of voice naturalness. We hold that while these findings advocate parameter-specific voice morphing as a suitable tool for research on vocal emotion perception, great care should be taken in producing ecologically valid stimuli.
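Parameter-specific morphing, as described here, varies one acoustic parameter while holding the others at a reference setting. As a rough illustration of the F0-only case, the sketch below interpolates two time-aligned F0 contours on a log scale; the contours and the `morph_f0` helper are invented for illustration, and real morphing tools (e.g., TANDEM-STRAIGHT) operate on full time-frequency decompositions rather than bare contours.

```python
import numpy as np

def morph_f0(f0_neutral, f0_emotional, weight):
    """Interpolate two time-aligned F0 contours (Hz) on a log scale.
    weight=0 returns the neutral contour, weight=1 the emotional one;
    all other voice parameters would stay at the neutral setting."""
    f0_neutral = np.asarray(f0_neutral, dtype=float)
    f0_emotional = np.asarray(f0_emotional, dtype=float)
    log_morph = (1 - weight) * np.log(f0_neutral) + weight * np.log(f0_emotional)
    return np.exp(log_morph)

neutral = np.array([200.0, 205.0, 210.0, 200.0])  # flat-ish contour (invented)
happy = np.array([240.0, 280.0, 300.0, 260.0])    # raised, more variable (invented)

half = morph_f0(neutral, happy, 0.5)  # 50% F0 morph; timbre untouched
print(np.round(half, 1))
```

Interpolating in log-F0 rather than raw Hz keeps equal morph steps perceptually more uniform, which is one common design choice in pitch manipulation.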
Affiliation(s)
- Christine Nussbaum
- Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Voice Research Unit, Friedrich Schiller University, Jena, Germany
- Manuel Pöhlmann
- Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Helene Kreysa
- Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Voice Research Unit, Friedrich Schiller University, Jena, Germany
- Stefan R Schweinberger
- Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Voice Research Unit, Friedrich Schiller University, Jena, Germany
- Swiss Center for Affective Sciences, University of Geneva, Switzerland
16
Hernández Blasi C, Bjorklund DF, Agut S, Lozano Nomdedeu F, Martínez MÁ. Young children's attributes are better conveyed by voices than by faces. J Exp Child Psychol 2023; 228:105606. PMID: 36535204. DOI: 10.1016/j.jecp.2022.105606.
Abstract
The purpose of this study was to explore how young children's vocal and facial cues, when presented together, convey important information about children's attributes to adults. In particular, the study aimed to disentangle whether children's vocal or facial cues, if either, are more dominant when both types of cues are displayed in a contradictory mode. To do this, we assigned 127 college students to one of three between-participants conditions. In the Voices-Only condition, participants listened to four pairs of synthesized voices simulating the voices of 4-5-year-old and 9-10-year-old children verbalizing a neutral-content sentence. Participants indicated which voice was better associated with a series of 14 attributes organized into four trait dimensions (Positive Affect, Negative Affect, Intelligence, and Helpless), potentially meaningful in young child-adult interactions. In the Consistent condition, the same four pairs of voices delivered in the Voices-Only condition were presented jointly with morphed photographs of children's faces of equivalent age. In the Inconsistent condition, the four pairs of voices and faces were paired in a contradictory manner (immature voices with mature faces vs. mature voices with immature faces). Results revealed that vocal cues were more effective than facial cues in conveying young children's attributes to adults and that women were more efficient (i.e., faster) than men in responding to children's cues. These results confirm and extend previous evidence on the relevance of children's vocal cues in signaling important information about children's attributes and needs during their first 6 years of life.
Affiliation(s)
- David F Bjorklund
- Department of Psychology, Florida Atlantic University, Boca Raton, FL 33431, USA
- Sonia Agut
- Departamento de Psicología, Universitat Jaume I, 12071 Castellón, Spain
17
Stevenage SV, Singh L, Dixey P. The Curious Case of Impersonators and Singers: Telling Voices Apart and Telling Voices Together under Naturally Challenging Listening Conditions. Brain Sci 2023; 13:358. PMID: 36831901. PMCID: PMC9954053. DOI: 10.3390/brainsci13020358.
Abstract
Vocal identity processing depends on the ability to tell apart two instances of different speakers whilst also being able to tell together two instances of the same speaker. Whilst previous research has examined these voice processing capabilities under relatively common listening conditions, it has not yet tested their limits. Here, two studies are presented that employ challenging listening tasks to determine just how good we are at these voice processing tasks. In Experiment 1, 54 university students were asked to distinguish between very similar-sounding yet different speakers (celebrity targets and their impersonators). Participants completed a 'Same/Different' task and a 'Which is the Celebrity?' task for pairs of speakers, and a 'Real or Not?' task for individual speakers. In Experiment 2, a separate group of 40 university students was asked to pair very different-sounding instances of the same speakers (speaking and singing). Participants were presented with an array of voice clips and completed a 'Pairs Task', a variant of the more traditional voice sorting task. The results of Experiment 1 suggested that significantly more mistakes were made when distinguishing celebrity targets from their impersonators than when distinguishing the same targets from control voices. Nevertheless, listeners were significantly better than chance in all three tasks despite the challenge. Similarly, the results of Experiment 2 suggested that it was significantly more difficult to pair singing and speaking clips than to pair two speaking clips, particularly when the speakers were unfamiliar. Again, however, performance was significantly above floor and remained better than chance under a cautious comparison. Taken together, the results suggest that vocal identity processing is a highly adaptable skill, assisted by familiarity with the speaker. However, the fact that performance remained above chance in all tasks suggests that we had not reached the limit of our listeners' capabilities, despite the considerable listening challenges introduced. We conclude that voice processing is far better than previous research might have presumed.
18
Mamun N, Ghosh R, Hansen JHL. Familiar and unfamiliar speaker recognition assessment and system emulation for cochlear implant users. J Acoust Soc Am 2023; 153:1293. PMID: 36859118. PMCID: PMC10162836. DOI: 10.1121/10.0017216.
Abstract
In the area of speech processing, human speaker identification under naturalistic environments is a challenging task, especially for hearing-impaired individuals with cochlear implants (CIs) or hearing aids (HAs). Motivated by the fact that electrodograms reflect direct CI stimulation of input audio, this study proposes a speaker identification (ID) investigation using two-dimensional electrodograms constructed from the responses of a CI auditory system to emulate CI speaker ID capabilities. Features are extracted from electrodograms through an identity vector (i-vector) framework to train and generate identity models for each speaker using a Gaussian mixture model-universal background model followed by probabilistic linear discriminant analysis. To validate the proposed system, perceptual speaker ID for 20 normal hearing (NH) and seven CI listeners was evaluated with a total of 41 different speakers and compared with the scores from the proposed system. A one-way analysis of variance showed that the proposed system can reliably predict the speaker ID capability of CI (F[1,10] = 0.18, p = 0.68) and NH (F[1,20] = 0, p = 0.98) listeners in naturalistic environments. The impact of speaker familiarity is also addressed, and the results show a reduced performance for speaker recognition by CI subjects using their CI processor, highlighting limitations of current speech processing strategies used in CIs/HAs.
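The enrolment-and-scoring logic of a speaker-ID system like the one described can be sketched in drastically simplified form: below, each speaker is represented by a length-normalised mean of frame-level features, and a test utterance is identified by cosine similarity. All names and data are invented; the paper's actual pipeline extracts i-vectors from electrodograms against a GMM universal background model and scores them with probabilistic linear discriminant analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_model(frames):
    """Average frame-level feature vectors into one embedding, length-normalised."""
    v = np.asarray(frames).mean(axis=0)
    return v / np.linalg.norm(v)

def identify(test_frames, models):
    """Return the enrolled speaker whose model is most similar (cosine score)."""
    v = speaker_model(test_frames)
    scores = {name: float(v @ m) for name, m in models.items()}
    return max(scores, key=scores.get), scores

# Invented enrolment data: two speakers with distinct feature distributions
# (200 frames of 20-dimensional features each).
enrol = {
    "spk_a": rng.normal(loc=+1.0, size=(200, 20)),
    "spk_b": rng.normal(loc=-1.0, size=(200, 20)),
}
models = {name: speaker_model(f) for name, f in enrol.items()}

test = rng.normal(loc=+1.0, size=(50, 20))  # new utterance from speaker A
best, scores = identify(test, models)
print(best)
```

Averaging frames into a single fixed-length vector is the intuition behind i-vectors; the real method additionally models channel and session variability, which this toy omits.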
Affiliation(s)
- Nursadul Mamun
- Cochlear Implant Processing Laboratory-Center for Robust Speech Systems (CRSS-CILab), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Ria Ghosh
- Cochlear Implant Processing Laboratory-Center for Robust Speech Systems (CRSS-CILab), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- John H L Hansen
- Cochlear Implant Processing Laboratory-Center for Robust Speech Systems (CRSS-CILab), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
19
Rhone AE, Rupp K, Hect JL, Harford E, Tranel D, Howard MA, Abel TJ. Electrocorticography reveals the dynamics of famous voice responses in human fusiform gyrus. J Neurophysiol 2023; 129:342-346. PMID: 36576268. PMCID: PMC9886354. DOI: 10.1152/jn.00459.2022.
Abstract
Voice and face processing occur through convergent neural systems that facilitate speaker recognition. Neuroimaging studies suggest that familiar voice processing engages early visual cortex, including the bilateral fusiform gyrus (FG) in the basal temporal lobe. However, what role the FG plays in voice processing, and whether it is driven by bottom-up or top-down mechanisms, is unresolved. In this study we directly examined neural responses to famous voices and faces in human FG with direct cortical surface recordings (electrocorticography) in epilepsy surgery patients. We tested the hypothesis that neural populations in human FG respond to famous voices and investigated the temporal properties of voice responses in FG. Recordings were acquired from five adult participants during a person identification task using visual and auditory stimuli from famous speakers (U.S. Presidents Barack Obama, George W. Bush, and Bill Clinton). Patients were presented with images of the presidents or clips of their voices and asked to identify the portrait/speaker. Our results demonstrate that a subset of face-responsive sites in and near FG also exhibit voice responses that are both lower in magnitude and delayed (300-600 ms) compared with visual responses. The dynamics of voice processing revealed by direct cortical recordings suggest a top-down, feedback-mediated response to famous voices in FG that may facilitate speaker identification. NEW & NOTEWORTHY: Interactions between auditory and visual cortices play an important role in person identification, but the dynamics of these interactions remain poorly understood. We performed direct brain recordings of fusiform face cortex in human epilepsy patients performing a famous voice naming task, revealing the dynamics of famous voice processing in human fusiform face cortex. The findings support a model of top-down interactions from auditory to visual cortex that facilitate famous voice recognition.
Affiliation(s)
- Ariane E Rhone
- Department of Neurosurgery, University of Iowa, Iowa City, Iowa
- Kyle Rupp
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Jasmine L Hect
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Emily Harford
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Daniel Tranel
- Department of Psychology, University of Iowa, Iowa City, Iowa
- Taylor J Abel
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania
20
Talker and accent familiarity yield advantages for voice identity perception: A voice sorting study. Mem Cognit 2023; 51:175-187. PMID: 35274221. PMCID: PMC9943951. DOI: 10.3758/s13421-022-01296-0.
Abstract
In the current study, we examine and compare the effects of talker and accent familiarity in the context of a voice identity sorting task, using naturally varying voice recording samples from the TV show Derry Girls. Voice samples were thus all spoken with a regional accent of UK/Irish English (from [London]derry). We tested four listener groups: Listeners were either familiar or unfamiliar with the TV show (and therefore the talker identities) and were either highly familiar or relatively less familiar with Northern Irish accents. Both talker and accent familiarity significantly improved the accuracy of voice identity sorting. However, the benefits of talker familiarity were overall larger and more consistent. We discuss the results in light of a possible hierarchy of familiarity effects and argue that our findings may provide additional evidence for interactions of speech and identity processing pathways in voice identity perception. We also identify some key limitations in the current work and provide suggestions for future studies to address these.
21
Lavan N. How do we describe other people from voices and faces? Cognition 2023; 230:105253. PMID: 36215763. DOI: 10.1016/j.cognition.2022.105253.
Abstract
When seeing someone's face or hearing their voice, perceivers routinely infer information about a person's age, sex and social traits. While many experiments have explored how individual person characteristics are perceived in isolation, less is known about which person characteristics are described spontaneously from voices and faces and how descriptions may differ across modalities. In Experiment 1, participants provided free descriptions for voices and faces. These free descriptions followed similar patterns for voices and faces - and for individual identities: Participants spontaneously referred to a wide range of descriptors. Psychological descriptors, such as character traits, were used most frequently; physical characteristics, such as age and sex, were notable as they were mentioned earlier than other types of descriptors. After finding primarily similarities between modalities when analysing person descriptions across identities, Experiment 2 asked whether free descriptions encode how individual identities differ. For this purpose, the measures derived from the free descriptions were linked to voice/face discrimination judgements that are known to describe differences in perceptual properties between identity pairs. Significant relationships emerged within and across modalities, showing that free descriptions indeed encode differences between identities - information that is shared with discrimination judgements. This suggests that the two tasks tap into similar, high-level person representations. These findings show that free description data can offer valuable insights into person perception and underline that person perception is a multivariate process during which perceivers rapidly and spontaneously infer many different person characteristics to form a holistic impression of a person.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom.
22
Smith HMJ, Roeser J, Pautz N, Davis JP, Robson J, Wright D, Braber N, Stacey PC. Evaluating earwitness identification procedures: adapting pre-parade instructions and parade procedure. Memory 2023; 31:147-161. PMID: 36201314. DOI: 10.1080/09658211.2022.2129065.
Abstract
Voice identification parades can be unreliable, as earwitness responses are error-prone. In this paper we tested performance across serial and sequential procedures, and varied pre-parade instructions, with the aim of reducing errors. The participants heard a target voice and later attempted to identify it from a parade. In Experiment 1 they were either warned that the target may or may not be present (standard warning) or encouraged to consider responding "not present" because of the associated risk of a wrongful conviction (strong warning). Strong warnings prompted a conservative criterion shift, with participants less likely to make a positive identification regardless of whether the target was present. In contrast to previous findings, we found no statistically reliable difference in accuracy between serial and sequential parades. Experiment 2 ruled out a potential confound in Experiment 1. Taken together, our results suggest that adapting pre-parade instructions provides a simple way of reducing the risk of false identifications.
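The conservative criterion shift reported in Experiment 1 is the kind of effect captured by equal-variance signal detection measures, where sensitivity (d') and criterion (c) are computed from hit and false-alarm rates. A minimal sketch with invented rates (not the study's data), in which a strong warning lowers both hits and false alarms, leaving sensitivity similar but raising the criterion:

```python
from statistics import NormalDist

def sdt_measures(hit_rate, fa_rate):
    """Equal-variance signal-detection sensitivity (d') and criterion (c).
    c > 0 indicates a conservative bias (fewer positive identifications)."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c

# Invented illustrative rates for the two warning conditions.
d_std, c_std = sdt_measures(hit_rate=0.70, fa_rate=0.30)        # standard warning
d_strong, c_strong = sdt_measures(hit_rate=0.55, fa_rate=0.18)  # strong warning
print(round(c_std, 2), round(c_strong, 2))
```

With these numbers d' barely changes while c moves upward, which is the signature of a criterion shift rather than a change in discrimination ability.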
Affiliation(s)
- Harriet M J Smith
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
- Jens Roeser
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
- Nikolas Pautz
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
- Josh P Davis
- School of Human Sciences, University of Greenwich, London, United Kingdom
- Jeremy Robson
- Leicester De Montfort Law School, De Montfort University, Leicester, United Kingdom
- David Wright
- English, Communications and Philosophy, Nottingham Trent University, Nottingham, United Kingdom
- Natalie Braber
- English, Communications and Philosophy, Nottingham Trent University, Nottingham, United Kingdom
- Paula C Stacey
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
23
Sun Y, Ming L, Sun J, Guo F, Li Q, Hu X. Brain mechanism of unfamiliar and familiar voice processing: an activation likelihood estimation meta-analysis. PeerJ 2023; 11:e14976. PMID: 36935917. PMCID: PMC10019337. DOI: 10.7717/peerj.14976.
Abstract
Interpersonal communication through vocal information is very important for human society. During verbal interactions, our vocal cord vibrations convey important information regarding voice identity, which allows us to decide how to respond to speakers (e.g., neither greeting a stranger too warmly nor speaking too coldly to a friend). Numerous neural studies have shown that identifying familiar and unfamiliar voices may rely on different neural bases. However, the mechanism underlying voice identification of individuals of varying familiarity has not been determined, owing to vague definitions, confusion of terms, and differences in task design. To address this issue, the present study first categorized three kinds of voice identity processing (perception, recognition and identification) for speakers with different degrees of familiarity. We defined voice identity perception as passively listening to a voice or determining whether the voice was human, voice identity recognition as determining whether the sound heard was acoustically familiar, and voice identity identification as ascertaining whether a voice is associated with a name or face. Of these, voice identity perception involves processing unfamiliar voices, whereas voice identity recognition and identification involve processing familiar voices. According to these three definitions, we performed activation likelihood estimation (ALE) on 32 studies and revealed different brain mechanisms underlying the processing of unfamiliar and familiar voice identities.
The results were as follows: (1) familiar voice recognition/identification was supported by a network involving most regions in the temporal lobe, some regions in the frontal lobe, subcortical structures and regions around the marginal lobes; (2) the bilateral superior temporal gyrus was recruited for voice identity perception of an unfamiliar voice; (3) voice identity recognition/identification of familiar voices was more likely to activate the right frontal lobe than voice identity perception of unfamiliar voices, while voice identity perception of an unfamiliar voice was more likely to activate the bilateral temporal lobe and left frontal lobe; and (4) the bilateral superior temporal gyrus served as a shared neural basis of unfamiliar voice identity perception and familiar voice identity recognition/identification. In general, the results of the current study address gaps in the literature, provide clear definitions of concepts, and indicate brain mechanisms for subsequent investigations.
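The ALE method used above pools reported activation peaks across studies by blurring each focus into a Gaussian "modelled activation" map and combining maps voxelwise as a probabilistic union. A toy one-dimensional sketch of that idea with invented foci (real ALE operates on 3-D brains with sample-size-dependent kernels and permutation-based thresholding):

```python
import numpy as np

grid = np.linspace(0, 100, 101)  # a 1-D stand-in for brain voxels

def ma_map(focus, sigma=5.0):
    """Gaussian 'modelled activation' map for one reported peak,
    scaled to a probability strictly below 1."""
    g = np.exp(-0.5 * ((grid - focus) / sigma) ** 2)
    return g / g.max() * 0.9

# Invented peak locations from four hypothetical studies: three converge
# near x = 42, one is an outlier at x = 80.
foci = [40, 42, 45, 80]

# Voxelwise probabilistic union: 1 - prod(1 - p_i).
ale = 1.0 - np.prod([1.0 - ma_map(f) for f in foci], axis=0)
peak = grid[np.argmax(ale)]
print(f"ALE peak at x = {peak:.0f}")
```

The union score is highest where several studies' modelled activations overlap, so convergent foci dominate isolated ones, which is exactly what makes ALE a test of spatial convergence.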
24
Alispahic S, Pellicano E, Cutler A, Antoniou M. Auditory perceptual learning in autistic adults. Autism Res 2022; 15:1495-1507. PMID: 35789543. DOI: 10.1002/aur.2778.
Abstract
The automatic retuning of phoneme categories to better adapt to the speech of a novel talker has been extensively documented across various (neurotypical) populations, including both adults and children. However, no studies have examined auditory perceptual learning effects in populations with atypical perceptual, social, and language processing for communication, such as autistic people. Employing a classic lexically guided perceptual learning paradigm, the present study investigated perceptual learning effects in Australian English autistic and non-autistic adults. The findings revealed that automatic attunement to existing phoneme categories was not activated in the autistic group in the same manner as for non-autistic control participants. Specifically, autistic adults were able both to successfully discern lexical items and to categorize speech sounds; however, they did not show effects of perceptual retuning to talkers. These findings may have implications for the application of current sensory theories (e.g., Bayesian decision theory) to speech and language processing by autistic individuals. LAY SUMMARY: Lexically guided perceptual learning assists in the disambiguation of speech from a novel talker. The present study established that while Australian English autistic adult listeners were able to successfully discern lexical items and categorize speech sounds in their native language, perceptual flexibility in updating speaker-specific phonemic knowledge when exposed to a novel talker was not available. Implications for speech and language processing by autistic individuals, as well as current sensory theories, are discussed.
Affiliation(s)
- Samra Alispahic
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, New South Wales, Australia
- Elizabeth Pellicano
- Department of Educational Studies, Macquarie University, Sydney, New South Wales, Australia
- Department of Clinical, Educational and Health Psychology, University College London, London, United Kingdom
- Anne Cutler
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, New South Wales, Australia
- Language Comprehension Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- ARC Centre of Excellence for the Dynamics of Language, Australia
- Mark Antoniou
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, New South Wales, Australia
25
Gender and Context-Specific Effects of Vocal Dominance and Trustworthiness on Leadership Decisions. Adapt Hum Behav Physiol 2022. DOI: 10.1007/s40750-022-00194-8.
Abstract
Objective
The evolutionary-contingency hypothesis, which suggests that preferences for leaders are context-dependent, has found relatively consistent support from research investigating leadership decisions based on facial pictures. Here, we tested whether these results transfer to leadership decisions based on voice recordings. We examined how dominance and trustworthiness perceptions relate to leadership decisions in wartime and peacetime contexts and whether effects differ by a speaker's gender. Further, we investigated two cues that might be related to leadership decisions, as well as to dominance and trustworthiness perceptions: voice pitch and strength of regional accent.
Methods
We conducted a preregistered online study with 125 raters and recordings of 120 speakers (61 men, 59 women) from different parts of Germany. Raters were randomly distributed into four rating conditions: dominance, trustworthiness, hypothetical vote (wartime) and hypothetical vote (peacetime).
Results
We found that dominant speakers were more likely to be voted for in a wartime context, while trustworthy speakers were more likely to be voted for in a peacetime context. Voice pitch functioned as a main cue for dominance perceptions, while strength of regional accent functioned as a main cue for trustworthiness perceptions.
Conclusions
This study adds to a stream of research that suggests that (a) people’s voices contain important information based on which we form social impressions and (b) we prefer different types of leaders across different contexts. Future research should disentangle effects of gender bias in leadership decisions and investigate underlying mechanisms that influence how people’s voices contribute to achieving social status.
26
Zhang J, Tao S. Vocal Characteristics Influence Women's Perceptions of Infidelity and Relationship Investment in China. Evol Psychol 2022; 20:14747049221108883. PMID: 35898188. PMCID: PMC10303567. DOI: 10.1177/14747049221108883.
Abstract
Vocal characteristics are important cues to form social impressions. Previous studies indicated that men with masculine voices are perceived as engaging in higher rates of infidelity and being less committed to their relationship. In the current study, we examined how women in China perceive information regarding infidelity and relationship investment conveyed by the voices (voice pitch and vocal tract length) of males, and whether different vocal characteristics play a similar role in driving these impressions. In addition, we examined whether these perceptions are consistent in Chinese and English language contexts. The results indicated that women perceived men with more masculine voices (lower voice pitch and longer vocal tract length) as showing a lower likelihood of infidelity and higher relationship investment; further, women who preferred more masculine voices in long-term relationships, but not in short-term relationships, were more likely to perceive men with masculine voices as less likely to engage in infidelity and more likely to invest in their relationship. Moreover, the participants formed very similar impressions irrespective of whether the voices spoke native (Chinese) or foreign (English) languages. These results provide new evidence for the role of the voice in women's choices in selecting long-term partners.
Affiliation(s)
- Jing Zhang
- School of Psychology, Sichuan Normal University, Chengdu, China
- Shuli Tao
- School of Psychology, Sichuan Normal University, Chengdu, China
27
Rinke P, Schmidt T, Beier K, Kaul R, Scharinger M. Rapid pre-attentive processing of a famous speaker: Electrophysiological effects of Angela Merkel's voice. Neuropsychologia 2022; 173:108312. PMID: 35781011. DOI: 10.1016/j.neuropsychologia.2022.108312.
Abstract
The recognition of human speakers by their voices is a remarkable cognitive ability. Previous research has established a voice area in the right temporal cortex involved in the integration of speaker-specific acoustic features. This integration appears to occur rapidly, especially in the case of familiar voices. However, the exact time course of this process is less well understood. To this end, we here investigated the automatic change detection response of the human brain while listening to the famous voice of German chancellor Angela Merkel, embedded in a context of acoustically matched voices. A classic passive oddball paradigm contrasted short word stimuli uttered by Merkel with word stimuli uttered by two unfamiliar female speakers. Electrophysiological voice processing indices from 21 participants were quantified as mismatch negativities (MMNs) and P3a differences. Cortical sources were approximated by variable resolution electromagnetic tomography. The results showed amplitude and latency effects for both MMN and P3a: The famous (familiar) voice elicited a smaller but earlier MMN than the unfamiliar voices. The P3a, by contrast, was both larger and later for the familiar than for the unfamiliar voices. Familiar-voice MMNs originated from right-hemispheric regions in temporal cortex, overlapping with the temporal voice area, while unfamiliar-voice MMNs stemmed from the left superior temporal gyrus. These results suggest that the processing of a very famous voice relies on pre-attentive right temporal processing within the first 150 ms of the acoustic signal. The findings further our understanding of the neural dynamics underlying familiar voice processing.
Affiliation(s)
- Paula Rinke: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany
- Tatjana Schmidt: Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany; Faculté de biologie et de médecine, University of Lausanne, Switzerland
- Kjartan Beier: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany
- Ramona Kaul: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany
- Mathias Scharinger: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany; Research Center »Deutscher Sprachatlas«, Philipps-University Marburg, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany

28
Nussbaum C, Schirmer A, Schweinberger SR. Contributions of fundamental frequency and timbre to vocal emotion perception and their electrophysiological correlates. Soc Cogn Affect Neurosci 2022; 17:1145-1154. [PMID: 35522247] [PMCID: PMC9714422] [DOI: 10.1093/scan/nsac033]
Abstract
Our ability to infer a speaker's emotional state depends on the processing of acoustic parameters such as fundamental frequency (F0) and timbre. Yet, how these parameters are processed and integrated to inform emotion perception remains largely unknown. Here we pursued this issue using a novel parameter-specific voice morphing technique to create stimuli with emotion modulations in only F0 or only timbre. We used these stimuli together with fully modulated vocal stimuli in an event-related potential (ERP) study in which participants listened to and identified stimulus emotion. ERPs (P200 and N400) and behavioral data converged in showing that both F0 and timbre support emotion processing but do so differently for different emotions: Whereas F0 was most relevant for responses to happy, fearful and sad voices, timbre was most relevant for responses to voices expressing pleasure. Together, these findings offer original insights into the relative significance of different acoustic parameters for early neuronal representations of speaker emotion and show that such representations are predictive of subsequent evaluative judgments.
Affiliation(s)
- Christine Nussbaum: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Leutragraben 1, Jena 07743, Germany (corresponding author)
- Annett Schirmer: Department of Psychology, The Chinese University of Hong Kong, Shatin 999077, Hong Kong SAR; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin 999077, Hong Kong SAR; Center for Cognition and Brain Studies, The Chinese University of Hong Kong, Shatin 999077, Hong Kong SAR
- Stefan R Schweinberger: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University, Jena 07743, Germany; Voice Research Unit, Friedrich Schiller University, Jena 07743, Germany; Swiss Center for Affective Sciences, University of Geneva, Geneva 1202, Switzerland

29
The Time Course of Emotional Authenticity Detection in Nonverbal Vocalizations. Cortex 2022; 151:116-132. [DOI: 10.1016/j.cortex.2022.02.016]
30
Hernández Blasi C, Bjorklund DF, Agut S, Lozano Nomdedeu F, Martínez MÁ. Voices as Cues to Children's Needs for Caregiving. Hum Nat 2022; 33:22-42. [PMID: 34881403] [PMCID: PMC8964562] [DOI: 10.1007/s12110-021-09418-4]
Abstract
The aim of this study was to explore the role of voices as cues to adults about children's needs for caregiving during early childhood. To this end, 74 college students listened to pairs of 5-year-old and 10-year-old children verbalizing neutral-content sentences and indicated which voice was better associated with each of 14 traits potentially meaningful in interactions between young children and adults. Results indicated that children with immature voices were perceived more positively and as being more helpless than children with mature voices. Children's voices, regardless of the content of speech, seem to be a powerful source of information about children's need for caregiving for parents and others during the first six years of life.
Affiliation(s)
- Sonia Agut: Departamento de Psicología, Universitat Jaume I, 12071, Castellón, Spain

31
Whitehead JC, Armony JL. Intra-individual Reliability of Voice- and Music-elicited Responses and their Modulation by Expertise. Neuroscience 2022; 487:184-197. [PMID: 35182696] [DOI: 10.1016/j.neuroscience.2022.02.011]
Abstract
A growing number of functional neuroimaging studies have identified regions within the temporal lobe, particularly along the planum polare and planum temporale, that respond more strongly to music than to other types of acoustic stimuli, including voice. These "music-preferred" regions have been reported using a variety of stimulus sets, paradigms and analysis approaches, and their consistency across studies has been confirmed through meta-analyses. However, the critical question of intra-subject reliability of these responses has received less attention. Here, we directly assessed this important issue by contrasting brain responses to musical vs. vocal stimuli in the same subjects across three consecutive fMRI runs, using different types of stimuli. Moreover, we investigated whether these music- and voice-preferred responses were reliably modulated by expertise. Results demonstrated that music-preferred activity previously reported in temporal regions, and its modulation by expertise, exhibits a high intra-subject reliability. However, we also found that activity in some extra-temporal regions, such as the precentral and middle frontal gyri, did depend on the particular stimuli employed, which may explain why these are less consistently reported in the literature. Taken together, our findings confirm and extend the notion that specific regions in the brain consistently respond more strongly to certain socially-relevant stimulus categories, such as faces, voices and music, but that some of these responses appear to depend, at least to some extent, on the specific features of the paradigm employed.
Affiliation(s)
- Jocelyne C Whitehead: Douglas Mental Health University Institute, Verdun, Canada; BRAMS Laboratory, Centre for Research on Brain, Language and Music, Montreal, Canada; Integrated Program in Neuroscience, McGill University, Montreal, Canada
- Jorge L Armony: Douglas Mental Health University Institute, Verdun, Canada; BRAMS Laboratory, Centre for Research on Brain, Language and Music, Montreal, Canada; Department of Psychiatry, McGill University, Montreal, Canada

32
Paz KEDS, de Almeida AAF, Almeida LNA, Sousa ESDS, Lopes LW. Auditory Perception of Roughness and Breathiness by Dysphonic Women. J Voice 2022:S0892-1997(22)00006-6. [PMID: 35082050] [DOI: 10.1016/j.jvoice.2022.01.005]
Abstract
OBJECTIVE To investigate the auditory perception of roughness and breathiness by dysphonic women. METHODS Twenty-two dysphonic native Brazilian Portuguese women participated in this research. All participants underwent audiological evaluation and laryngeal examination to confirm the diagnosis. During the tests, they recorded the sustained vowel /Ɛ/. A speech-language pathologist performed the auditory-perceptual judgment of voice quality for these vocal samples, categorizing the general degree of vocal deviation (mild, moderate, and severe) and the predominant type of deviation (roughness or breathiness). Thirty-two (32) stimuli were selected from a voice database, including twenty-four (24) dysphonic voice samples and eight (8) voice samples from vocally healthy women. The authors conducted five perception experiments, comprising three categorization tasks (normal vs. deviated, breathy vs. nonbreathy, rough vs. nonrough) and two tasks discriminating the degree of deviation (roughness degree and breathiness degree). RESULTS The experiments showed a difference between the answers for presence/absence of deviation, presence/absence of breathiness, and presence/absence of roughness in the stimuli, and a difference in the proportion of similar answers of dysphonic women (P < 0.001) regarding the identification of the deviation. Participants classified a large part of the deviated (57.9%), breathy (63.13%), and rough (65.31%) voices as normal. The degree of vocal deviation (P = 0.008) and the degree of roughness in the stimuli correlated positively with the proportion of similar answers. As for the discrimination of breathiness degrees, less deviated (normal and mild) voices were poorly discriminated, whereas more deviated (moderate and severe) voices were better discriminated. Regarding the discrimination of roughness degrees, only voices with severe deviations showed good discrimination.
CONCLUSION Dysphonic women had a high rate of dissimilar answers when identifying normal and deviated voices, classifying more than half of the deviated voices as normal. Samples with more severe deviations were proportionally more often identified as deviated. The greater the vocal deviation of the participants' own voices, the smaller the number of similar answers. Participants also had a high rate of dissimilar answers when identifying normal and breathy voices, showing reduced ability to perceive mildly and moderately breathy voices as breathy. In contrast, participants had a high rate of similar answers when identifying normal and rough voices, with reduced perception only for mildly rough voices. Dysphonic women could discriminate between voices with adjacent degrees of roughness but had a low percentage of similar answers when discriminating between voices with adjacent degrees of breathiness.
Affiliation(s)
- Karoline Evangelista da Silva Paz: Master's degree, Speech, Language, and Hearing Sciences Graduate Program at the Federal University of Paraíba (Universidade Federal da Paraíba-UFPB), João Pessoa, Paraíba, Brazil
- Anna Alice Figueiredo de Almeida: Professor at the Speech, Language, and Hearing Sciences Graduate Program at the Federal University of Paraíba (Universidade Federal da Paraíba-UFPB), João Pessoa, Paraíba, Brazil
- Larissa Nadjara Alves Almeida: Member of the Integrated Voice Studies Laboratory, Speech, Language, and Hearing Sciences Graduate Program at the Federal University of Paraíba (Universidade Federal da Paraíba-UFPB), João Pessoa, Paraíba, Brazil
- Estevão Silvestre da Silva Sousa: Member of the Integrated Voice Studies Laboratory, Speech, Language, and Hearing Sciences Graduate Program at the Federal University of Paraíba (Universidade Federal da Paraíba-UFPB), João Pessoa, Paraíba, Brazil
- Leonardo Wanderley Lopes: Professor at the Speech, Language, and Hearing Sciences Graduate Program at the Federal University of Paraíba (Universidade Federal da Paraíba-UFPB), João Pessoa, Paraíba, Brazil

33
Nussbaum C, von Eiff CI, Skuk VG, Schweinberger SR. Vocal emotion adaptation aftereffects within and across speaker genders: Roles of timbre and fundamental frequency. Cognition 2021; 219:104967. [PMID: 34875400] [DOI: 10.1016/j.cognition.2021.104967]
Abstract
While the human perceptual system constantly adapts to the environment, some of the underlying mechanisms are still poorly understood. For instance, although previous research demonstrated perceptual aftereffects in emotional voice adaptation, the contribution of different vocal cues to these effects is unclear. In two experiments, we used parameter-specific morphing of adaptor voices to investigate the relative roles of fundamental frequency (F0) and timbre in vocal emotion adaptation, using angry and fearful utterances. Participants adapted to voices containing emotion-specific information in either F0 or timbre, with all other parameters kept constant at an intermediate 50% morph level. Full emotional voices and ambiguous voices were used as reference conditions. All adaptor stimuli were either of the same (Experiment 1) or opposite speaker gender (Experiment 2) as the subsequently presented target voices. In Experiment 1, we found consistent aftereffects in all adaptation conditions. Crucially, aftereffects following timbre adaptation were much larger than those following F0 adaptation and were only marginally smaller than those following full adaptation. In Experiment 2, adaptation aftereffects were markedly and proportionally reduced, and differences between morph types were no longer significant. These results suggest that timbre plays a larger role than F0 in vocal emotion adaptation, and that vocal emotion adaptation is compromised by eliminating gender correspondence between adaptor and target stimuli. Our findings also add to mounting evidence suggesting a major role of timbre in auditory adaptation.
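The parameter-specific morphing logic described in the abstract (emotion-specific information in only one acoustic parameter, all others held at an intermediate 50% morph level) can be sketched as per-parameter interpolation between two reference utterances. This is an illustrative toy model, not the authors' actual voice-morphing pipeline; the parameter names and numeric values are invented for the example.

```python
# Toy sketch of parameter-specific morphing: each named acoustic parameter is
# interpolated independently between a neutral and an emotional reference.
# Parameter names and values are illustrative, not from the study.

def morph_params(neutral, emotional, weights):
    """Linearly interpolate each parameter; weight 0 = fully neutral,
    1 = fully emotional, default 0.5 = intermediate morph level."""
    return {p: (1 - weights.get(p, 0.5)) * neutral[p]
               + weights.get(p, 0.5) * emotional[p]
            for p in neutral}

neutral = {"f0_hz": 210.0, "timbre_shift": 0.0, "duration_s": 0.80}
fearful = {"f0_hz": 260.0, "timbre_shift": 1.0, "duration_s": 0.70}

# "F0-only" adaptor: fully emotional F0, every other parameter at 50%.
f0_only = morph_params(neutral, fearful, {"f0_hz": 1.0})
print(f0_only["f0_hz"], f0_only["timbre_shift"])  # prints: 260.0 0.5
```

A "timbre-only" adaptor would instead pass `{"timbre_shift": 1.0}`, leaving F0 at the intermediate level.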
Affiliation(s)
- Christine Nussbaum: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Celina I von Eiff: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Verena G Skuk: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Stefan R Schweinberger: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany

34
King LS, Salo VC, Kujawa A, Humphreys KL. Advancing the RDoC initiative through the assessment of caregiver social processes. Dev Psychopathol 2021; 33:1648-1664. [PMID: 34311802] [PMCID: PMC8792111] [DOI: 10.1017/s095457942100064x]
Abstract
The relationships infants and young children have with their caregivers are fundamental to their survival and well-being. Theorists and researchers across disciplines have attempted to describe and assess the variation in these relationships, leading to a general acceptance that caregiving is critical to understanding child functioning, including developmental psychopathology. At the same time, we lack consensus on how to assess these fundamental relationships. In the present paper, we first review research documenting the importance of the caregiver-child relationship in understanding environmental risk for psychopathology. Second, we propose that the National Institute of Mental Health's Research Domain Criteria (RDoC) initiative provides a useful framework for extending the study of children's risk for psychopathology by assessing their caregivers' social processes. Third, we describe the units of analysis for caregiver social processes, documenting how the specific subconstructs in the domain of social processes are relevant to the goal of enhancing knowledge of developmental psychopathology. Lastly, we highlight how past research can inform new directions in the study of caregiving and the parent-child relationship through this innovative extension of the RDoC initiative.
Affiliation(s)
- Lucy S King: Department of Psychology, Stanford University, Stanford, CA, USA
- Virginia C Salo: Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
- Autumn Kujawa: Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
- Kathryn L Humphreys: Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA

35
Lavan N, Collins MRN, Miah JFM. Audiovisual identity perception from naturally-varying stimuli is driven by visual information. Br J Psychol 2021; 113:248-263. [PMID: 34490897] [DOI: 10.1111/bjop.12531]
Abstract
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently from one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli compared with audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information, as audiovisual identity judgements per stimulus could be predicted from both auditory and visual identity judgements. While this relationship was stronger for visual information, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work proposing that, compared with faces, voices are an important but relatively less salient and weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not result in better identity perception accuracy.
Affiliation(s)
- Nadine Lavan: Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Madeleine Rose Niamh Collins: Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Jannatul Firdaus Monisha Miah: Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK

36
Krasotkina A, Götz A, Höhle B, Schwarzer G. Perceptual narrowing in face- and speech-perception domains in infancy: A longitudinal approach. Infant Behav Dev 2021; 64:101607. [PMID: 34274849] [DOI: 10.1016/j.infbeh.2021.101607]
Abstract
During the first year of life, infants undergo a process known as perceptual narrowing, which reduces their sensitivity to classes of stimuli which the infants do not encounter in their environment. It has been proposed that perceptual narrowing for faces and speech may be driven by shared domain-general processes. To investigate this theory, our study longitudinally tested 50 German Caucasian infants with respect to these domains first at 6 months of age followed by a second testing at 9 months of age. We used an infant-controlled habituation-dishabituation paradigm to test the infants' ability to discriminate among other-race Asian faces and non-native Cantonese speech tones, as well as same-race Caucasian faces as a control. We found that while at 6 months of age infants could discriminate among all stimuli, by 9 months of age they could no longer discriminate among other-race faces or non-native tones. However, infants could discriminate among same-race stimuli both at 6 and at 9 months of age. These results demonstrate that the same infants undergo perceptual narrowing for both other-race faces and non-native speech tones between the ages of 6 and 9 months. This parallel development of perceptual narrowing occurring in both the face and speech perception modalities over the same period of time lends support to the domain-general theory of perceptual narrowing in face and speech perception.
37
Abstract
The way we process language is influenced by our experience. We are more likely to attend to features that proved to be useful in the past. Importantly, the size of individuals’ social network can influence their experience, and consequently, how they process language. In the case of voice recognition, having a larger social network might provide more variable input and thus enhance the ability to recognise new voices. On the other hand, learning to recognise voices is more demanding and less beneficial for people with a larger social network as they have more speakers to learn yet spend less time with each. This paper tests whether social network size influences voice recognition, and if so, in which direction. Native Dutch speakers listed their social network and performed a voice recognition task. Results showed that people with larger social networks were poorer at learning to recognise voices. Experiment 2 replicated the results with a British sample and English stimuli. Experiment 3 showed that the effect does not generalise to voice recognition in an unfamiliar language suggesting that social network size influences attention to the linguistic rather than non-linguistic markers that differentiate speakers. The studies thus show that our social network size influences our inclination to learn speaker-specific patterns in our environment, and consequently, the development of skills that rely on such learned patterns, such as voice recognition.
38
Fast Periodic Auditory Stimulation Reveals a Robust Categorical Response to Voices in the Human Brain. eNeuro 2021; 8:ENEURO.0471-20.2021. [PMID: 34016602] [PMCID: PMC8225406] [DOI: 10.1523/eneuro.0471-20.2021]
Abstract
Voices are arguably among the most relevant sounds in humans' everyday life, and several studies have suggested the existence of voice-selective regions in the human brain. Despite two decades of research, defining the human brain regions supporting voice recognition remains challenging. Moreover, whether neural selectivity to voices is merely driven by acoustic properties specific to human voices (e.g., spectrogram, harmonicity), or whether it also reflects a higher-level categorization response, is still under debate. Here, we objectively measured rapid automatic categorization responses to human voices with fast periodic auditory stimulation (FPAS) combined with electroencephalography (EEG). Participants were tested with stimulation sequences containing heterogeneous non-vocal sounds from different categories presented at 4 Hz (i.e., four stimuli/s), with vocal sounds appearing every three stimuli (1.333 Hz). A few minutes of stimulation are sufficient to elicit robust 1.333 Hz voice-selective focal brain responses over superior temporal regions of individual participants. This response is virtually absent for sequences using frequency-scrambled sounds, but is clearly observed when voices are presented among sounds from musical instruments matched for pitch and harmonicity-to-noise ratio (HNR). Overall, our FPAS paradigm demonstrates that the human brain seamlessly categorizes human voices when compared with other sounds, including musical instruments' sounds matched for low-level acoustic features, and that voice-selective responses are at least partially independent of low-level acoustic features, making it a powerful and versatile tool for understanding human auditory categorization in general.
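The stimulation schedule described in the abstract (a 4 Hz base rate with a vocal sound every third stimulus, yielding a 1.333 Hz voice-presentation frequency) can be sketched as follows. The sound labels are placeholders, not the study's actual stimuli.

```python
# Sketch of an FPAS sequence: heterogeneous sounds at 4 Hz, with a vocal
# sound at every third position, giving a voice rate of 4/3 ≈ 1.333 Hz.
import random

BASE_RATE_HZ = 4.0   # four stimuli per second
VOICE_EVERY = 3      # every third stimulus is a voice

def build_sequence(n_stimuli, nonvocal_pool, vocal_pool, seed=0):
    """Return a list of (onset_s, category, token) triplets."""
    rng = random.Random(seed)
    seq = []
    for i in range(n_stimuli):
        onset = i / BASE_RATE_HZ
        if i % VOICE_EVERY == VOICE_EVERY - 1:  # positions 2, 5, 8, ...
            seq.append((onset, "voice", rng.choice(vocal_pool)))
        else:
            seq.append((onset, "nonvocal", rng.choice(nonvocal_pool)))
    return seq

seq = build_sequence(240, ["bell", "dog", "guitar"], ["voice_a", "voice_b"])
duration_s = len(seq) / BASE_RATE_HZ                       # 60 s of stimulation
voice_rate = sum(1 for _, cat, _ in seq if cat == "voice") / duration_s
print(round(voice_rate, 3))  # prints: 1.333
```

In the EEG analysis, a response at exactly this 1.333 Hz frequency (and its harmonics) is what indexes voice-selective categorization.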
39
Strelnikov K, Hervault M, Laurent L, Barone P. When two is worse than one: The deleterious impact of multisensory stimulation on response inhibition. PLoS One 2021; 16:e0251739. [PMID: 34014959] [PMCID: PMC8136741] [DOI: 10.1371/journal.pone.0251739]
Abstract
Multisensory facilitation is known to improve the perceptual performances and reaction times of participants in a wide range of tasks, from detection and discrimination to memorization. We asked whether a multimodal signal can similarly improve action inhibition using the stop-signal paradigm. Indeed, consistent with a crossmodal redundant signal effect that relies on multisensory neuronal integration, the threshold for initiating behavioral responses is known to be reached faster with multisensory stimuli. To evaluate whether this phenomenon also occurs for inhibition, we compared stop signals in unimodal (human faces or voices) versus audiovisual modalities in natural or degraded conditions. In contrast to the expected multisensory facilitation, we observed poorer inhibition efficiency in the audiovisual modality compared with the visual and auditory modalities. This result was corroborated by both response probabilities and stop-signal reaction times. The visual modality (faces) was the most effective. This is the first demonstration of an audiovisual impairment in the domain of perception and action. It suggests that when individuals are engaged in a high-level decisional conflict, bimodal stimulation is not processed as a simple multisensory object improving performance but is perceived as concurrent visual and auditory information. This absence of unity increases task demand and thus impairs the ability to revise the response.
Affiliation(s)
- Kuzma Strelnikov: Brain & Cognition Research Center (CerCo), University of Toulouse 3 – CNRS, Toulouse, France; Purpan University Hospital, Toulouse, France (corresponding author)
- Mario Hervault: Brain & Cognition Research Center (CerCo), University of Toulouse 3 – CNRS, Toulouse, France
- Lidwine Laurent: Brain & Cognition Research Center (CerCo), University of Toulouse 3 – CNRS, Toulouse, France
- Pascal Barone: Brain & Cognition Research Center (CerCo), University of Toulouse 3 – CNRS, Toulouse, France

40
Nuyen B, Kandathil C, McDonald D, Thomas J, Most SP. The impact of living with transfeminine vocal gender dysphoria: Health utility outcomes assessment. Int J Transgend Health 2021; 24:99-107. [PMID: 36713148] [PMCID: PMC9879186] [DOI: 10.1080/26895269.2021.1919277]
Abstract
Background: The voice carries a tremendous number of gender cues. Transfeminine individuals report debilitating quality-of-life deficits as a result of their vocal gender dysphoria. Aims: We aimed to quantify the impact of this dysphoria, and of associated treatments, in quality-adjusted life years (QALYs) through validated health utility measures. Methods: Peri-operative phonometric audio recordings of a consented transfeminine patient volunteer with a history of vocal gender dysphoria aided in the description of two transfeminine health states, pre- and post-vocal-feminization gender dysphoria; monocular and binocular blindness served as control health states. General population adults rated these four health states via visual analogue scale (VAS), standard gamble (SG), and time tradeoff (TTO). Results: Survey respondents totaled 206, with a mean age of 35.8 years. On VAS measures, respondents on average perceived a year of life with transfeminine vocal gender dysphoria as approximately three-quarters of a life-year of perfect health. On SG analysis, respondents on average would have risked a 15%-20% chance of death, and on TTO measures would have sacrificed 10 years of their remaining life, to cure the condition. QALY scores for the post-gender-affirming-treatment state (+0.09 VAS, p < 0.01) were significantly higher than for the pretreatment state. These QALY scores did not differ by respondents' political affiliation or gender identity. Conclusions: To our knowledge, this study is the first to quantify how the general population perceives the health burden of vocal gender dysphoria experienced by transfeminine patients. Feminization treatments, including voice therapy with feminization laryngoplasty, appear to significantly increase health utility scores.
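As a rough illustration of the three health utility measures named in the abstract, the standard textbook formulas can be written out as follows. The numeric inputs echo the abstract's reported values, except the 45 remaining life years, which is an assumed figure for the example; function names are illustrative, not from the paper.

```python
# Standard health-utility measures on a 0 (death) to 1 (perfect health) scale.

def vas_utility(rating, scale_max=100):
    """Visual analogue scale: rating placed between death (0) and perfect health."""
    return rating / scale_max

def sg_utility(max_death_risk_accepted):
    """Standard gamble: utility = 1 minus the largest probability of death
    the respondent would risk for a cure."""
    return 1.0 - max_death_risk_accepted

def tto_utility(years_traded, remaining_life_years):
    """Time trade-off: utility = time kept in full health / time available."""
    return (remaining_life_years - years_traded) / remaining_life_years

# Abstract's reported values: VAS around three-quarters of perfect health,
# SG respondents risked 15-20% death (midpoint 17.5%), TTO respondents
# gave up 10 of an assumed 45 remaining years.
print(vas_utility(75))                  # prints: 0.75
print(round(sg_utility(0.175), 3))      # prints: 0.825
print(round(tto_utility(10, 45), 3))    # prints: 0.778
```

Under these inputs the three measures give broadly consistent utilities in the 0.75-0.83 range, matching the abstract's "approximately three-quarters of a life-year" characterization.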
Affiliation(s)
- Brian Nuyen: Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, California, USA
- Cherian Kandathil: Division of Facial Plastic and Reconstructive Surgery, Stanford University School of Medicine, Stanford, California, USA
- Daniella McDonald: Medical Scientist Training Program, University of California, San Diego School of Medicine, La Jolla, California, USA
- James Thomas: Clinic for Voice Disorders, Portland, Oregon, USA
- Sam P. Most: Division of Facial Plastic and Reconstructive Surgery, Stanford University School of Medicine, Stanford, California, USA

41
Holmes E, Johnsrude IS. Speech-evoked brain activity is more robust to competing speech when it is spoken by someone familiar. Neuroimage 2021; 237:118107. [PMID: 33933598] [DOI: 10.1016/j.neuroimage.2021.118107]
Abstract
When speech is masked by competing sound, people are better at understanding what is said if the talker is familiar compared to unfamiliar. The benefit is robust, but how does processing of familiar voices facilitate intelligibility? We combined high-resolution fMRI with representational similarity analysis to quantify the difference in distributed activity between clear and masked speech. We demonstrate that brain representations of spoken sentences are less affected by a competing sentence when they are spoken by a friend or partner than by someone unfamiliar-effectively, showing a cortical signal-to-noise ratio (SNR) enhancement for familiar voices. This effect correlated with the familiar-voice intelligibility benefit. We functionally parcellated auditory cortex, and found that the most prominent familiar-voice advantage was manifest along the posterior superior and middle temporal gyri. Overall, our results demonstrate that experience-driven improvements in intelligibility are associated with enhanced multivariate pattern activity in posterior temporal cortex.
Affiliation(s)
- Emma Holmes
- The Brain and Mind Institute, University of Western Ontario, London, Ontario, N6A 3K7, Canada.
- Ingrid S Johnsrude
- The Brain and Mind Institute, University of Western Ontario, London, Ontario, N6A 3K7, Canada; School of Communication Sciences and Disorders, University of Western Ontario, London, Ontario, N6G 1H1, Canada

42
Jenkins RE, Tsermentseli S, Monks CP, Robertson DJ, Stevenage SV, Symons AE, Davis JP. Are super-face-recognisers also super-voice-recognisers? Evidence from cross-modal identification tasks. Appl Cogn Psychol 2021. [DOI: 10.1002/acp.3813] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Ryan E. Jenkins
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK
- Stella Tsermentseli
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK
- Claire P. Monks
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK
- David J. Robertson
- School of Psychological Sciences and Health, University of Strathclyde, Glasgow, UK
- Ashley E. Symons
- Department of Psychology, University of Southampton, Southampton, UK
- Josh P. Davis
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK

43
Lã FMB, Polo N, Granqvist S, Cova T, Pais AC. Female Voice-Related Sexual Attractiveness to Males: Does it Vary With Different Degrees of Conception Likelihood? J Voice 2021; 37:467.e19-467.e31. [PMID: 33678535 DOI: 10.1016/j.jvoice.2021.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2020] [Revised: 02/05/2021] [Accepted: 02/05/2021] [Indexed: 11/28/2022]
Abstract
Previous investigations have found that female voice-related attractiveness to males increases when both conception likelihood (CL) and voice fundamental frequency (fo) are elevated. To test this hypothesis, we conducted a perceptual experiment where 78 heterosexual males rated sexual attractiveness of 9 female voice samples, recorded at menstrual, follicular and luteal phases of the menstrual cycle under two double-blinded randomly allocated conditions: a natural menstrual cycle (placebo condition) and when using an oral contraceptive pill (OCP condition). The voice samples yielded a total of 54 stimuli that were visually sorted and rated using Visor software. Concentrations of estrogens, progesterone and testosterone were analyzed, and measurements of speaking fundamental frequency (sfo) and its standard deviation (sfoSD), fo derivative (dfo) and fo slope were made. A multilevel ordinal logistic regression model nested in listeners and in females, and adjusted by phase and condition, was carried out to assess the association between ratings and: (1) phases and conditions; (2) sex steroid hormonal concentrations; and (3) voice parameters. A high probability of obtaining high ratings of voice sexual attractiveness was found for: (1) menstrual phase of placebo use and follicular phase of OCP use; (2) for low estradiol to progesterone ratio and testosterone concentrations; and (3) for low dfo. The latter showed a moderate statistical association with ratings of high attractiveness, as compared with the small association found for the remaining variables. It seems that the voice is a weak cue for female CL. Female sexual attraction to males may be a consequence of what females do in order to regulate their extended sexuality across the menstrual cycle rather than of estrus cues, the use of paralinguistic speech patterns being an example.
Affiliation(s)
- Filipa M B Lã
- Faculty of Education, National University of Distance Learning, Madrid, Spain; Centre of Social Studies, University of Coimbra, Coimbra, Portugal.
- Nuria Polo
- Faculty of Philology, National University of Distance Learning, Madrid, Spain
- Svante Granqvist
- KTH Royal Institute of Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Department of Biomedical Engineering and Health Systems, Karolinska University Hospital, Huddinge, Stockholm, Sweden; Karolinska Institute, Department of Clinical Science, Intervention and Technology (CLINTEC), Division of Speech and Language Pathology, Huddinge, Stockholm, Sweden
- Tânia Cova
- Coimbra Chemistry Center, University of Coimbra, Coimbra, Portugal
- Alberto C Pais
- Coimbra Chemistry Center, University of Coimbra, Coimbra, Portugal

44
Acoustic salience in emotional voice perception and its relationship with hallucination proneness. Cogn Affect Behav Neurosci 2021; 21:412-425. [DOI: 10.3758/s13415-021-00864-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 12/23/2020] [Indexed: 01/01/2023]
45
Johnson J, McGettigan C, Lavan N. Comparing unfamiliar voice and face identity perception using identity sorting tasks. Q J Exp Psychol (Hove) 2020; 73:1537-1545. [PMID: 32530364 PMCID: PMC7534197 DOI: 10.1177/1747021820938659] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 02/11/2020] [Accepted: 03/03/2020] [Indexed: 11/16/2022]
Abstract
Identity sorting tasks, in which participants sort multiple naturally varying stimuli of usually two identities into perceived identities, have recently gained popularity in voice and face processing research. In both modalities, participants who are unfamiliar with the identities tend to perceive multiple stimuli of the same identity as different people and thus fail to "tell people together." These similarities across modalities suggest that modality-general mechanisms may underpin sorting behaviour. In this study, participants completed a voice sorting and a face sorting task. Taking an individual differences approach, we asked whether participants' performance on voice and face sorting of unfamiliar identities is correlated. Participants additionally completed a voice discrimination (Bangor Voice Matching Test) and a face discrimination task (Glasgow Face Matching Test). Using these tasks, we tested whether performance on sorting related to explicit identity discrimination. Performance on voice sorting and face sorting tasks was correlated, suggesting that common modality-general processes underpin these tasks. However, no significant correlations were found between sorting and discrimination performance, with the exception of significant relationships between performance on "same identity" trials and "telling people together" for voices and faces. Overall, the reported relationships were, however, relatively weak, suggesting the presence of additional modality-specific and task-specific processes.
Affiliation(s)
- Justine Johnson
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK

46
Russo AG, De Martino M, Mancuso A, Iaconetta G, Manara R, Elia A, Laudanna A, Di Salle F, Esposito F. Semantics-weighted lexical surprisal modeling of naturalistic functional MRI time-series during spoken narrative listening. Neuroimage 2020; 222:117281. [PMID: 32828929 DOI: 10.1016/j.neuroimage.2020.117281] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2020] [Revised: 06/22/2020] [Accepted: 08/13/2020] [Indexed: 11/16/2022] Open
Abstract
Probabilistic language models are increasingly used to provide neural representations of linguistic features under naturalistic settings. Word surprisal models can be applied to continuous fMRI recordings during task-free listening of narratives, to detect regions linked to language prediction and comprehension. Here, to this purpose, a novel semantics-weighted lexical surprisal is applied to naturalistic fMRI data. fMRI was performed at 3 Tesla in 31 subjects during task-free listening to a 12-minute audiobook played in both original and word-reversed (control) versions. Lexical-only and semantics-weighted lexical surprisal models were estimated for the original and control word series. The two series were alternatively chosen to build the predictor of interest in the first-level general linear model and were compared in the second-level (group) analysis. The addition of the surprisal predictor to the stimulus-related predictors significantly improved the fitting of the neural signal. On average, the semantics-weighted model yielded lower surprisal values and, in some areas, better fitting of the fMRI data compared to the lexical-only model. The two models produced both overlapping and distinct activations: while lexical-only surprisal activated secondary auditory areas in the superior temporal gyri and the cerebellum, semantics-weighted surprisal additionally activated the left inferior frontal gyrus. These results confirm the usefulness of surprisal models in the naturalistic fMRI analysis of linguistic processes and suggest that the use of semantic information may increase the sensitivity of a probabilistic language model in higher-order language-related areas, with possible implications for future naturalistic fMRI studies of language under normal and (clinically or pharmacologically) modified conditions.
Affiliation(s)
- Andrea G Russo
- Department of Political and Communication Sciences, University of Salerno, Fisciano, Salerno, Italy; Department of Medicine, Surgery and Dentistry, "Scuola Medica Salernitana", University of Salerno, Baronissi, Salerno, Italy.
- Maria De Martino
- Department of Political and Communication Sciences, University of Salerno, Fisciano, Salerno, Italy
- Azzurra Mancuso
- Department of Political and Communication Sciences, University of Salerno, Fisciano, Salerno, Italy
- Giorgio Iaconetta
- Department of Medicine, Surgery and Dentistry, "Scuola Medica Salernitana", University of Salerno, Baronissi, Salerno, Italy; Department of Diagnostic Imaging, University Hospital "San Giovanni di Dio e Ruggi D'Aragona", Salerno, Italy
- Renzo Manara
- Department of Medicine, Surgery and Dentistry, "Scuola Medica Salernitana", University of Salerno, Baronissi, Salerno, Italy; Department of Diagnostic Imaging, University Hospital "San Giovanni di Dio e Ruggi D'Aragona", Salerno, Italy
- Annibale Elia
- Department of Political and Communication Sciences, University of Salerno, Fisciano, Salerno, Italy
- Alessandro Laudanna
- Department of Political and Communication Sciences, University of Salerno, Fisciano, Salerno, Italy
- Francesco Di Salle
- Department of Medicine, Surgery and Dentistry, "Scuola Medica Salernitana", University of Salerno, Baronissi, Salerno, Italy; Department of Diagnostic Imaging, University Hospital "San Giovanni di Dio e Ruggi D'Aragona", Salerno, Italy
- Fabrizio Esposito
- Department of Medicine, Surgery and Dentistry, "Scuola Medica Salernitana", University of Salerno, Baronissi, Salerno, Italy; Department of Diagnostic Imaging, University Hospital "San Giovanni di Dio e Ruggi D'Aragona", Salerno, Italy

47
Abrams DA, Kochalka J, Bhide S, Ryali S, Menon V. Intrinsic functional architecture of the human speech processing network. Cortex 2020; 129:41-56. [PMID: 32428761 DOI: 10.1016/j.cortex.2020.03.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2019] [Revised: 02/12/2020] [Accepted: 03/26/2020] [Indexed: 11/20/2022]
Abstract
Speech engages distributed temporo-fronto-parietal brain regions; however, a comprehensive understanding of its intrinsic functional network architecture is lacking. Here we investigate the human speech processing network using the largest sample to date, high temporal resolution resting-state fMRI data, network stability analysis, and theoretically informed models. Network consensus analysis revealed three stable functional modules encompassing: (1) superior temporal plane (STP) and Area Spt, (2) superior temporal sulcus (STS) + ventral frontoparietal cortex, and (3) dorsal frontoparietal cortex. The STS + ventral frontoparietal cortex module showed the highest participation coefficient and a hub-like organization linking STP with frontoparietal cortical nodes. Node-wise analysis revealed key connectivity features underlying this modular architecture, including a leftward asymmetric connectivity profile and differential connectivity of STS and STP with frontoparietal cortex. Our findings, replicated across cohorts, reveal a tripartite functional network architecture supporting speech processing and provide a novel template for future studies.
Affiliation(s)
- Daniel A Abrams
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA.
- John Kochalka
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Sayuli Bhide
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Srikanth Ryali
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Vinod Menon
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA; Program in Neuroscience, Stanford University School of Medicine, Stanford, CA, USA; Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA

48
Geangu E, Vuong QC. Look up to the body: An eye-tracking investigation of 7-months-old infants' visual exploration of emotional body expressions. Infant Behav Dev 2020; 60:101473. [PMID: 32739668 DOI: 10.1016/j.infbeh.2020.101473] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2019] [Revised: 07/22/2020] [Accepted: 07/22/2020] [Indexed: 02/02/2023]
Abstract
The human body is an important source of information to infer a person's emotional state. Research with adult observers indicates that the posture of the torso, arms and hands provides important perceptual cues for recognising anger, fear and happy expressions. Much less is known about whether infants process body regions differently for different body expressions. To address this issue, we used eye tracking to investigate whether infants' visual exploration patterns differed when viewing body expressions. Forty-eight 7-month-old infants were randomly presented with static images of adult female bodies expressing anger, fear and happiness, as well as an emotionally neutral posture. Facial cues to emotional state were removed by masking the faces. We measured the proportion of looking time, proportion and number of fixations, and duration of fixations on the head, upper body and lower body regions for the different expressions. We showed that infants explored the upper body more than the lower body. Importantly, infants at this age fixated differently on different body regions depending on the expression of the body posture. In particular, infants spent a larger proportion of their looking times and had longer fixation durations on the upper body for fear relative to the other expressions. These results extend and replicate the information about infant processing of emotional expressions displayed by human bodies, and they support the hypothesis that infants' visual exploration of human bodies is driven by the upper body.
49
Schirmer A. Review of Frühholz S, Belin P (eds), The Oxford Handbook of Voice Perception (2018). Perception 2020. [DOI: 10.1177/0301006620938229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
50
Xu M, Tachibana RO, Okanoya K, Hagiwara H, Hashimoto RI, Homae F. Unconscious and Distinctive Control of Vocal Pitch and Timbre During Altered Auditory Feedback. Front Psychol 2020; 11:1224. [PMID: 32581975 PMCID: PMC7294928 DOI: 10.3389/fpsyg.2020.01224] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2019] [Accepted: 05/11/2020] [Indexed: 01/01/2023] Open
Abstract
Vocal control plays a critical role in smooth social communication. Speakers constantly monitor auditory feedback (AF) and make adjustments when their voices deviate from their intentions. Previous studies have shown that when certain acoustic features of the AF are artificially altered, speakers compensate for this alteration in the opposite direction. However, little is known about how the vocal control system implements compensations for alterations of different acoustic features, and associates them with subjective consciousness. The present study investigated whether compensations for the fundamental frequency (F0), which corresponds to perceived pitch, and formants, which contribute to perceived timbre, can be performed unconsciously and independently. Forty native Japanese speakers received two types of altered AF during vowel production that involved shifts of either only the formant frequencies (formant modification; Fm) or both the pitch and formant frequencies (pitch + formant modification; PFm). For each type, three levels of shift (slight, medium, and severe) in both directions (increase or decrease) were used. After the experiment, participants were tested for whether they had perceived a change in the F0 and/or formants. The results showed that (i) only formants were compensated for in the Fm condition, while both the F0 and formants were compensated for in the PFm condition; (ii) the F0 compensation exhibited greater precision than the formant compensation in PFm; and (iii) compensation occurred even when participants misperceived or could not explicitly perceive the alteration in AF. These findings indicate that non-experts can compensate for both formant and F0 modifications in the AF during vocal production, even when the modifications are not explicitly or correctly perceived, which provides further evidence for a dissociation between conscious perception and action in vocal control. We propose that such unconscious control of voice production may enhance rapid adaptation to changing speech environments and facilitate mutual communication.
Affiliation(s)
- Mingdi Xu
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan
- Ryosuke O Tachibana
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Kazuo Okanoya
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Hiroko Hagiwara
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan; Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan
- Ryu-Ichiro Hashimoto
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan; Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan
- Fumitaka Homae
- Department of Language Sciences, Graduate School of Humanities, Tokyo Metropolitan University, Tokyo, Japan; Research Center for Language, Brain and Genetics, Tokyo Metropolitan University, Tokyo, Japan