1
Alispahic S, Pellicano E, Cutler A, Antoniou M. Multiple talker processing in autistic adult listeners. Sci Rep 2024; 14:14698. [PMID: 38926416 PMCID: PMC11208580 DOI: 10.1038/s41598-024-62429-w]
Abstract
Accommodating talker variability is a complex, multi-layered cognitive process. It involves shifting attention to the vocal characteristics of the talker as well as to the linguistic content of their speech. Because voice and phonological processing are interdependent, multi-talker environments typically incur additional processing costs compared to single-talker environments. A failure or inability to efficiently distribute attention over multiple acoustic cues in the speech signal may have detrimental consequences for language learning. Yet no studies have examined the effects of multi-talker processing in populations with atypical perceptual, social, and language processing, such as autistic people. Employing a classic word-monitoring task, we investigated the effects of talker variability in Australian English-speaking autistic (n = 24) and non-autistic (n = 28) adults. Listeners responded to target words (e.g., apple, duck, corn) in randomised sequences of words. Half of the sequences were spoken by a single talker and the other half by multiple talkers. Results revealed that autistic participants' sensitivity scores for accurately spotted target words did not differ from those of non-autistic participants, regardless of whether the words were spoken by a single talker or by multiple talkers. As expected, the non-autistic group showed the well-established processing cost associated with talker variability (i.e., slower response times). Remarkably, autistic listeners' response times did not differ across the single- and multi-talker conditions, indicating that they incurred no perceptual processing cost when accommodating talker variability. The present findings have implications for theories of autistic perception and of speech and language processing.
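Sensitivity in word-monitoring tasks of this kind is conventionally quantified with signal detection theory. A minimal sketch follows, assuming the standard d′ formula; the log-linear correction and the example counts are illustrative assumptions, not the authors' analysis pipeline.

```python
# Minimal sketch: sensitivity (d') from hit and false-alarm counts, assuming
# the standard signal-detection formula d' = z(H) - z(FA). The correction and
# the example counts are illustrative, not the authors' pipeline.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction avoids infinite z-scores when a rate is 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for one listener in the single-talker condition.
print(d_prime(hits=46, misses=2, false_alarms=3, correct_rejections=141))
```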
Affiliation(s)
- Samra Alispahic
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
- Elizabeth Pellicano
- Department of Educational Studies, Macquarie University, Sydney, Australia
- Department of Clinical, Educational and Health Psychology, University College London, London, UK
- Anne Cutler
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- ARC Centre of Excellence for the Dynamics of Language, Clayton, Australia
- Mark Antoniou
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
2
Ziereis A, Schacht A. Validation of scrambling methods for vocal affect bursts. Behav Res Methods 2024; 56:3089-3101. [PMID: 37673809 PMCID: PMC11133081 DOI: 10.3758/s13428-023-02222-1]
Abstract
Studies on perception and cognition require sound methods that allow us to disentangle the basic sensory processing of physical stimulus properties from the cognitive processing of stimulus meaning. Similar to the scrambling of images, the scrambling of auditory signals aims to create stimulus instances that are unrecognizable but have comparable low-level features. In the present study, we generated scrambled versions of short vocalizations taken from the Montreal Affective Voices database (Belin et al., Behav Res Methods, 40(2):531-539, 2008) by applying four different scrambling methods (frequency-, phase-, and two time-scrambling transformations). The original stimuli and their scrambled versions were judged by 60 participants for the apparent presence of a human voice, its gender, and the valence of the expression, or, if no human voice was detected, for the valence of the subjective response to the stimulus. Human-likeness ratings were reduced for all scrambled versions relative to the original stimuli, albeit to a lesser extent for phase-scrambled versions of neutral bursts. For phase-scrambled neutral bursts, valence ratings were equivalent to those of the original neutral bursts. All other scrambled versions were rated as slightly unpleasant, indicating that they should be used with caution due to their potential aversiveness.
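Of the transformations compared, phase scrambling is the easiest to reproduce generically. A minimal sketch follows, assuming the standard FFT-based method (preserve the magnitude spectrum, randomise the phases); the paper's own implementations may differ in detail, and the white-noise "burst" is a stand-in input.

```python
# Minimal sketch of a generic FFT-based phase scramble: keep the magnitude
# spectrum (a low-level feature), randomise the phases. Illustrative only;
# the paper's exact transformations are not reproduced here.
import numpy as np

def phase_scramble(signal, rng):
    spectrum = np.fft.rfft(signal)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    scrambled = np.abs(spectrum) * np.exp(1j * phases)
    scrambled[0] = np.abs(spectrum[0])   # DC component must remain real
    return np.fft.irfft(scrambled, n=len(signal))

rng = np.random.default_rng(0)
burst = rng.standard_normal(16000)       # stand-in for a 1 s burst at 16 kHz
out = phase_scramble(burst, rng)
# Magnitude spectra match (up to the real-valued DC/Nyquist bins), while the
# temporal structure of the original is destroyed.
assert np.allclose(np.abs(np.fft.rfft(burst))[1:-1],
                   np.abs(np.fft.rfft(out))[1:-1])
```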
Affiliation(s)
- Annika Ziereis
- Department for Cognition, Emotion and Behavior, Affective Neuroscience and Psychophysiology Laboratory, Institute of Psychology, University of Göttingen, Göttingen, Germany
- Anne Schacht
- Department for Cognition, Emotion and Behavior, Affective Neuroscience and Psychophysiology Laboratory, Institute of Psychology, University of Göttingen, Göttingen, Germany
3
Human voices escape the auditory attentional blink: Evidence from detections and pupil responses. Brain Cogn 2023; 165:105928. [PMID: 36459865 DOI: 10.1016/j.bandc.2022.105928]
Abstract
Attentional selection of the second of two targets embedded in a rapid stream of stimuli tends to be briefly impaired when the two targets are presented in close temporal proximity, an effect known as the attentional blink (AB). Two target sounds (T1 and T2) were embedded in a rapid serial auditory presentation of environmental sounds at a short (Lag 3) or long (Lag 9) lag. Participants first identified T1 (bell or sine tone) and then reported whether T2 was present or absent. Individual stimuli had durations of either 30 or 90 ms and were presented in streams of 20 sounds. T2 varied in category: human voice, cello, or dog sound. Previous research has introduced pupillometry as a useful marker of the intensity of cognitive processing and attentional allocation in the visual AB paradigm. Results suggest that the interplay of stimulus factors is critical for target detection accuracy and support the hypothesis that the human voice is the least likely to show an auditory AB (in the 90 ms condition). For the other stimuli, accuracy for T2 was significantly worse at Lag 3 than at Lag 9 in the 90 ms condition, indicating the presence of an auditory AB. When an AB occurred (at Lag 3), we observed smaller pupil dilations, time-locked to the onset of T2, than at Lag 9, reflecting reduced attentional processing when 'blinking' during target detection. Taken together, these findings support the conclusion that human voices escape the AB and that the pupillary changes are consistent with the so-called T2 attentional deficit. In addition, we found some indication that salient stimuli such as human voices may require a less intense allocation of attention, or noradrenergic potentiation, than other auditory stimuli.
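Pupil dilation time-locked to T2 onset, as reported here, is typically obtained by baseline-correcting a short epoch of the pupil trace. A minimal sketch under assumed parameters (60 Hz sampling, 200 ms baseline, 2 s window, synthetic trace; none of these are taken from the paper):

```python
# Minimal sketch: baseline-corrected pupil dilation time-locked to T2 onset.
# Sampling rate, windows, and the synthetic trace are illustrative assumptions.
import numpy as np

def t2_locked_dilation(pupil, t2_onset, fs=60, baseline_s=0.2, window_s=2.0):
    baseline = pupil[t2_onset - int(baseline_s * fs):t2_onset].mean()
    epoch = pupil[t2_onset:t2_onset + int(window_s * fs)]
    return epoch - baseline  # dilation relative to the pre-T2 baseline

rng = np.random.default_rng(1)
trace = 3.0 + 0.01 * np.cumsum(rng.standard_normal(600))  # fake pupil trace (mm)
dilation = t2_locked_dilation(trace, t2_onset=300)
print(dilation.mean())  # averaged over trials, compare Lag 3 vs Lag 9
```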
4
Chaminade T, Spatola N. Perceived facial happiness during conversation correlates with insular and hypothalamus activity for humans, not robots. Front Psychol 2022; 13:871676. [PMID: 36262453 PMCID: PMC9575595 DOI: 10.3389/fpsyg.2022.871676]
Abstract
Emotional contagion, in particular of happiness, is essential to creating social bonds. The somatic marker hypothesis posits that embodied physiological changes associated with emotions, relayed to the brain by the autonomic nervous system, influence behavior. Perceiving others' positive emotions should thus be associated with activity in brain regions relaying information from and to the autonomic nervous system. Here, we address this question using a unique corpus of brain activity recorded during unconstrained conversations between participants and a human or a humanoid robot. fMRI recordings are used to test whether activity in key brain regions of the autonomic system, the amygdala, hypothalamus, and insula, is differentially affected by the level of happiness expressed by the human and robot agents. Results indicate that for the hypothalamus and the insula, in particular its anterior agranular region, which is strongly involved in processing social emotions, activity in the right hemisphere increases with the level of happiness expressed by the human but not by the robot. Perceiving positive emotions in social interactions therefore induces the local brain responses predicted by the contagion of somatic markers of emotions only when the interacting agent is a fellow human.
Affiliation(s)
- Thierry Chaminade
- Institut de Neurosciences de la Timone, UMR 7289, Aix-Marseille Université-CNRS, Marseille, France
- Correspondence: Thierry Chaminade
- Nicolas Spatola
- Center for Human Technologies, Istituto Italiano di Tecnologia, Genoa, Italy
5
Socially meaningful visual context either enhances or inhibits vocalisation processing in the macaque brain. Nat Commun 2022; 13:4886. [PMID: 35985995 PMCID: PMC9391382 DOI: 10.1038/s41467-022-32512-9]
Abstract
Social interactions rely on the interpretation of semantic and emotional information, often from multiple sensory modalities. Nonhuman primates send and receive auditory and visual communicative signals. However, the neural mechanisms underlying the association of visual and auditory information based on their common social meaning are unknown. Using heart rate estimates and functional neuroimaging, we show that in the lateral and superior temporal sulcus of the macaque monkey, neural responses are enhanced when species-specific vocalisations are paired with a matching visual context, or when vocalisations follow visual information in time, but inhibited when vocalisations are incongruent with the visual context. For example, responses to affiliative vocalisations are enhanced when paired with affiliative contexts but inhibited when paired with aggressive or escape contexts. Overall, we propose that the identified neural network represents social meaning irrespective of sensory modality.
6
Abbatecola C, Gerardin P, Beneyton K, Kennedy H, Knoblauch K. The Role of Unimodal Feedback Pathways in Gender Perception During Activation of Voice and Face Areas. Front Syst Neurosci 2021; 15:669256. [PMID: 34122023 PMCID: PMC8194406 DOI: 10.3389/fnsys.2021.669256]
Abstract
Cross-modal effects provide a model framework for investigating hierarchical inter-areal processing, particularly under conditions where unimodal cortical areas receive contextual feedback from other modalities. Here, using complementary behavioral and brain-imaging techniques, we investigated the functional networks participating in face and voice processing during gender perception, a high-level feature of voice and face perception. Within the framework of a signal detection decision model, maximum likelihood conjoint measurement (MLCM) was used to estimate the contributions of the face and voice to gender comparisons between pairs of audio-visual stimuli in which the face and voice were independently modulated. Top-down contributions were varied by instructing participants to make judgments based on the gender of the face, the voice, or both modalities (N = 12 for each task). Estimated face and voice contributions to the judgments of the stimulus pairs were not independent; both contributed to all tasks, but their respective weights varied over a 40-fold range due to top-down influences. The models that best described the modal contributions required the inclusion of two different top-down interactions: (i) an interaction that depended on gender congruence across modalities (i.e., the difference between face and voice modalities for each stimulus); and (ii) an interaction that depended on the gender magnitude within modalities. The significance of these interactions was task dependent: the gender congruence interaction was significant for the face and voice tasks, while the gender magnitude interaction was significant for the face and stimulus tasks. Subsequently, we used the same stimuli and related tasks in a functional magnetic resonance imaging (fMRI) paradigm (N = 12) to explore the neural correlates of these perceptual processes, analyzed with Dynamic Causal Modeling (DCM) and Bayesian Model Selection. Results revealed changes in effective connectivity between the unimodal Fusiform Face Area (FFA) and Temporal Voice Area (TVA) that paralleled the face and voice behavioral interactions observed in the psychophysical data. These findings highlight the role of multiple parallel unimodal feedback pathways in perception.
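The MLCM approach rests on an additive signal-detection decision model for paired comparisons. A minimal simulation sketch follows; the scale values and noise level are hypothetical assumptions, and the authors estimated such quantities by maximum likelihood (cf. the MLCM package in R).

```python
# Minimal sketch of the additive decision model underlying MLCM: each paired
# comparison is decided by the noisy difference of summed face + voice scale
# values. Scale values and noise level are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(2)
psi_face = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # hypothetical face-gender scale
psi_voice = np.array([0.0, 0.2, 0.4, 0.6, 0.8])   # hypothetical voice-gender scale

def judge(pair_a, pair_b, sigma=1.0):
    """True if stimulus B, a (face index, voice index) pair, is judged 'more masculine'."""
    delta = (psi_face[pair_b[0]] + psi_voice[pair_b[1]]
             - psi_face[pair_a[0]] - psi_voice[pair_a[1]])
    return delta + rng.normal(0.0, sigma) > 0.0

# A congruent low-gender stimulus vs. a strong face paired with a weak voice.
responses = [judge((0, 0), (4, 1)) for _ in range(1000)]
print(np.mean(responses))   # choice proportion that the ML fit would model
```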
Affiliation(s)
- Clement Abbatecola
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Peggy Gerardin
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- Kim Beneyton
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- Henry Kennedy
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Chinese Academy of Sciences Key Laboratory of Primate Neurobiology, Shanghai, China
- Kenneth Knoblauch
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM, Stem Cell and Brain Research Institute U1208, Bron, France
- National Centre for Optics, Vision and Eye Care, Faculty of Health and Social Sciences, University of South-Eastern Norway, Kongsberg, Norway
7
Explaining face-voice matching decisions: The contribution of mouth movements, stimulus effects and response biases. Atten Percept Psychophys 2021; 83:2205-2216. [PMID: 33797024 PMCID: PMC8213568 DOI: 10.3758/s13414-021-02290-5]
Abstract
Previous studies have shown that face-voice matching accuracy is more consistently above chance for dynamic (i.e. speaking) faces than for static faces. This suggests that dynamic information can play an important role in informing matching decisions. We initially asked whether this advantage for dynamic stimuli is due to shared information across modalities that is encoded in articulatory mouth movements. Participants completed a sequential face-voice matching task with (1) static images of faces, (2) dynamic videos of faces, (3) dynamic videos where only the mouth was visible, and (4) dynamic videos where the mouth was occluded, in a well-controlled stimulus set. Surprisingly, after accounting for random variation in the data due to design choices, accuracy for all four conditions was at chance. Crucially, however, exploratory analyses revealed that participants were not responding randomly, with different patterns of response biases being apparent for different conditions. Our findings suggest that face-voice identity matching may not be possible with above-chance accuracy but that analyses of response biases can shed light upon how people attempt face-voice matching. We discuss these findings with reference to the differential functional roles for faces and voices recently proposed for multimodal person perception.
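The distinction drawn here between chance-level accuracy and systematic response biases maps onto standard signal-detection measures. A minimal sketch, assuming the usual d′ and criterion formulas and purely illustrative counts:

```python
# Minimal sketch: separating matching accuracy (d') from response bias
# (criterion c) in a same/different task, using standard signal-detection
# formulas. The counts are illustrative, not the study's data.
from scipy.stats import norm

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    h = (hits + 0.5) / (hits + misses + 1)             # log-linear correction
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = norm.ppf(h) - norm.ppf(f)
    criterion = -0.5 * (norm.ppf(h) + norm.ppf(f))
    return d_prime, criterion

# Chance accuracy (d' ~ 0) can coexist with a strong 'same' bias (c < 0).
print(sdt_measures(hits=40, misses=10, false_alarms=40, correct_rejections=10))
```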
8
Young AW, Frühholz S, Schweinberger SR. Face and Voice Perception: Understanding Commonalities and Differences. Trends Cogn Sci 2020; 24:398-410. [DOI: 10.1016/j.tics.2020.02.001]
9
Cecchetto C, Fischmeister FPS, Gorkiewicz S, Schuehly W, Bagga D, Parma V, Schöpf V. Human body odor increases familiarity for faces during encoding-retrieval task. Hum Brain Mapp 2020; 41:1904-1919. [PMID: 31904899 PMCID: PMC7268037 DOI: 10.1002/hbm.24920]
Abstract
Odors can increase memory performance when presented as context during both encoding and retrieval phases. Since information from different sensory modalities is integrated into unified conceptual knowledge, we hypothesized that the social information from body odors and faces would be integrated during encoding, and that the integration of such social information would enhance retrieval more than encoding in the context of common odors. To examine this hypothesis, and to further explore the underlying neural correlates of this behavior, we conducted a functional magnetic resonance imaging study in which participants performed an encoding-retrieval memory task for faces during the presentation of a common odor, a body odor, or clean air. At the behavioral level, participants were less biased and faster in recognizing faces when the faces were presented together with the body odor rather than the common odor. At the neural level, the encoding of faces in the body odor condition, compared with the common odor and clean air conditions, showed greater activation in areas related to associative memory (dorsolateral prefrontal cortex) and to odor perception and multisensory integration (orbitofrontal cortex). These results suggest that face and body odor information were integrated and, as a result, participants were faster in recognizing previously presented material.
Affiliation(s)
- Cinzia Cecchetto
- Institute of Psychology, University of Graz, Graz, Austria
- BioTechMed, Graz, Austria
- Deepika Bagga
- Institute of Psychology, University of Graz, Graz, Austria
- BioTechMed, Graz, Austria
- Valentina Parma
- Department of Psychology, Temple University, Philadelphia, Pennsylvania
- Veronika Schöpf
- Institute of Psychology, University of Graz, Graz, Austria
- BioTechMed, Graz, Austria
- Computational Imaging Research Lab (CIR), Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna, Vienna, Austria