1. Becker J, Viertler M, Korn CW, Blank H. The pupil dilation response as an indicator of visual cue uncertainty and auditory outcome surprise. Eur J Neurosci 2024;59:2686-2701. PMID: 38469976. DOI: 10.1111/ejn.16306.
Abstract
In everyday perception, we combine incoming sensory information with prior expectations. Expectations can be induced by cues that indicate the probability of upcoming sensory events. The information provided by cues may differ and hence lead to different levels of uncertainty about which event will follow. In this experiment, we employed pupillometry to investigate whether the pupil dilation response to visual cues varies depending on the level of cue-associated uncertainty about a following auditory outcome. We also tested whether the pupil dilation response reflects the amount of surprise about the subsequently presented auditory stimulus. In each trial, participants were presented with a visual cue (a face image) that was followed by an auditory outcome (a spoken vowel). After the face cue, participants had to indicate by keypress which of three auditory vowels they expected to hear next. We manipulated the cue-associated uncertainty by varying the probabilistic cue-outcome contingencies: one face was most likely followed by one specific vowel (low cue uncertainty), another face was equally likely to be followed by either of two vowels (intermediate cue uncertainty), and the third face was followed by all three vowels (high cue uncertainty). Our results suggest that pupil dilation in response to task-relevant cues depends on the associated uncertainty, but only for large differences in cue-associated uncertainty. Additionally, in response to the auditory outcomes, pupil dilation scaled negatively with the cue-dependent probabilities, likely signalling the amount of surprise.
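The cue-associated uncertainty and outcome surprise described in this abstract map naturally onto Shannon entropy and surprisal, although the abstract does not state that the authors computed these exact quantities. The following Python sketch is therefore only an illustration, with invented cue-outcome probabilities for the three face cues.

```python
import numpy as np

# Hypothetical cue-outcome contingencies for the three face cues
# (illustrative values only; the exact probabilities are not given above).
cues = {
    "low uncertainty":          np.array([0.88, 0.06, 0.06]),
    "intermediate uncertainty": np.array([0.47, 0.47, 0.06]),
    "high uncertainty":         np.array([1 / 3, 1 / 3, 1 / 3]),
}

for name, p in cues.items():
    # Cue-associated uncertainty: Shannon entropy of the outcome distribution (bits).
    entropy = -np.sum(p * np.log2(p))
    # Surprise about each possible vowel: surprisal = -log2 p(outcome),
    # which scales negatively with the cue-dependent probability.
    surprisal = -np.log2(p)
    print(f"{name}: H = {entropy:.2f} bits, surprisal per vowel = {np.round(surprisal, 2)}")
```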
Affiliation(s)
- Janika Becker: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Marvin Viertler: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Christoph W Korn: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Section Social Neuroscience, Department of General Psychiatry, University of Heidelberg, Heidelberg, Germany
- Helen Blank: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
2. Wang K, Fang Y, Guo Q, Shen L, Chen Q. Superior Attentional Efficiency of Auditory Cue via the Ventral Auditory-thalamic Pathway. J Cogn Neurosci 2024;36:303-326. PMID: 38010315. DOI: 10.1162/jocn_a_02090.
Abstract
Auditory commands are often executed more efficiently than visual commands. However, empirical evidence on the underlying behavioral and neural mechanisms remains scarce. In two experiments, we manipulated the delivery modality of informative cues and the prediction violation effect and found consistently enhanced RT benefits for the matched auditory cues compared with the matched visual cues. At the neural level, when the bottom-up perceptual input matched the prior prediction induced by the auditory cue, the auditory-thalamic pathway was significantly activated. Moreover, the stronger the auditory-thalamic connectivity, the higher the behavioral benefits of the matched auditory cue. When the bottom-up input violated the prior prediction induced by the auditory cue, the ventral auditory pathway was specifically involved. Moreover, the stronger the ventral auditory-prefrontal connectivity, the larger the behavioral costs caused by the violation of the auditory cue. In addition, the dorsal frontoparietal network showed a supramodal function in reacting to the violation of informative cues irrespective of the delivery modality of the cue. Taken together, the results reveal novel behavioral and neural evidence that the superior efficiency of the auditory cue is twofold: The auditory-thalamic pathway is associated with improvements in task performance when the bottom-up input matches the auditory cue, whereas the ventral auditory-prefrontal pathway is involved when the auditory cue is violated.
Affiliation(s)
- Ke Wang: South China Normal University, Guangzhou, China
- Ying Fang: South China Normal University, Guangzhou, China
- Qiang Guo: Guangdong Sanjiu Brain Hospital, Guangzhou, China
- Lu Shen: South China Normal University, Guangzhou, China
- Qi Chen: South China Normal University, Guangzhou, China
3. Jeschke L, Mathias B, von Kriegstein K. Inhibitory TMS over Visual Area V5/MT Disrupts Visual Speech Recognition. J Neurosci 2023;43:7690-7699. PMID: 37848284. PMCID: PMC10634547. DOI: 10.1523/jneurosci.0975-23.2023.
Abstract
During face-to-face communication, the perception and recognition of facial movements can facilitate individuals' understanding of what is said. Facial movements are a form of complex biological motion. Separate neural pathways are thought to process (1) simple, nonbiological motion, with an obligatory waypoint in the motion-sensitive visual middle temporal area (V5/MT), and (2) complex biological motion. Here, we present findings that challenge this dichotomy. Neuronavigated offline transcranial magnetic stimulation (TMS) over V5/MT in 24 participants (17 females and 7 males) led to increased response times in the recognition of simple, nonbiological motion as well as in visual speech recognition, compared with TMS over the vertex, an active control region. TMS of area V5/MT also reduced the practice effects on response times that are typically observed in both visual speech and motion recognition tasks over time. Our findings provide the first indication that area V5/MT causally influences the recognition of visual speech. SIGNIFICANCE STATEMENT: In everyday face-to-face communication, speech comprehension is often facilitated by viewing a speaker's facial movements. Several brain areas contribute to the recognition of visual speech. One area of interest is the motion-sensitive visual middle temporal area (V5/MT), which has been associated with the perception of simple, nonbiological motion, such as moving dots, as well as more complex, biological motion such as visual speech. Here, we demonstrate using noninvasive brain stimulation that area V5/MT is causally relevant for recognizing visual speech. This finding provides new insights into the neural mechanisms that support the perception of human communication signals, which will help guide future research in typically developed individuals and populations with communication difficulties.
Affiliation(s)
- Lisa Jeschke: Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, 01069 Dresden, Germany
- Brian Mathias: School of Psychology, University of Aberdeen, Aberdeen AB24 3FX, United Kingdom
- Katharina von Kriegstein: Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, 01069 Dresden, Germany
4. Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023;13:e2869. PMID: 36579557. PMCID: PMC9927859. DOI: 10.1002/brb3.2869.
Abstract
INTRODUCTION: Few of us are skilled lipreaders, while most struggle with the task. The neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses. RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading did, probably paralleling the limited understanding obtained via lipreading. Importantly, a higher lipreading test score and subjective estimate of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results may suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
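A minimal sketch of the voxel-wise inter-subject correlation (ISC) analysis described in this abstract, in the common leave-one-out form (each participant's time course correlated with the average of the remaining participants). The data shapes and the leave-one-out variant are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def isc_leave_one_out(bold):
    """bold: array of shape (n_subjects, n_timepoints, n_voxels) holding BOLD time courses."""
    n_subj, _, n_vox = bold.shape
    isc = np.zeros((n_subj, n_vox))
    for s in range(n_subj):
        others = np.delete(bold, s, axis=0).mean(axis=0)   # average of the remaining subjects
        for v in range(n_vox):
            isc[s, v] = np.corrcoef(bold[s, :, v], others[:, v])[0, 1]
    return isc.mean(axis=0)                                # group-level ISC per voxel

# Toy example: 10 subjects, 240 time points, 500 voxels of simulated data.
rng = np.random.default_rng(0)
shared = rng.standard_normal((240, 500))                   # stimulus-driven signal
bold = 0.5 * shared + rng.standard_normal((10, 240, 500))  # shared signal plus subject noise
print(isc_leave_one_out(bold).mean())
```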
Affiliation(s)
- Satu Saalasti: Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland; Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho: Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski: Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany; Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany; Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams: Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean: Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen: Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto Studios - MAGICS, Aalto University, Espoo, Finland
5. Blank H, Alink A, Büchel C. Multivariate functional neuroimaging analyses reveal that strength-dependent face expectations are represented in higher-level face-identity areas. Commun Biol 2023;6:135. PMID: 36725984. PMCID: PMC9892564. DOI: 10.1038/s42003-023-04508-8.
Abstract
Perception is an active inference in which prior expectations are combined with sensory input. It is still unclear how the strength of prior expectations is represented in the human brain. The strength, or precision, of a prior could be represented together with its content, potentially in higher-level sensory areas. We used multivariate analyses of functional magnetic resonance imaging data to test whether expectation strength is represented together with the expected face in high-level face-sensitive regions. Participants were trained to associate images of scenes with subsequently presented images of different faces. Each scene predicted three faces, each with either low, intermediate, or high probability. We found that anticipation enhances the similarity of response patterns in the face-sensitive anterior temporal lobe to response patterns specifically associated with the image of the expected face. In contrast, during face presentation, activity increased for unexpected faces in a typical prediction error network, containing areas such as the caudate and the insula. Our findings show that strength-dependent face expectations are represented in higher-level face-identity areas, supporting hierarchical theories of predictive processing according to which higher-level sensory regions represent weighted priors.
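One way to read the multivariate analysis described here is as a pattern-similarity test: anticipation-period response patterns are correlated with face-specific template patterns, and the similarity should scale with the expected face's probability. The sketch below simulates that logic with random data; the function names and the linear mixing of templates are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 200

# Face-specific template patterns estimated from face-presentation trials (simulated here).
face_templates = {f: rng.standard_normal(n_voxels) for f in ("A", "B", "C")}

def expectation_similarity(anticipation_pattern, expected_face):
    """Correlation between an anticipation-period pattern and the expected face's template."""
    return np.corrcoef(anticipation_pattern, face_templates[expected_face])[0, 1]

# Simulate anticipation patterns whose overlap with the expected template
# grows with the cue-given probability of that face (0.25, 0.50, 0.75).
for prob in (0.25, 0.50, 0.75):
    pattern = prob * face_templates["A"] + (1 - prob) * rng.standard_normal(n_voxels)
    print(f"p(face A) = {prob:.2f}: similarity = {expectation_similarity(pattern, 'A'):.2f}")
```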
Affiliation(s)
- Helen Blank: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Arjen Alink: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Christian Büchel: Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
6. Csonka M, Mardmomen N, Webster PJ, Brefczynski-Lewis JA, Frum C, Lewis JW. Meta-Analyses Support a Taxonomic Model for Representations of Different Categories of Audio-Visual Interaction Events in the Human Brain. Cereb Cortex Commun 2021;2:tgab002. PMID: 33718874. PMCID: PMC7941256. DOI: 10.1093/texcom/tgab002.
Abstract
Our ability to perceive meaningful action events involving objects, people, and other animate agents is characterized in part by an interplay of visual and auditory sensory processing and their cross-modal interactions. However, this multisensory ability can be altered or dysfunctional in some hearing and sighted individuals, and in some clinical populations. The present meta-analysis sought to test current hypotheses regarding neurobiological architectures that may mediate audio-visual multisensory processing. Reported coordinates from 82 neuroimaging studies (137 experiments) that revealed some form of audio-visual interaction in discrete brain regions were compiled, converted to a common coordinate space, and then organized along specific categorical dimensions to generate activation likelihood estimate (ALE) brain maps and various contrasts of those derived maps. The results revealed brain regions (cortical "hubs") preferentially involved in multisensory processing along different stimulus category dimensions, including 1) living versus nonliving audio-visual events, 2) audio-visual events involving vocalizations versus actions by living sources, 3) emotionally valent events, and 4) dynamic-visual versus static-visual audio-visual stimuli. These meta-analysis results are discussed in the context of neurocomputational theories of semantic knowledge representations and perception, and the brain volumes of interest are available for download to facilitate data interpretation for future neuroimaging studies.
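The activation likelihood estimation (ALE) step summarized above can be sketched as modelling each reported focus as a 3D Gaussian, combining foci into a per-experiment modelled activation map, and taking the union across experiments. The kernel width, the toy grid, and the omission of permutation-based thresholding are assumptions made for brevity.

```python
import numpy as np

def modeled_activation_map(foci_mm, grid, fwhm_mm=10.0):
    """Per-experiment map: probability at each voxel that at least one focus lies there,
    with each focus modeled as an isotropic 3D Gaussian."""
    sigma = fwhm_mm / 2.3548
    ma = np.zeros(grid.shape[0])
    for focus in foci_mm:
        d2 = np.sum((grid - focus) ** 2, axis=1)
        g = np.exp(-d2 / (2 * sigma ** 2))
        g /= g.sum()                      # normalise to a probability map
        ma = 1 - (1 - ma) * (1 - g)       # union across foci within the experiment
    return ma

# Toy 10x10x10 grid with 2 mm spacing and two "experiments" with reported foci.
coords = np.stack(np.meshgrid(*[np.arange(0, 20, 2)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
experiments = [np.array([[10, 10, 10]]), np.array([[8, 10, 12], [12, 8, 10]])]

ma_maps = [modeled_activation_map(f, coords) for f in experiments]
ale = 1 - np.prod([1 - ma for ma in ma_maps], axis=0)   # ALE: union across experiments
print("peak ALE value:", ale.max())
```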
Affiliation(s)
- Matt Csonka: Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Nadia Mardmomen: Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Paula J Webster: Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Julie A Brefczynski-Lewis: Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Chris Frum: Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- James W Lewis: Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
7. Processing communicative facial and vocal cues in the superior temporal sulcus. Neuroimage 2020;221:117191. PMID: 32711066. DOI: 10.1016/j.neuroimage.2020.117191.
Abstract
Facial and vocal cues provide critical social information about other humans, including their emotional and attentional states and the content of their speech. Recent work has shown that the face-responsive region of posterior superior temporal sulcus ("fSTS") also responds strongly to vocal sounds. Here, we investigate the functional role of this region and the broader STS by measuring responses to a range of face movements, vocal sounds, and hand movements using fMRI. We find that the fSTS responds broadly to different types of audio and visual face action, including both richly social communicative actions, as well as minimally social noncommunicative actions, ruling out hypotheses of specialization for processing speech signals, or communicative signals more generally. Strikingly, however, responses to hand movements were very low, whether communicative or not, indicating a specific role in the analysis of face actions (facial and vocal), not a general role in the perception of any human action. Furthermore, spatial patterns of response in this region were able to decode communicative from noncommunicative face actions, both within and across modality (facial/vocal cues), indicating sensitivity to an abstract social dimension. These functional properties of the fSTS contrast with a region of middle STS that has a selective, largely unimodal auditory response to speech sounds over both communicative and noncommunicative vocal nonspeech sounds, and nonvocal sounds. Region of interest analyses were corroborated by a data-driven independent component analysis, identifying face-voice and auditory speech responses as dominant sources of voxelwise variance across the STS. These results suggest that the STS contains separate processing streams for the audiovisual analysis of face actions and auditory speech processing.
8. Michon M, Boncompte G, López V. Electrophysiological Dynamics of Visual Speech Processing and the Role of Orofacial Effectors for Cross-Modal Predictions. Front Hum Neurosci 2020;14:538619. PMID: 33192386. PMCID: PMC7653187. DOI: 10.3389/fnhum.2020.538619.
Abstract
The human brain generates predictions about future events. During face-to-face conversations, visemic information is used to predict upcoming auditory input. Recent studies suggest that the speech motor system plays a role in these cross-modal predictions; however, usually only audio-visual paradigms are employed. Here we tested whether speech sounds can be predicted on the basis of visemic information only, and to what extent interfering with orofacial articulatory effectors can affect these predictions. We recorded EEG and used the N400 as an index of such predictions. Our results show that the N400 amplitude was strongly modulated by visemic salience, consistent with cross-modal speech predictions. Additionally, the N400 ceased to be evoked when syllables' visemes were presented backwards, suggesting that predictions occur only when the observed viseme matches an existing articuleme in the observer's speech motor system (i.e., the articulatory neural sequence required to produce a particular phoneme/viseme). Importantly, we found that interfering with the motor articulatory system strongly disrupted cross-modal predictions. We also observed a late P1000 that was evoked only for syllable-related visual stimuli but whose amplitude was not modulated by interfering with the motor system. The present study provides further evidence of the importance of the speech production system for predicting speech sounds from visemic information at the pre-lexical level. The implications of these results are discussed in the context of a hypothesized trimodal repertoire for speech, in which speech perception is conceived as a highly interactive process that involves not only the ears but also the eyes, lips and tongue.
Affiliation(s)
- Maëva Michon: Laboratorio de Neurociencia Cognitiva y Evolutiva, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile; Laboratorio de Neurociencia Cognitiva y Social, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
- Gonzalo Boncompte: Laboratorio de Neurodinámicas de la Cognición, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Vladimir López: Laboratorio de Psicología Experimental, Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago, Chile
9. Origin and evolution of human speech: Emergence from a trimodal auditory, visual and vocal network. Prog Brain Res 2019;250:345-371. PMID: 31703907. DOI: 10.1016/bs.pbr.2019.01.005.
Abstract
In recent years, there have been important additions to the classical model of speech processing as originally depicted by the Broca-Wernicke model consisting of an anterior, productive region and a posterior, perceptive region, both connected via the arcuate fasciculus. The modern view implies a separation into a dorsal and a ventral pathway conveying different kinds of linguistic information, which parallels the organization of the visual system. Furthermore, this organization is highly conserved in evolution and can be seen as the neural scaffolding from which the speech networks originated. In this chapter we emphasize that the speech networks are embedded in a multimodal system encompassing audio-vocal and visuo-vocal connections, which can be referred to an ancestral audio-visuo-motor pathway present in nonhuman primates. Likewise, we propose a trimodal repertoire for speech processing and acquisition involving auditory, visual and motor representations of the basic elements of speech: phoneme, observation of mouth movements, and articulatory processes. Finally, we discuss this proposal in the context of a scenario for early speech acquisition in infants and in human evolution.
10. Wang X, Gu J, Xu J, Li X, Geng J, Wang B, Liu B. Decoding natural scenes based on sounds of objects within scenes using multivariate pattern analysis. Neurosci Res 2018;148:9-18. PMID: 30513353. DOI: 10.1016/j.neures.2018.11.009.
Abstract
Scene recognition plays an important role in spatial navigation and scene classification. It remains unknown whether the occipitotemporal cortex can represent the semantic association between scenes and the sounds of objects within those scenes. In this study, we used functional magnetic resonance imaging (fMRI) and multivariate pattern analysis to assess whether different scenes could be discriminated based on the patterns evoked by sounds of objects within the scenes. We found that patterns evoked by scenes could be predicted from patterns evoked by sounds of objects within the scenes in the posterior fusiform area (pF), lateral occipital area (LO) and superior temporal sulcus (STS). A further functional connectivity analysis revealed significant correlations between pF, LO and the parahippocampal place area (PPA), but not between the STS and the other three regions, under the scene and sound conditions. A distinct network for processing scenes and sounds was discovered using a seed-to-voxel analysis with the STS as the seed. This study may provide a cross-modal channel of scene decoding through the sounds of objects within the scenes in the occipitotemporal cortex, which could complement the single-modal channel of scene decoding based on global scene properties or objects within the scenes.
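Cross-modal discrimination of the kind reported here is often implemented by training a classifier on patterns evoked in one modality and testing it on patterns evoked in the other. The following sketch shows that cross-decoding logic on simulated region-of-interest patterns; the classifier, data shapes, and noise levels are assumptions rather than the study's actual analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_trials, n_voxels, n_classes = 60, 120, 3          # e.g., three scene categories

# Simulated single-trial patterns: a category-specific component shared across
# modalities plus independent noise (stand-ins for real ROI response patterns).
labels = np.repeat(np.arange(n_classes), n_trials // n_classes)
prototypes = rng.standard_normal((n_classes, n_voxels))
sound_patterns = prototypes[labels] + 0.8 * rng.standard_normal((n_trials, n_voxels))
scene_patterns = prototypes[labels] + 0.8 * rng.standard_normal((n_trials, n_voxels))

# Train on sound-evoked patterns, test on scene-evoked patterns (cross-modal transfer).
clf = LogisticRegression(max_iter=1000).fit(sound_patterns, labels)
accuracy = clf.score(scene_patterns, labels)
print(f"cross-modal decoding accuracy: {accuracy:.2f} (chance = {1 / n_classes:.2f})")
```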
Affiliation(s)
- Xiaojing Wang: College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300350, China
- Jin Gu: College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300350, China
- Junhai Xu: College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300350, China
- Xianglin Li: Medical Imaging Research Institute, Binzhou Medical University, Yantai, Shandong 264003, China
- Junzu Geng: Department of Radiology, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, Shandong 264003, China
- Bin Wang: Medical Imaging Research Institute, Binzhou Medical University, Yantai, Shandong 264003, China
- Baolin Liu: College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300350, China; State Key Laboratory of Intelligent Technology and Systems, National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
11. Borowiak K, Schelinski S, von Kriegstein K. Recognizing visual speech: Reduced responses in visual-movement regions, but not other speech regions in autism. Neuroimage Clin 2018;20:1078-1091. PMID: 30368195. PMCID: PMC6202694. DOI: 10.1016/j.nicl.2018.09.019.
Abstract
Speech information inherent in face movements is important for understanding what is said in face-to-face communication. Individuals with autism spectrum disorders (ASD) have difficulties in extracting speech information from face movements, a process called visual-speech recognition. Currently, it is unknown what dysfunctional brain regions or networks underlie the visual-speech recognition deficit in ASD. We conducted a functional magnetic resonance imaging (fMRI) study with concurrent eye tracking to investigate visual-speech recognition in adults diagnosed with high-functioning autism and pairwise-matched typically developed controls. Compared to the control group (n = 17), the ASD group (n = 17) showed decreased Blood Oxygenation Level Dependent (BOLD) responses during visual-speech recognition in the right visual area 5 (V5/MT) and the left temporal visual speech area (TVSA), brain regions implicated in visual-movement perception. The right V5/MT showed a positive correlation with visual-speech task performance in the ASD group, but not in the control group. Psychophysiological interaction analysis (PPI) revealed that functional connectivity between the left TVSA and the bilateral V5/MT, and between the right V5/MT and the left inferior frontal gyrus (IFG), was lower in the ASD than in the control group. In contrast, responses in other speech-motor regions and their connectivity were at the neurotypical level. Reduced responses and network connectivity of the visual-movement regions, in conjunction with intact speech-related mechanisms, indicate that perceptual mechanisms might be at the core of the visual-speech recognition deficit in ASD. Communication deficits in ASD might at least partly stem from atypical sensory processing rather than from higher-order cognitive processing of socially relevant information.
Affiliation(s)
- Kamila Borowiak: Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany; Berlin School of Mind and Brain, Humboldt University of Berlin, Luisenstraße 56, 10117 Berlin, Germany; Technische Universität Dresden, Bamberger Straße 7, 01187 Dresden, Germany
- Stefanie Schelinski: Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany; Technische Universität Dresden, Bamberger Straße 7, 01187 Dresden, Germany
- Katharina von Kriegstein: Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany; Technische Universität Dresden, Bamberger Straße 7, 01187 Dresden, Germany
12. Neural Prediction Errors Distinguish Perception and Misperception of Speech. J Neurosci 2018;38:6076-6089. PMID: 29891730. DOI: 10.1523/jneurosci.3258-17.2018.
Abstract
Humans use prior expectations to improve perception, especially of sensory signals that are degraded or ambiguous. However, if sensory input deviates from prior expectations, then correct perception depends on adjusting or rejecting prior expectations. Failure to adjust or reject the prior leads to perceptual illusions, especially if there is partial overlap (and thus partial mismatch) between expectations and input. With speech, "slips of the ear" occur when expectations lead to misperception. For instance, an entomologist might be more susceptible to hear "The ants are my friends" for "The answer, my friend" (in the Bob Dylan song Blowin' in the Wind). Here, we contrast two mechanisms by which prior expectations may lead to misperception of degraded speech. First, clear representations of the common sounds in the prior and input (i.e., expected sounds) may lead to incorrect confirmation of the prior. Second, insufficient representations of sounds that deviate between prior and input (i.e., prediction errors) could lead to deception. We used crossmodal predictions from written words that partially match degraded speech to compare neural responses when male and female human listeners were deceived into accepting the prior or correctly rejected it. Combined behavioral and multivariate representational similarity analyses of fMRI data show that veridical perception of degraded speech is signaled by representations of prediction error in the left superior temporal sulcus. Instead of using top-down processes to support perception of expected sensory input, our findings suggest that the strength of neural prediction error representations distinguishes correct perception from misperception. SIGNIFICANCE STATEMENT: Misperceiving spoken words is an everyday experience, with outcomes that range from shared amusement to serious miscommunication. For hearing-impaired individuals, frequent misperception can lead to social withdrawal and isolation, with severe consequences for wellbeing. In this work, we specify the neural mechanisms by which prior expectations, which are so often helpful for perception, can lead to misperception of degraded sensory signals. Most descriptive theories of illusory perception explain misperception as arising from a clear sensory representation of features or sounds that are in common between prior expectations and sensory input. Our work instead provides support for a complementary proposal: that misperception occurs when there are insufficient sensory representations of the deviation between expectations and sensory signals.
13. Díaz B, Blank H, von Kriegstein K. Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition. Neuroimage 2018;178:721-734. PMID: 29772380. DOI: 10.1016/j.neuroimage.2018.05.032.
Abstract
The cerebral cortex modulates early sensory processing via feed-back connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate body, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, LGN response increased when participants processed fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with the visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements and did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages has an important role for communication: Together with similar findings in the auditory modality the findings imply that task-dependent modulation of the sensory thalami is a general mechanism to optimize speech recognition.
Affiliation(s)
- Begoña Díaz: Center for Brain and Cognition, Pompeu Fabra University, Barcelona 08018, Spain; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Department of Basic Sciences, Faculty of Medicine and Health Sciences, International University of Catalonia, 08195 Sant Cugat del Vallès, Spain
- Helen Blank: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Katharina von Kriegstein: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Faculty of Psychology, Technische Universität Dresden, 01187 Dresden, Germany
14. Noppeney U, Lee HL. Causal inference and temporal predictions in audiovisual perception of speech and music. Ann N Y Acad Sci 2018;1423:102-116. PMID: 29604082. DOI: 10.1111/nyas.13615.
Abstract
To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses.
Affiliation(s)
- Uta Noppeney: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, UK
- Hwee Ling Lee: German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
15. Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017;38:1835-1849. PMID: 29263241. DOI: 10.1523/jneurosci.1566-17.2017.
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex. SIGNIFICANCE STATEMENT: The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
16. Jiang J, Borowiak K, Tudge L, Otto C, von Kriegstein K. Neural mechanisms of eye contact when listening to another person talking. Soc Cogn Affect Neurosci 2017;12:319-328. PMID: 27576745. PMCID: PMC5390711. DOI: 10.1093/scan/nsw127.
Abstract
Eye contact occurs frequently and voluntarily during face-to-face verbal communication. However, the neural mechanisms underlying eye contact when it is accompanied by spoken language remain unexplored to date. Here we used a novel approach, fixation-based event-related functional magnetic resonance imaging (fMRI), to simulate the listener making eye contact with a speaker during verbal communication. Participants’ eye movements and fMRI data were recorded simultaneously while they were freely viewing a pre-recorded speaker talking. The eye tracking data were then used to define events for the fMRI analyses. The results showed that eye contact in contrast to mouth fixation involved visual cortical areas (cuneus, calcarine sulcus), brain regions related to theory of mind/intentionality processing (temporoparietal junction, posterior superior temporal sulcus, medial prefrontal cortex) and the dorsolateral prefrontal cortex. In addition, increased effective connectivity was found between these regions for eye contact in contrast to mouth fixations. The results provide first evidence for neural mechanisms underlying eye contact when watching and listening to another person talking. The network we found might be well suited for processing the intentions of communication partners during eye contact in verbal communication.
Affiliation(s)
- Jing Jiang: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin 10117, Germany; Institute of Psychology, Humboldt-Universität zu Berlin, Berlin 12489, Germany
- Kamila Borowiak: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin 10117, Germany
- Luke Tudge: Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin 10117, Germany
- Carolin Otto: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Katharina von Kriegstein: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Institute of Psychology, Humboldt-Universität zu Berlin, Berlin 12489, Germany
17. Blank H, Davis MH. Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception. PLoS Biol 2016;14:e1002577. PMID: 27846209. PMCID: PMC5112801. DOI: 10.1371/journal.pbio.1002577.
Abstract
Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains.
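A toy contrast of the two coding schemes compared in this study: in a sharpened-signal scheme the sensory input is weighted by the prior, whereas in a prediction-error scheme the predicted part of the input is subtracted out. The numbers and the four-element "feature space" below are invented for illustration and are not the authors' computational model.

```python
import numpy as np

def normalise(x):
    return x / x.sum()

# Toy population response over a small "speech feature" space (e.g., candidate phonemes).
sensory_input = normalise(np.array([0.30, 0.40, 0.20, 0.10]))   # degraded, ambiguous evidence
prior         = normalise(np.array([0.05, 0.80, 0.10, 0.05]))   # informative expectation

# Sharpened-signal scheme: expected features are enhanced (input weighted by the prior).
sharpened = normalise(sensory_input * prior)

# Prediction-error scheme: expected features are suppressed, so only the part of the
# input not accounted for by the prediction is passed on.
prediction_error = sensory_input - prior

print("sharpened representation:", np.round(sharpened, 2))
print("prediction error:        ", np.round(prediction_error, 2))
```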
Affiliation(s)
- Helen Blank: MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
- Matthew H. Davis: MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
18. Ran G, Chen X, Cao X, Zhang Q. Prediction and unconscious attention operate synergistically to facilitate stimulus processing: An fMRI study. Conscious Cogn 2016;44:41-50. PMID: 27351781. DOI: 10.1016/j.concog.2016.06.016.
Abstract
There is considerable evidence that prediction and attention aid perception. However, little is known about the possible neural mechanisms underlying the impact of prediction and unconscious attention on perception, probably due to the relative neglect of unconscious attention in scholarly literature. Here, we addressed this issue using functional magnetic resonance imaging (fMRI). We adopted a variant of the double-cue paradigm, in which prediction and attention were factorially manipulated by two separate cues (prediction and attention cues). To manipulate consciousness, the attention cues were presented subliminally and supraliminally. Behaviorally, we reported an unconscious-attended effect in the predictable trials and a conscious-attended effect in the unpredictable trials. On the neural level, it was shown that prediction and unconscious attention interacted in the left dorsolateral prefrontal cortex (dlPFC). More specifically, there was a significantly decreased activation in dlPFC for predictable relative to unpredictable stimuli in the unconscious-attended trials, but not in the unconscious-unattended trials. This result suggests that prediction and unconscious attention operate synergistically to facilitate stimulus processing. This is further corroborated by the subsequent functional connectivity analysis, which revealed increased functional connectivity between the left dlPFC and the premotor cortex for predictable versus unpredictable stimuli in the unconscious-attended trials.
Affiliation(s)
- Guangming Ran: Faculty of Psychology, Southwest University, Chongqing 400715, China
- Xu Chen: Faculty of Psychology, Southwest University, Chongqing 400715, China
- Xiaojun Cao: Institute of Education, China West Normal University, Nanchong 637002, China
- Qi Zhang: School of Education Science, Guizhou Normal University, Guiyang 550001, China
19. Park H, Kayser C, Thut G, Gross J. Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility. eLife 2016;5:e14521. PMID: 27146891. PMCID: PMC4900800. DOI: 10.7554/elife.14521.
Abstract
During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and the speaker's lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending to visual speech enhances the coherence between activity in visual cortex and the speaker's lips. Further, we identified a significant partial coherence between left motor cortex and lip movements, and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing.

eLife digest: People are able to communicate effectively with each other even in very noisy places where it is difficult to actually hear what others are saying. In a face-to-face conversation, people detect and respond to many physical cues, including body posture, facial expressions, head and eye movements and gestures, alongside the sound cues. Lip movements are particularly important and contain enough information to allow trained observers to understand speech even if they cannot hear the speech itself. It is known that brain waves in listeners are synchronized with the rhythms in speech, especially the syllables. This is thought to establish a channel for communication, similar to tuning a radio to a certain frequency to listen to a certain radio station. Park et al. studied whether listeners' brain waves also align to the speaker's lip movements during continuous speech and whether this is important for understanding the speech. The experiments reveal that a part of the brain that processes visual information, the visual cortex, produces brain waves that are synchronized to the rhythm of syllables in continuous speech. This synchronization was more precise in a complex situation where lip movements would be more important to understand speech. Park et al. also found that the area of the observer's brain that controls the lips (the motor cortex) produced brain waves that were synchronized to lip movements. Volunteers whose motor cortex was more synchronized to the lip movements understood speech better. This supports the idea that brain areas that are used for producing speech are also important for understanding speech. Future challenges include understanding how synchronization of brain waves with the rhythms of speech helps us to understand speech, and how the brain waves produced by the visual and motor areas interact.
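Lip-brain entrainment of the kind reported here is commonly quantified as magnitude-squared coherence between a lip-aperture time series and the neural signal. The sketch below computes this with SciPy on simulated signals; the sampling rate, signal model, and window length are assumptions, and the partial-coherence step that removes the auditory speech contribution is omitted.

```python
import numpy as np
from scipy.signal import coherence

fs = 250.0                                   # sampling rate in Hz (assumed)
t = np.arange(0, 120, 1 / fs)                # two minutes of signal
rng = np.random.default_rng(3)

# Simulated lip-aperture signal with quasi-rhythmic energy in the syllable range (~4 Hz).
lip = np.sin(2 * np.pi * 4 * t) + 0.5 * rng.standard_normal(t.size)

# Simulated "visual cortex" signal that partially tracks the lip movements.
meg = 0.6 * lip + rng.standard_normal(t.size)

f, cxy = coherence(lip, meg, fs=fs, nperseg=int(2 * fs))   # 2 s windows -> 0.5 Hz resolution
band = (f >= 1) & (f <= 7)
print(f"peak lip-brain coherence in 1-7 Hz: {cxy[band].max():.2f} "
      f"at {f[band][np.argmax(cxy[band])]:.1f} Hz")
```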
Affiliation(s)
- Hyojin Park: Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Christoph Kayser: Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Gregor Thut: Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross: Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
20. McMurray B, Jongman A. What Comes After /f/? Prediction in Speech Derives From Data-Explanatory Processes. Psychol Sci 2015;27:43-52. PMID: 26581947. DOI: 10.1177/0956797615609578.
Abstract
Acoustic cues are short-lived and highly variable, which makes speech perception a difficult problem. However, most listeners solve this problem effortlessly. In the present experiment, we demonstrated that part of the solution lies in predicting upcoming speech sounds and that predictions are modulated by high-level expectations about the current sound. Participants heard isolated fricatives (e.g., "s," "sh") and predicted the upcoming vowel. Accuracy was above chance, which suggests that fine-grained detail in the signal can be used for prediction. A second group performed the same task but also saw a still face and a letter corresponding to the fricative. This group performed markedly better, which suggests that high-level knowledge modulates prediction by helping listeners form expectations about what the fricative should have sounded like. This suggests a form of data explanation operating in speech perception: Listeners account for variance due to their knowledge of the talker and current phoneme, and they use what is left over to make more accurate predictions about the next sound.
Affiliation(s)
- Bob McMurray: Department of Psychological and Brain Sciences, University of Iowa
21. Riedel P, Ragert P, Schelinski S, Kiebel SJ, von Kriegstein K. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition. Cortex 2015;68:86-99. DOI: 10.1016/j.cortex.2014.11.016.
22. Dissociated roles of the inferior frontal gyrus and superior temporal sulcus in audiovisual processing: top-down and bottom-up mismatch detection. PLoS One 2015;10:e0122580. PMID: 25822912. PMCID: PMC4379108. DOI: 10.1371/journal.pone.0122580.
Abstract
Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous with, but incongruent with, auditory information. However, the neural basis of this auditory selection from audiovisual information is unknown, whereas the integration of audiovisual inputs has been intensively researched. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA when the patient was presented with information of large audiovisual incongruence than of small incongruence, especially if the auditory information was correctly identified. On the other hand, the IFG exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients incorrectly perceived the auditory information due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information.
23. Schall S, von Kriegstein K. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception. PLoS One 2014;9:e86325. PMID: 24466026. PMCID: PMC3900530. DOI: 10.1371/journal.pone.0086325.
Abstract
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Affiliation(s)
- Sonja Schall: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Humboldt University of Berlin, Berlin, Germany
24. Stekelenburg JJ, Maes JP, Van Gool AR, Sitskoorn M, Vroomen J. Deficient multisensory integration in schizophrenia: an event-related potential study. Schizophr Res 2013;147:253-61. PMID: 23707640. DOI: 10.1016/j.schres.2013.04.038.
Abstract
BACKGROUND In many natural audiovisual events (e.g., the sight of a face articulating the syllable /ba/), the visual signal precedes the sound and thus allows observers to predict the onset and the content of the sound. In healthy adults, the N1 component of the event-related brain potential (ERP), reflecting neural activity associated with basic sound processing, is suppressed if a sound is accompanied by a video that reliably predicts sound onset. If the sound does not match the content of the video (e.g., hearing /ba/ while lipreading /fu/), the later occurring P2 component is affected. Here, we examined whether these visual information sources affect auditory processing in patients with schizophrenia. METHODS The electroencephalography (EEG) was recorded in 18 patients with schizophrenia and compared with that of 18 healthy volunteers. As stimuli we used video recordings of natural actions in which visual information preceded and predicted the onset of the sound that was either congruent or incongruent with the video. RESULTS For the healthy control group, visual information reduced the auditory-evoked N1 if compared to a sound-only condition, and stimulus-congruency affected the P2. This reduction in N1 was absent in patients with schizophrenia, and the congruency effect on the P2 was diminished. Distributed source estimations revealed deficits in the network subserving audiovisual integration in patients with schizophrenia. CONCLUSIONS The results show a deficit in multisensory processing in patients with schizophrenia and suggest that multisensory integration dysfunction may be an important and, to date, under-researched aspect of schizophrenia.
Affiliation(s)
- Jeroen J Stekelenburg: Tilburg University, Department of Cognitive Neuropsychology, P.O. Box 90153, Warandelaan 2, 5000 LE Tilburg, The Netherlands