1. Huang J, Wang A, Zhang M. The audiovisual competition effect induced by temporal asynchronous encoding weakened the visual dominance in working memory retrieval. Memory 2024:1-14. PMID: 39067050; DOI: 10.1080/09658211.2024.2381782.
Abstract
Converging evidence suggests a facilitation effect of multisensory interactions on memory performance, reflected in higher accuracy or faster response times under a bimodal encoding condition than under a unimodal condition. However, relatively little attention has been given to the effect of multisensory competition on memory. The present study adopted an adaptive staircase test to measure the point of subjective simultaneity (PSS), combined with a delayed match-to-sample (DMS) task, to probe the effect of audiovisual competition during the encoding stage on subsequent unisensory retrieval. The results showed a robust visual dominance effect and a multisensory interference effect in working memory (WM) retrieval, regardless of whether the audiovisual presentation was subjectively synchronous or asynchronous. However, a weakened visual dominance effect was observed when the auditory stimulus was presented before the visual stimulus in the encoding period, particularly in the semantically incongruent case. These findings reveal that the prior entry of sensory information at the early perceptual stage can affect processing at the late cognitive stage to some extent, and support the view that there is a persistent advantage for the visuospatial sketchpad in multisensory WM.
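
The adaptive staircase logic referenced in this abstract can be illustrated with a small simulation. This is a hedged sketch, not the authors' procedure: the simulated observer, step size, trial counts, and convergence rule are all assumptions chosen only to show how such a track settles near the PSS.

```python
# Illustrative sketch (not the authors' code): estimating a point of subjective
# simultaneity (PSS) with a simple adaptive staircase on audiovisual asynchrony (SOA).
import numpy as np

rng = np.random.default_rng(0)
true_pss = 35.0       # ms; assumed "true" PSS of the simulated observer
slope = 40.0          # ms; assumed steepness of the simulated psychometric function

def reports_visual_first(soa_ms):
    """Simulated observer: P('visual first') rises as the SOA exceeds the PSS."""
    p = 1.0 / (1.0 + np.exp(-(soa_ms - true_pss) / slope))
    return rng.random() < p

def run_staircase(start_soa, step, n_trials=60):
    """1-up/1-down staircase on SOA; it oscillates around the 50% point (the PSS)."""
    soa, history = start_soa, []
    for _ in range(n_trials):
        history.append(soa)
        # Move against the response so the track converges on the point of equality.
        soa = soa - step if reports_visual_first(soa) else soa + step
    return np.array(history)

# Interleave an ascending and a descending track and average their late trial levels.
tracks = [run_staircase(-200, 20), run_staircase(200, 20)]
pss_estimate = np.mean([t[-20:].mean() for t in tracks])
print(f"estimated PSS ~ {pss_estimate:.1f} ms (simulated true value {true_pss} ms)")
```
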
Affiliation(s)
- Jie Huang
- Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, People's Republic of China
- Aijun Wang
- Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, People's Republic of China
- Ming Zhang
- School of Psychology, Northeast Normal University, Changchun, People's Republic of China
- Department of Psychology, Suzhou University of Science and Technology, Suzhou, People's Republic of China
- Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Okayama, Japan

2. Liu Y, Wang Z, Wei T, Zhou S, Yin Y, Mi Y, Liu X, Tang Y. Alterations of Audiovisual Integration in Alzheimer's Disease. Neurosci Bull 2023; 39:1859-1872. PMID: 37812301; PMCID: PMC10661680; DOI: 10.1007/s12264-023-01125-7.
Abstract
Audiovisual integration is a vital information process involved in cognition and is closely correlated with aging and Alzheimer's disease (AD). In this review, we evaluated the altered audiovisual integrative behavioral symptoms in AD. We further analyzed the bidirectional relationships between AD pathologies and audiovisual integration alterations and suggested possible mechanisms underlying these alterations in AD, including an imbalance between energy demand and supply, activity-dependent degeneration, disrupted brain networks, and cognitive resource overloading. Then, based on clinical characteristics, including electrophysiological and imaging data related to audiovisual integration, we emphasized the value of audiovisual integration alterations as potential biomarkers for the early diagnosis and progression of AD. We also highlighted that treatments targeting audiovisual integration contributed to widespread pathological improvements in AD animal models and cognitive improvements in AD patients. Moreover, investigation into audiovisual integration alterations in AD also provides new insights into sensory information processing.
Affiliation(s)
- Yufei Liu
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Zhibin Wang
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Tao Wei
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Shaojiong Zhou
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Yunsi Yin
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Yingxin Mi
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Xiaoduo Liu
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China
- Yi Tang
- Department of Neurology and Innovation Center for Neurological Disorders, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, 100053, China.

3. Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. PMID: 37757989; DOI: 10.1016/j.neuroimage.2023.120391.
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
Affiliation(s)
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
- Aisling O'Sullivan
- School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland.

4. Scheliga S, Kellermann T, Lampert A, Rolke R, Spehr M, Habel U. Neural correlates of multisensory integration in the human brain: an ALE meta-analysis. Rev Neurosci 2023; 34:223-245. PMID: 36084305; DOI: 10.1515/revneuro-2022-0065.
Abstract
Previous fMRI research identified the superior temporal sulcus as a central integration area for audiovisual stimuli. However, less is known about a general multisensory integration network across the senses. Therefore, we conducted an activation likelihood estimation (ALE) meta-analysis across multiple sensory modalities to identify a common brain network. We included 49 studies covering all Aristotelian senses, i.e., auditory, visual, tactile, gustatory, and olfactory stimuli. The analysis revealed significant activation in bilateral superior temporal gyrus, middle temporal gyrus, thalamus, right insula, and left inferior frontal gyrus. We assume these regions to be part of a general multisensory integration network comprising different functional roles. Here, the thalamus operates as a first subcortical relay, projecting sensory information to higher cortical integration centers in the superior temporal gyrus/sulcus, while conflict-processing regions such as the insula and inferior frontal gyrus facilitate the integration of incongruent information. We additionally performed meta-analytic connectivity modelling and found that each brain region showed co-activations within the identified multisensory integration network. Therefore, by including multiple sensory modalities in our meta-analysis, the results may provide evidence for a common brain network that supports different functional roles for multisensory integration.
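
For readers unfamiliar with ALE, the toy sketch below illustrates the core computation on hypothetical foci. It is not the authors' pipeline (which would use standard ALE software with permutation-based thresholding): per-study modeled activation maps from Gaussian-blurred foci are combined as a probabilistic union.

```python
# Minimal sketch of the activation likelihood estimation (ALE) idea on a toy grid.
import numpy as np

grid = np.stack(np.meshgrid(np.arange(40), np.arange(48), np.arange(40), indexing="ij"), -1)

def modeled_activation(foci_vox, sigma_vox=2.5):
    """Per-study map: at each voxel, the maximum Gaussian probability over that study's foci."""
    ma = np.zeros(grid.shape[:3])
    for f in foci_vox:
        d2 = np.sum((grid - np.asarray(f)) ** 2, axis=-1)
        ma = np.maximum(ma, np.exp(-d2 / (2 * sigma_vox ** 2)))
    return ma

# Hypothetical foci (voxel coordinates) from three studies; two studies converge.
studies = [[(20, 30, 18), (22, 28, 19)], [(21, 29, 18)], [(10, 10, 10)]]
mas = [modeled_activation(foci) for foci in studies]

# ALE = 1 - prod(1 - MA_i): the probabilistic union of the modeled activation maps.
ale = 1.0 - np.prod([1.0 - m for m in mas], axis=0)
print("peak ALE value:", ale.max(), "at voxel", np.unravel_index(ale.argmax(), ale.shape))
```
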
Affiliation(s)
- Sebastian Scheliga
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Thilo Kellermann
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany; JARA-Institute Brain Structure Function Relationship, Pauwelsstraße 30, 52074 Aachen, Germany
- Angelika Lampert
- Institute of Physiology, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Roman Rolke
- Department of Palliative Medicine, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
- Marc Spehr
- Department of Chemosensation, RWTH Aachen University, Institute for Biology, Worringerweg 3, 52074 Aachen, Germany
- Ute Habel
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany; JARA-Institute Brain Structure Function Relationship, Pauwelsstraße 30, 52074 Aachen, Germany

5. Fisher VL, Dean CL, Nave CS, Parkins EV, Kerkhoff WG, Kwakye LD. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Front Hum Neurosci 2023; 16:1027335. PMID: 36684833; PMCID: PMC9846366; DOI: 10.3389/fnhum.2022.1027335.
Abstract
We receive information about the world around us from multiple senses, which combine in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk Illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied by a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as schizophrenia, autism, and ADHD.
Affiliation(s)
- Victoria L. Fisher
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Yale University School of Medicine and the Connecticut Mental Health Center, New Haven, CT, United States
- Cassandra L. Dean
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Roche/Genentech Neurodevelopment & Psychiatry Teams Product Development, Neuroscience, South San Francisco, CA, United States
- Claire S. Nave
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Emma V. Parkins
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Neuroscience Graduate Program, University of Cincinnati, Cincinnati, OH, United States
- Willa G. Kerkhoff
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, United States
- Leslie D. Kwakye
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States

6. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. PMID: 36586857; PMCID: PMC9894660; DOI: 10.1121/10.0015262.
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey
- PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle
- Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA

7. Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022; 263:119598. PMID: 36049699; DOI: 10.1016/j.neuroimage.2022.119598.
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus. The goal was to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as parts of the broader language network, including the semantic system. To this end, we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network, as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but also many regions of the wider semantic network, and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.

8. Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. PMID: 35589000; DOI: 10.1016/j.neuroimage.2022.119311.
Abstract
Viewing a speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed this question by quantifying regional multivariate representation and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and supramarginal gyrus (SMG). Moreover, neural representations of place of articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area while place of articulation was better encoded in left ventral premotor cortex and SMG. Next, dynamic causal modeling (DCM) analysis showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the structural backbone of the auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide novel insight for speech science: lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway, and this functional enhancement is mediated by the microstructural architecture of the circuit.
Affiliation(s)
- Lei Zhang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049
- Yi Du
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China 200031; Chinese Institute for Brain Research, Beijing, China 102206.

9. Asymmetrical cross-modal influence on neural encoding of auditory and visual features in natural scenes. Neuroimage 2022; 255:119182. PMID: 35395403; DOI: 10.1016/j.neuroimage.2022.119182.
Abstract
Natural scenes contain multi-modal information, which is integrated to form a coherent perception. Previous studies have demonstrated that cross-modal information can modulate neural encoding of low-level sensory features. These studies, however, mostly focus on the processing of single sensory events or rhythmic sensory sequences. Here, we investigate how the neural encoding of basic auditory and visual features is modulated by cross-modal information when the participants watch movie clips primarily composed of non-rhythmic events. We presented audiovisual congruent and audiovisual incongruent movie clips, and since attention can modulate cross-modal interactions, we separately analyzed high- and low-arousal movie clips. We recorded neural responses using electroencephalography (EEG), and employed the temporal response function (TRF) to quantify the neural encoding of auditory and visual features. The neural encoding of the sound envelope is enhanced in the audiovisual congruent condition compared with the incongruent condition, but this effect is only significant for high-arousal movie clips. In contrast, audiovisual congruency does not significantly modulate the neural encoding of visual features, e.g., luminance or visual motion. In summary, our findings demonstrate asymmetrical cross-modal interactions during the processing of natural scenes that lack rhythmicity: Congruent visual information enhances low-level auditory processing, while congruent auditory information does not significantly modulate low-level visual processing.
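
A temporal response function of the kind used here to quantify envelope encoding can be estimated with a lagged ridge regression. The sketch below uses simulated data; the sampling rate, lag window, and regularization value are assumptions for illustration only, not the study's settings.

```python
# Hedged sketch of a TRF fit: ridge regression from a lagged stimulus feature
# (the sound envelope) to a neural response channel, on simulated data.
import numpy as np

rng = np.random.default_rng(1)
fs, n = 64, 64 * 120                      # 64 Hz feature rate, 2 minutes (assumed)
envelope = rng.standard_normal(n)

lags = np.arange(0, int(0.4 * fs))        # 0-400 ms lags (assumed window)
true_trf = np.exp(-lags / 8.0) * np.sin(lags / 3.0)
X = np.column_stack([np.roll(envelope, l) for l in lags])   # lagged design matrix
X[:lags.max()] = 0                        # zero out wrap-around rows from np.roll
eeg = X @ true_trf + 0.5 * rng.standard_normal(n)           # simulated response

lam = 1.0                                  # ridge parameter (assumed)
w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)

pred = X @ w
r = np.corrcoef(pred, eeg)[0, 1]           # prediction accuracy = encoding strength
print(f"TRF prediction correlation r = {r:.2f}")
```
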

10. Peelle JE, Spehar B, Jones MS, McConkey S, Myerson J, Hale S, Sommers MS, Tye-Murray N. Increased Connectivity among Sensory and Motor Regions during Visual and Audiovisual Speech Perception. J Neurosci 2022; 42:435-442. PMID: 34815317; PMCID: PMC8802926; DOI: 10.1523/jneurosci.0114-21.2021.
Abstract
In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS.

SIGNIFICANCE STATEMENT In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show that a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
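
A psychophysiological interaction regressor of the general form used in such analyses can be constructed as below. This is a simplified sketch with simulated time courses and an assumed block design; it omits the haemodynamic deconvolution and convolution steps of a full fMRI PPI.

```python
# Hedged sketch of a PPI design matrix: the interaction term is the mean-centred
# seed time course multiplied by the condition contrast, alongside the main effects.
import numpy as np

rng = np.random.default_rng(6)
n_scans = 240
seed = rng.standard_normal(n_scans)                      # e.g., A1 seed time course (simulated)
condition = np.tile(np.repeat([1.0, -1.0], 20), 6)       # AV vs unimodal blocks (assumed design)

ppi = (seed - seed.mean()) * condition                   # psychophysiological interaction regressor
X = np.column_stack([np.ones(n_scans), seed, condition, ppi])

target = rng.standard_normal(n_scans)                    # time course of a target region (simulated)
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
print("PPI beta (condition-dependent coupling):", round(beta[3], 3))
```
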
Affiliation(s)
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Brent Spehar
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Michael S Jones
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Sarah McConkey
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Joel Myerson
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Sandra Hale
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Nancy Tye-Murray
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110

11. O'Sullivan AE, Crosse MJ, Di Liberto GM, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. PMID: 33824190; PMCID: PMC8197638; DOI: 10.1523/jneurosci.0906-20.2021.
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.

SIGNIFICANCE STATEMENT During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
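
The CCA-based encoding measure described here can be sketched as follows; the feature dimensions, channel count, and noise level are assumptions, and the example only shows how canonical correlations between stimulus and EEG matrices are obtained.

```python
# Hedged sketch: canonical correlation analysis between stimulus features and
# simulated multichannel EEG, as a stand-in for an encoding-strength measure.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
n = 2000
stim = rng.standard_normal((n, 16))                      # e.g., spectrogram bands or phonetic features
mix = rng.standard_normal((16, 32))
eeg = stim @ mix + 2.0 * rng.standard_normal((n, 32))    # simulated 32-channel EEG

cca = CCA(n_components=3)
stim_c, eeg_c = cca.fit_transform(stim, eeg)             # canonical variates for each view
corrs = [np.corrcoef(stim_c[:, k], eeg_c[:, k])[0, 1] for k in range(3)]
print("canonical correlations:", np.round(corrs, 2))
```
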
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
- X, The Moonshot Factory, Mountain View, CA and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627

12.
Abstract
Adaptive behavior in a complex, dynamic, and multisensory world poses some of the most fundamental computational challenges for the brain, notably inference, decision-making, learning, binding, and attention. We first discuss how the brain integrates sensory signals from the same source to support perceptual inference and decision-making by weighting them according to their momentary sensory uncertainties. We then show how observers solve the binding or causal inference problem-deciding whether signals come from common causes and should hence be integrated or else be treated independently. Next, we describe the multifarious interplay between multisensory processing and attention. We argue that attentional mechanisms are crucial to compute approximate solutions to the binding problem in naturalistic environments when complex time-varying signals arise from myriad causes. Finally, we review how the brain dynamically adapts multisensory processing to a changing world across multiple timescales.
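
The reliability-weighted integration summarized at the start of this abstract corresponds to the standard maximum-likelihood cue-combination rule; a worked sketch with hypothetical auditory and visual estimates follows.

```python
# Illustrative sketch of reliability-weighted (maximum-likelihood) cue integration:
# the fused estimate weights each cue by its inverse variance.
import numpy as np

def fuse(mu_a, var_a, mu_v, var_v):
    """Optimal bimodal estimate and its variance under independent Gaussian noise."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    mu_av = w_a * mu_a + (1 - w_a) * mu_v
    var_av = 1 / (1 / var_a + 1 / var_v)
    return mu_av, var_av

# Hypothetical auditory and visual location estimates (degrees) and noise variances:
# the fused estimate is pulled toward the more reliable (visual) cue, with lower variance.
print(fuse(mu_a=8.0, var_a=4.0, mu_v=5.0, var_v=1.0))
```
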
Affiliation(s)
- Uta Noppeney
- Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525 AJ Nijmegen, The Netherlands

13. Drijvers L, Jensen O, Spaak E. Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information. Hum Brain Mapp 2021; 42:1138-1152. PMID: 33206441; PMCID: PMC7856646; DOI: 10.1002/hbm.25282.
Abstract
During communication in real-life settings, the brain integrates information from auditory and visual modalities to form a unified percept of our environment. In the current magnetoencephalography (MEG) study, we used rapid invisible frequency tagging (RIFT) to generate steady-state evoked fields and investigated the integration of audiovisual information in a semantic context. We presented participants with videos of an actress uttering action verbs (auditory; tagged at 61 Hz) accompanied by a gesture (visual; tagged at 68 Hz, using a projector with a 1,440 Hz refresh rate). Integration difficulty was manipulated by lower-order auditory factors (clear/degraded speech) and higher-order visual factors (congruent/incongruent gesture). We identified MEG spectral peaks at the individual (61/68 Hz) tagging frequencies. We furthermore observed a peak at the intermodulation frequency of the auditory and visually tagged signals (fvisual - fauditory = 7 Hz), specifically when lower-order integration was easiest because signal quality was optimal. This intermodulation peak is a signature of nonlinear audiovisual integration, and was strongest in left inferior frontal gyrus and left temporal regions; areas known to be involved in speech-gesture integration. The enhanced power at the intermodulation frequency thus reflects the ease of lower-order audiovisual integration and demonstrates that speech-gesture information interacts in higher-order language areas. Furthermore, we provide a proof-of-principle of the use of RIFT to study the integration of audiovisual stimuli, in relation to, for instance, semantic context.
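
Why an intermodulation peak indexes nonlinear integration can be seen in a toy simulation: only a multiplicative (nonlinear) combination of signals tagged at 61 Hz and 68 Hz produces power at the 7 Hz difference frequency, whereas purely additive mixing does not. The signal parameters below are assumptions for illustration, not the study's stimuli.

```python
# Toy demonstration of an intermodulation component at f_visual - f_auditory = 7 Hz.
import numpy as np

fs, dur = 1000, 10.0
t = np.arange(0, dur, 1 / fs)
aud = np.sin(2 * np.pi * 61 * t)             # auditory tag
vis = np.sin(2 * np.pi * 68 * t)             # visual tag

additive = aud + vis                          # linear mixing: no 7 Hz component
nonlinear = aud + vis + 0.3 * aud * vis       # interaction term creates 7 Hz and 129 Hz

freqs = np.fft.rfftfreq(t.size, 1 / fs)
for name, sig in [("additive", additive), ("nonlinear", nonlinear)]:
    power = np.abs(np.fft.rfft(sig)) ** 2
    print(name, "power at 7 Hz:", round(float(power[np.argmin(np.abs(freqs - 7.0))]), 1))
```
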
Affiliation(s)
- Linda Drijvers
- Donders Institute for Brain, Cognition, and Behaviour, Centre for Cognition, Montessorilaan 3, Radboud University, Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Ole Jensen
- School of Psychology, Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Eelke Spaak
- Donders Institute for Brain, Cognition, and Behaviour, Centre for Cognitive Neuroimaging, Kapittelweg 29, Radboud University, Nijmegen, The Netherlands

14. Lalonde K, Werner LA. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit. Brain Sci 2021; 11:49. PMID: 33466253; PMCID: PMC7824772; DOI: 10.3390/brainsci11010049.
Abstract
The natural environments in which infants and children learn speech and language are noisy and multimodal. Adults rely on the multimodal nature of speech to compensate for noisy environments during speech communication. Multiple mechanisms underlie mature audiovisual benefit to speech perception, including reduced uncertainty as to when auditory speech will occur, use of correlations between the amplitude envelope of auditory and visual signals in fluent speech, and use of visual phonetic knowledge for lexical access. This paper reviews evidence regarding infants' and children's use of temporal and phonetic mechanisms in audiovisual speech perception benefit. The ability to use temporal cues for audiovisual speech perception benefit emerges in infancy. Although infants are sensitive to the correspondence between auditory and visual phonetic cues, the ability to use this correspondence for audiovisual benefit may not emerge until age four. A more cohesive account of the development of audiovisual speech perception may follow from a more thorough understanding of the development of sensitivity to and use of various temporal and phonetic cues.
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE 68131, USA
- Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA 98105, USA

15. Paraskevopoulos E, Chalas N, Karagiorgis A, Karagianni M, Styliadis C, Papadelis G, Bamidis P. Aging Effects on the Neuroplastic Attributes of Multisensory Cortical Networks as Triggered by a Computerized Music Reading Training Intervention. Cereb Cortex 2021; 31:123-137. PMID: 32794571; DOI: 10.1093/cercor/bhaa213.
Abstract
The constant increase in the graying population is the result of a great expansion of life expectancy. A smaller expansion of healthy cognitive and brain functioning diminishes the gains achieved by longevity. Music training, as a special case of multisensory learning, may induce restorative neuroplasticity in older ages. The current study aimed to explore aging effects on the cortical network supporting multisensory cognition and to define aging effects on the network's neuroplastic attributes. A computer-based music reading protocol was developed and evaluated via electroencephalography measurements pre- and post-training on young and older adults. Results revealed that multisensory integration is performed via diverse strategies in the two groups: Older adults employ higher-order supramodal areas to a greater extent than lower level perceptual regions, in contrast to younger adults, indicating an age-related shift in the weight of each processing strategy. Restorative neuroplasticity was revealed in the left inferior frontal gyrus and right medial temporal gyrus, as a result of the training, while task-related reorganization of cortical connectivity was obstructed in the group of older adults, probably due to systemic maturation mechanisms. On the contrary, younger adults significantly increased functional connectivity among the regions supporting multisensory integration.
Affiliation(s)
- Evangelos Paraskevopoulos
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Nikolas Chalas
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece; Institute for Biomagnetism and Biosignal Analysis, University of Münster, D-48149 Münster, Germany
- Alexandros Karagiorgis
- School of Music Studies, Faculty of Fine Arts, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Maria Karagianni
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Charis Styliadis
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Georgios Papadelis
- School of Music Studies, Faculty of Fine Arts, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Panagiotis Bamidis
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

16. Sun H, Yue Q, Sy JL, Godwin D, Eaton HP, Raghavan P, Marois R. Increase in internetwork functional connectivity in the human brain with attention capture. J Neurophysiol 2020; 124:1885-1899. PMID: 33052763; DOI: 10.1152/jn.00693.2019.
Abstract
Attention is often extolled for its selective neural properties. Yet, when powerfully captured by a salient unexpected event, attention can give rise to a broad cascade of systemic effects for evaluating and adaptively responding to the event. Using graph theory analysis combined with fMRI, we show here that the extensive psychophysiological and cognitive changes associated with such attention capture are related to large-scale distributed changes in the brain's functional connectivity. Novel task-irrelevant "oddball" stimuli presented to subjects during the performance of a target-search task triggered an increase in internetwork functional connectivity that degraded the brain's network modularity, thereby facilitating the integration of information. Furthermore, this phenomenon habituated with repeated oddball presentations, mirroring the behavior. These functional network connectivity changes are remarkably consistent with those previously obtained with conscious target perception, thus raising the possibility that large-scale internetwork connectivity changes triggered by attentional capture and awareness rely on common neural network dynamics.

NEW & NOTEWORTHY The selective properties of attention have been extensively studied. There are some circumstances in which attention can have widespread and systemic effects, however, such as when it is captured by an unexpected, salient stimulus or event. How are such effects propagated in the human brain? Using graph theory analysis of fMRI data, we show here that salient task-irrelevant events produced a global increase in the functional integration of the brain's neural networks.
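
The modularity measure at issue can be computed from a functional connectivity matrix as in the sketch below; the simulated data, correlation threshold, and community-detection algorithm are assumptions rather than the study's pipeline.

```python
# Hedged sketch: modularity of a thresholded functional connectivity graph.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

rng = np.random.default_rng(3)

def network_modularity(timeseries, threshold=0.3):
    """Threshold a region-by-region correlation matrix and compute graph modularity."""
    corr = np.corrcoef(timeseries.T)
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    g = nx.from_numpy_array(adj)
    return modularity(g, greedy_modularity_communities(g))

# Simulate 30 regions organized into two communities sharing latent signals.
latent = rng.standard_normal((200, 2))
assign = np.repeat([0, 1], 15)
timeseries = latent[:, assign] + 0.8 * rng.standard_normal((200, 30))
print("modularity (simulated data):", round(network_modularity(timeseries), 3))
```
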
Affiliation(s)
- Hongyang Sun
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee
- Qiuhai Yue
- Department of Psychology, Vanderbilt University, Nashville, Tennessee
- Jocelyn L Sy
- Department of Psychology, Vanderbilt University, Nashville, Tennessee
- Douglass Godwin
- Department of Psychology, Vanderbilt University, Nashville, Tennessee
- Hana P Eaton
- Department of Psychology, Vanderbilt University, Nashville, Tennessee
- Padma Raghavan
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee
- René Marois
- Department of Psychology, Vanderbilt University, Nashville, Tennessee; Vanderbilt Vision Research Center, Vanderbilt University, Nashville, Tennessee; Center for Integrative and Cognitive Neuroscience, Vanderbilt University, Nashville, Tennessee

17. Biological Action Identification Does Not Require Early Visual Input for Development. eNeuro 2020; 7:ENEURO.0534-19.2020. PMID: 33060179; PMCID: PMC7598910; DOI: 10.1523/eneuro.0534-19.2020.
Abstract
Visual input during the first years of life is vital for the development of numerous visual functions. While normal development of global motion perception seems to require visual input during an early sensitive period, the detection of biological motion (BM) does not seem to do so. A more complex form of BM processing is the identification of human actions. Here, we tested whether identification rather than detection of BM is experience dependent. A group of human participants who had been treated for congenital cataracts (CC; of up to 18 years in duration, CC group) had to identify ten actions performed by human line figures. In addition, they performed a coherent motion (CM) detection task, which required identifying the direction of CM amid the movement of random dots. As controls, developmental cataract (DC) reversal individuals (DC group) who had undergone the same surgical treatment as CC group were included. Moreover, normally sighted controls were tested both with vision blurred to match the visual acuity (VA) of CC individuals [vision matched (VM) group] and with full sight [sighted control (SC) group]. The CC group identified biological actions with an extraordinary high accuracy (on average ∼85% correct) and was indistinguishable from the VM control group. By contrast, CM processing impairments of the CC group persisted even after controlling for VA. These results in the same individuals demonstrate an impressive resilience of BM processing to aberrant early visual experience and at the same time a sensitive period for the development of CM processing.

18. Taneja MK. Prevention and Rehabilitation of Old Age Deafness. Indian J Otolaryngol Head Neck Surg 2020; 72:524-531. PMID: 33088786; DOI: 10.1007/s12070-020-01856-3.
Abstract
Hearing impairment is one of the most common sensory deficits, affecting 466 million people globally, and in the majority of older people it cannot be corrected. Presbycusis is typically associated with diminished cognitive power, resulting in a two-fold loss in the understanding of speech. There is no treatment available to date to regenerate the hair cells, but hearing can be augmented by preventing atrophy (apoptosis) of the stria vascularis, degeneration of spiral neural cells, and atrophy of the auditory nerve and cerebral cortex through modified greeva and skandh chalan exercises, dynamic neurobics, tratak (focused concentration), Bhramari, and Kumbhak, along with mindful relaxation techniques.
Affiliation(s)
- M K Taneja
- Indian Institute of Ear Diseases, E-982 C. R. Park, New Delhi, India

19. van de Ven V, Waldorp L, Christoffels I. Hippocampus plays a role in speech feedback processing. Neuroimage 2020; 223:117319. PMID: 32882376; DOI: 10.1016/j.neuroimage.2020.117319.
Abstract
There is increasing evidence that the hippocampus is involved in language production and verbal communication, although little is known about its possible role. According to one view, the hippocampus contributes semantic memory to spoken language. Alternatively, the hippocampus is involved in processing the (mis)match between expected sensory consequences of speaking and the perceived speech feedback. In the current study, we re-analysed functional magnetic resonance imaging (fMRI) data of two overt picture-naming studies to test whether the hippocampus is involved in speech production and, if so, whether the results can distinguish between a "pure memory" versus a "prediction" account of hippocampal involvement. In both studies, participants overtly named pictures during scanning while hearing their own speech feedback unimpededly or impaired by a superimposed noise mask. Results showed decreased hippocampal activity when speech feedback was impaired, compared to when feedback was unimpeded. Further, we found increased functional coupling between auditory cortex and hippocampus during unimpeded speech feedback, compared to impaired feedback. Finally, we found significant functional coupling between a hippocampal/supplementary motor area (SMA) interaction term and auditory cortex, anterior cingulate cortex and cerebellum during overt picture naming, but not during listening to one's own pre-recorded voice. These findings indicate that the hippocampus plays a role in speech production that is in accordance with a "prediction" view of hippocampal functioning.
Affiliation(s)
- Vincent van de Ven
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, the Netherlands.
- Ingrid Christoffels
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, the Netherlands

20. Jasmin K, Dick F, Stewart L, Tierney AT. Altered functional connectivity during speech perception in congenital amusia. eLife 2020; 9:e53539. PMID: 32762842; PMCID: PMC7449693; DOI: 10.7554/elife.53539.
Abstract
Individuals with congenital amusia have a lifelong history of unreliable pitch processing. Accordingly, they downweight pitch cues during speech perception and instead rely on other dimensions such as duration. We investigated the neural basis for this strategy. During fMRI, individuals with amusia (N = 15) and controls (N = 15) read sentences where a comma indicated a grammatical phrase boundary. They then heard two sentences spoken that differed only in pitch and/or duration cues and selected the best match for the written sentence. Prominent reductions in functional connectivity were detected in the amusia group between left prefrontal language-related regions and right hemisphere pitch-related regions, which reflected the between-group differences in cue weights in the same groups of listeners. Connectivity differences between these regions were not present during a control task. Our results indicate that the reliability of perceptual dimensions is linked with functional connectivity between frontal and perceptual regions and suggest a compensatory mechanism.
Affiliation(s)
- Kyle Jasmin
- Department of Psychological Sciences, Birkbeck, University of London, London, United Kingdom
- UCL Institute of Cognitive Neuroscience, University College London, London, United Kingdom
- Frederic Dick
- Department of Psychological Sciences, Birkbeck, University of London, London, United Kingdom
- Department of Experimental Psychology, University College London, London, United Kingdom
- Lauren Stewart
- Department of Psychology, Goldsmiths, University of London, London, United Kingdom
- Adam Taylor Tierney
- Department of Psychological Sciences, Birkbeck, University of London, London, United Kingdom

21. Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. PMID: 32727820; PMCID: PMC7470920; DOI: 10.1523/jneurosci.0279-20.2020.
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial encephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed for the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.

SIGNIFICANCE STATEMENT Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial encephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
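
The deconvolution logic described above can be sketched with a lagged regression on simulated data: because the audio and visual onsets are jittered relative to each other, the overlapping unisensory response time courses become separable by least squares. All parameters here are assumptions for illustration, not the authors' settings.

```python
# Hedged sketch of regression-based deconvolution with jittered audiovisual onsets.
import numpy as np

rng = np.random.default_rng(4)
fs, n = 100, 100 * 300                       # 100 Hz, 5 minutes (assumed)
n_lags = 60                                  # estimate 0-600 ms response time courses

onset_a = np.zeros(n); onset_v = np.zeros(n)
starts = np.arange(200, n - 400, 150)        # trial onsets
jitter = rng.integers(5, 30, size=starts.size)   # audiovisual asynchrony in samples
onset_a[starts] = 1
onset_v[starts + jitter] = 1

def lagged(x, n_lags):
    return np.column_stack([np.roll(x, l) for l in range(n_lags)])

true_a = np.exp(-np.arange(n_lags) / 10.0)           # phasic "auditory" response
true_v = 0.4 * (np.arange(n_lags) < 40)              # sustained "visual" response
signal = lagged(onset_a, n_lags) @ true_a + lagged(onset_v, n_lags) @ true_v
signal += 0.5 * rng.standard_normal(n)

X = np.hstack([lagged(onset_a, n_lags), lagged(onset_v, n_lags)])
beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
est_a, est_v = beta[:n_lags], beta[n_lags:]
print("corr(est, true) auditory:", np.corrcoef(est_a, true_a)[0, 1].round(2),
      "visual:", np.corrcoef(est_v, true_v)[0, 1].round(2))
```
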
Collapse
|
22
|
Kumar VG, Dutta S, Talwar S, Roy D, Banerjee A. Biophysical mechanisms governing large-scale brain network dynamics underlying individual-specific variability of perception. Eur J Neurosci 2020; 52:3746-3762. [PMID: 32304122 DOI: 10.1111/ejn.14747] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2020] [Revised: 04/07/2020] [Accepted: 04/08/2020] [Indexed: 11/30/2022]
Abstract
Perception necessitates interaction among neuronal ensembles, the dynamics of which can be conceptualized as the emergent behavior of coupled dynamical systems. Here, we propose a detailed, neurobiologically realistic model that captures the neural mechanisms of the inter-individual variability observed in cross-modal speech perception. From raw EEG signals recorded while human participants were presented with McGurk-incongruent and congruent audio-visual (AV) speech stimuli, we computed the global coherence metric to capture the neural variability of large-scale networks. We found that participants' McGurk susceptibility was negatively correlated with their alpha-band global coherence. In the proposed biophysical model, the global coherence dynamics emerge from coupling between interacting neural masses representing the sensory-specific auditory/visual areas and modality-nonspecific associative/integrative regions. The model predicted that an extremely weak direct AV coupling results in a decrease in alpha-band global coherence, mimicking the cortical dynamics of participants with higher McGurk susceptibility. Source connectivity analysis also showed decreased connectivity between sensory-specific regions in participants more susceptible to the McGurk effect, providing empirical validation of the prediction. Overall, our study provides an outline for linking variability in structural and functional connectivity metrics to variability in performance, which can be useful for several perception and action task paradigms.
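Global coherence at a given frequency is commonly defined as the largest eigenvalue of the sensor cross-spectral matrix divided by the sum of its eigenvalues, so that values near 1 indicate a single dominant spatial mode across the array. A minimal sketch of that computation on simulated epochs (channel count, epoch length, and the 10 Hz alpha target are illustrative assumptions):

# Illustrative sketch: global coherence at one frequency, computed as the ratio
# of the largest eigenvalue of the sensor cross-spectral matrix to its trace.
import numpy as np

fs = 250
rng = np.random.default_rng(2)
epochs = rng.standard_normal((60, 64, fs))   # (n_epochs, n_channels, n_samples) stand-in EEG
freqs = np.fft.rfftfreq(epochs.shape[-1], d=1.0 / fs)
target = np.argmin(np.abs(freqs - 10.0))     # frequency bin nearest 10 Hz (alpha)

spectra = np.fft.rfft(epochs, axis=-1)[:, :, target]                    # (n_epochs, n_channels)
csd = np.einsum('ec,ed->cd', spectra, np.conj(spectra)) / len(spectra)  # cross-spectral matrix
eigvals = np.linalg.eigvalsh(csd)
global_coherence = eigvals.max() / eigvals.sum()
print(f"alpha-band global coherence: {global_coherence:.3f}")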
Collapse
Affiliation(s)
- Vinodh G Kumar
- Cognitive Brain Dynamics Lab, National Brain Research Centre, Gurgaon, India
| | - Shrey Dutta
- Cognitive Brain Dynamics Lab, National Brain Research Centre, Gurgaon, India
| | - Siddharth Talwar
- Cognitive Brain Dynamics Lab, National Brain Research Centre, Gurgaon, India
| | - Dipanjan Roy
- Cognitive Brain Dynamics Lab, National Brain Research Centre, Gurgaon, India
| | - Arpan Banerjee
- Cognitive Brain Dynamics Lab, National Brain Research Centre, Gurgaon, India
| |
Collapse
|
23
|
Age-related hearing loss influences functional connectivity of auditory cortex for the McGurk illusion. Cortex 2020; 129:266-280. [PMID: 32535378 DOI: 10.1016/j.cortex.2020.04.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 03/30/2020] [Accepted: 04/09/2020] [Indexed: 01/23/2023]
Abstract
Age-related hearing loss affects hearing at high frequencies and is associated with difficulties in understanding speech. Increased audio-visual integration has recently been found in age-related hearing impairment; however, the brain mechanisms that contribute to this effect remain unclear. We used functional magnetic resonance imaging in elderly subjects with normal hearing and with mild to moderate uncompensated hearing loss. Audio-visual integration was studied using the McGurk task, in which an illusory fused percept can occur when incongruent auditory and visual syllables are presented. The paradigm included unisensory stimuli (auditory only, visual only), congruent audio-visual stimuli, and incongruent (McGurk) audio-visual stimuli. An illusory percept was reported in over 60% of incongruent trials. These McGurk illusion rates were equal in both groups of elderly subjects and correlated positively with speech-in-noise perception and daily listening effort. Normal-hearing participants showed an increased neural response in left pre- and postcentral gyri and right middle frontal gyrus for incongruent (McGurk) stimuli compared to congruent audio-visual stimuli. Activation patterns, however, did not differ between groups. Task-modulated functional connectivity did differ between groups: when comparing incongruent (McGurk) with congruent audio-visual stimuli, hard-of-hearing participants showed increased connectivity from auditory cortex to visual, parietal, and frontal areas relative to normal-hearing participants. These results suggest that changes in functional connectivity of auditory cortex, rather than activation strength, during processing of audio-visual McGurk stimuli accompany age-related hearing loss.
Collapse
|
24
|
Leminen A, Verwoert M, Moisala M, Salmela V, Wikman P, Alho K. Modulation of Brain Activity by Selective Attention to Audiovisual Dialogues. Front Neurosci 2020; 14:436. [PMID: 32477054 PMCID: PMC7235384 DOI: 10.3389/fnins.2020.00436] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2019] [Accepted: 04/09/2020] [Indexed: 01/08/2023] Open
Abstract
In real-life noisy situations, we can selectively attend to conversations in the presence of irrelevant voices, but neurocognitive mechanisms in such natural listening situations remain largely unexplored. Previous research has shown distributed activity in the mid superior temporal gyrus (STG) and sulcus (STS) while listening to speech and human voices, in the posterior STS and fusiform gyrus when combining auditory, visual and linguistic information, as well as in left-hemisphere temporal and frontal cortical areas during comprehension. In the present functional magnetic resonance imaging (fMRI) study, we investigated how selective attention modulates neural responses to naturalistic audiovisual dialogues. Our healthy adult participants (N = 15) selectively attended to video-taped dialogues between a man and woman in the presence of irrelevant continuous speech in the background. We modulated the auditory quality of dialogues with noise vocoding and their visual quality by masking speech-related facial movements. Both increased auditory quality and increased visual quality were associated with bilateral activity enhancements in the STG/STS. In addition, decreased audiovisual stimulus quality elicited enhanced fronto-parietal activity, presumably reflecting increased attentional demands. Finally, attention to the dialogues, in relation to a control task where a fixation cross was attended and the dialogue ignored, yielded enhanced activity in the left planum polare, angular gyrus, the right temporal pole, as well as in the orbitofrontal/ventromedial prefrontal cortex and posterior cingulate gyrus. Our findings suggest that naturalistic conversations effectively engage participants and reveal brain networks related to social perception in addition to speech and semantic processing networks.
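Noise vocoding, used here to manipulate auditory quality, removes spectral detail while preserving the temporal envelope in each frequency band: the speech signal is filtered into bands, each band's envelope is extracted, and the envelopes are used to modulate band-limited noise carriers that are summed back together. A minimal sketch (the band edges, filter order, and function names are illustrative assumptions, not the stimulus-generation code used in the study):

# Illustrative sketch: a simple noise vocoder (band envelopes modulating noise).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype='band')
    return filtfilt(b, a, x)

def noise_vocode(speech, fs, edges=(100, 300, 700, 1500, 3000, 6000)):
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        envelope = np.abs(hilbert(band))                       # temporal envelope of the band
        carrier = bandpass(rng.standard_normal(len(speech)), lo, hi, fs)
        out += envelope * carrier                              # envelope-modulated noise band
    return out / np.max(np.abs(out))

fs = 16000
speech = np.random.randn(fs * 2)        # stand-in for a 2-s speech waveform
vocoded = noise_vocode(speech, fs)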
Collapse
Affiliation(s)
- Alina Leminen
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Cognitive Science, Department of Digital Humanities, Helsinki Centre for Digital Humanities (Heldig), University of Helsinki, Helsinki, Finland
- Cognitive Brain Research Unit, Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Center for Cognition and Decision Making, Institute of Cognitive Neuroscience, National Research University – Higher School of Economics, Moscow, Russia
| | - Maxime Verwoert
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Mona Moisala
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Viljami Salmela
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Patrik Wikman
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Kimmo Alho
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
| |
Collapse
|
25
|
Micheli C, Schepers IM, Ozker M, Yoshor D, Beauchamp MS, Rieger JW. Electrocorticography reveals continuous auditory and visual speech tracking in temporal and occipital cortex. Eur J Neurosci 2020; 51:1364-1376. [PMID: 29888819 PMCID: PMC6289876 DOI: 10.1111/ejn.13992] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2017] [Revised: 05/19/2018] [Accepted: 05/29/2018] [Indexed: 12/11/2022]
Abstract
During natural speech perception, humans must parse temporally continuous auditory and visual speech signals into sequences of words. However, most studies of speech perception present only single words or syllables. We used electrocorticography (subdural electrodes implanted on the brains of epileptic patients) to investigate the neural mechanisms for processing continuous audiovisual speech signals consisting of individual sentences. Using partial correlation analysis, we found that posterior superior temporal gyrus (pSTG) and medial occipital cortex tracked both the auditory and the visual speech envelopes. These same regions, as well as inferior temporal cortex, responded more strongly to a dynamic video of a talking face compared to auditory speech paired with a static face. Occipital cortex and pSTG carry temporal information about both auditory and visual speech dynamics. Visual speech tracking in pSTG may be a mechanism for enhancing perception of degraded auditory speech.
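The partial correlation analysis mentioned above can be pictured as regressing the confounding envelope out of both the neural signal and the envelope of interest and then correlating the residuals. A minimal sketch on simulated time series (variable names and effect sizes are illustrative assumptions):

# Illustrative sketch: partial correlation between a neural time series and the
# auditory speech envelope, controlling for the visual (mouth-movement) envelope.
import numpy as np

def residualize(y, confound):
    X = np.column_stack([np.ones_like(confound), confound])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(3)
n = 5000
visual_env = rng.standard_normal(n)
auditory_env = 0.6 * visual_env + rng.standard_normal(n)    # envelopes are correlated
neural = 0.4 * auditory_env + 0.2 * visual_env + rng.standard_normal(n)

r_partial = np.corrcoef(residualize(neural, visual_env),
                        residualize(auditory_env, visual_env))[0, 1]
print(f"partial correlation (auditory | visual): {r_partial:.3f}")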
Collapse
Affiliation(s)
- Cristiano Micheli
- Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
| | - Inga M Schepers
- Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
- Research Center Neurosensory Science, Carl von Ossietzky University, Oldenburg, Germany
| | - Müge Ozker
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas
- Michael E. DeBakey Veterans Affairs Medical Center, Houston, Texas
| | | | - Jochem W Rieger
- Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
- Research Center Neurosensory Science, Carl von Ossietzky University, Oldenburg, Germany
| |
Collapse
|
26
|
Wroblewski A, He Y, Straube B. Dynamic Causal Modelling suggests impaired effective connectivity in patients with schizophrenia spectrum disorders during gesture-speech integration. Schizophr Res 2020; 216:175-183. [PMID: 31882274 DOI: 10.1016/j.schres.2019.12.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/10/2019] [Revised: 11/26/2019] [Accepted: 12/15/2019] [Indexed: 12/18/2022]
Abstract
Integrating visual and auditory information during gesture-speech integration (GSI) is important for successful social communication, which is often impaired in schizophrenia. Several studies have suggested the posterior superior temporal sulcus (pSTS) to be a relevant multisensory integration site. However, intact STS activation patterns have often been reported in patients. Thus, here we used Dynamic Causal Modelling (DCM) to analyze whether information processing in schizophrenia spectrum disorders (SSD) is impaired during GSI at the network level. We investigated GSI in three different samples. First, we replicated a recently published connectivity model for GSI in a healthy subject group (n = 19). Second, we investigated differences between patients with SSD and a matched healthy control group (n = 17 each). Participants were presented with videos of an actor performing intrinsically meaningful gestures accompanied by spoken sentences in German or Russian, or simply speaking a German sentence without gestures. Across all groups, fMRI analyses revealed similar activation patterns, and DCM analyses resulted in the same winning model for GSI. This finding directly replicates previous results. However, patients showed significantly reduced connectivity in the verbal pathway (from left middle temporal gyrus (MTG) to left STS). The clinical significance of this connection is supported by its correlations with the severity of concretism and a subscale of negative symptoms (SANS). Our model confirms the importance of the pSTS as an integration site during audio-visual integration. Patients showed generally intact connectivity during GSI but impaired information transfer via the verbal pathway, which might be the basis of interpersonal communication problems in patients with SSD.
Collapse
Affiliation(s)
- Adrian Wroblewski
- Translational Neuroimaging Marburg (TNM), Department of Psychiatry and Psychotherapy, University of Marburg, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany.
| | - Yifei He
- Translational Neuroimaging Marburg (TNM), Department of Psychiatry and Psychotherapy, University of Marburg, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany; Faculty of Translation, Language, and Cultural Studies, University of Mainz, Germersheim, Germany
| | - Benjamin Straube
- Translational Neuroimaging Marburg (TNM), Department of Psychiatry and Psychotherapy, University of Marburg, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
| |
Collapse
|
27
|
Nath A, Robinson M, Magnotti J, Karas P, Curry D, Paldino M. Determination of Differences in Seed-Based Resting State Functional Magnetic Resonance Imaging Language Networks in Pediatric Patients with Left- and Right-Lateralized Language: A Pilot Study. J Epilepsy Res 2019; 9:93-102. [PMID: 32509544 PMCID: PMC7251337 DOI: 10.14581/jer.19011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2019] [Revised: 01/24/2020] [Accepted: 01/24/2020] [Indexed: 11/03/2022] Open
Abstract
Background and Purpose: The current tools available for localization of expressive language, including functional magnetic resonance imaging (fMRI) and cortical stimulation mapping (CSM), require that the patient remain stationary and follow language commands with precise timing. Many pediatric epilepsy patients, however, have intact language skills but are unable to participate in these tasks due to cognitive impairments or young age. In adult subjects, there is evidence that language laterality can be determined from resting-state (RS) fMRI activity; however, there are few studies on the use of RS data to accurately predict language laterality in children. Methods: A retrospective review of pediatric patients at Texas Children's Hospital was performed to identify patients who had undergone epilepsy surgical planning over a 3-year period, with language localization by the traditional methods of Wada testing, CSM, or task-based fMRI with a calculated laterality index, and with a 7-minute RS scan available without excessive motion or noise. We computed the correlation between each subject's left and right Broca's-region activity and the activity of each of 68 cortical regions. Results: Nine patients with left-lateralized language showed greater voxel-wise correlations than six patients with right-lateralized language between a left-hemispheric Broca's-region seed and the following six cortical regions: left inferior temporal, left lateral orbitofrontal, left pars triangularis, right lateral orbitofrontal, right pars orbitalis, and right superior frontal. Conclusions: In a cohort of children with epilepsy, we found that patients with left- and right-hemispheric language lateralization have different RS networks.
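Seed-based resting-state connectivity of this kind is typically the Pearson correlation between the seed time course and each region's time course, Fisher z-transformed before group comparison. A minimal sketch on simulated data (the group sizes match those reported above, but everything else, including the uncorrected t-test, is an illustrative assumption):

# Illustrative sketch: seed-based resting-state connectivity (Broca's-area seed
# vs. 68 cortical regions), Fisher z-transformed and compared between groups.
import numpy as np
from scipy import stats

def seed_connectivity(seed_ts, region_ts):
    # seed_ts: (n_timepoints,), region_ts: (n_timepoints, n_regions)
    r = np.array([np.corrcoef(seed_ts, region_ts[:, i])[0, 1]
                  for i in range(region_ts.shape[1])])
    return np.arctanh(r)                     # Fisher z transform

rng = np.random.default_rng(4)
n_t, n_regions = 200, 68
left_lat = [seed_connectivity(rng.standard_normal(n_t),
                              rng.standard_normal((n_t, n_regions))) for _ in range(9)]
right_lat = [seed_connectivity(rng.standard_normal(n_t),
                               rng.standard_normal((n_t, n_regions))) for _ in range(6)]

t, p = stats.ttest_ind(np.array(left_lat), np.array(right_lat), axis=0)
print("regions with uncorrected p < 0.05:", np.flatnonzero(p < 0.05))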
Collapse
Affiliation(s)
- Audrey Nath
- Department of Pediatric Neurology, Baylor College of Medicine, Houston, TX, USA
| | - Meghan Robinson
- Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| | - John Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Patrick Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Daniel Curry
- Division of Pediatric Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Michael Paldino
- Department of Radiology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
| |
Collapse
|
28
|
Almodovar-Rivera I, Maitra R. Fast Adaptive Smoothing and Thresholding for Improved Activation Detection in Low-Signal fMRI. IEEE TRANSACTIONS ON MEDICAL IMAGING 2019; 38:2821-2828. [PMID: 31071023 DOI: 10.1109/tmi.2019.2915052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Functional magnetic resonance imaging is a noninvasive tool for studying cerebral function. Many factors challenge activation detection, especially in low-signal scenarios that arise in the performance of high-level cognitive tasks. We provide a fully automated fast adaptive smoothing and thresholding (FAST) algorithm that uses smoothing and extreme value theory on correlated statistical parametric maps for thresholding. Performance on experiments spanning a range of low-signal settings is very encouraging. The methodology also performs well in a study to identify the cerebral regions that perceive only-auditory-reliable or only-visual-reliable speech stimuli.
Collapse
|
29
|
Pigdon L, Willmott C, Reilly S, Conti-Ramsden G, Gaser C, Connelly A, Morgan AT. Grey matter volume in developmental speech and language disorder. Brain Struct Funct 2019; 224:3387-3398. [PMID: 31732792 DOI: 10.1007/s00429-019-01978-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2019] [Accepted: 11/04/2019] [Indexed: 01/15/2023]
Abstract
Developmental language disorder (DLD) and developmental speech disorder (DSD) are common, yet their etiologies are not well understood. Atypical volume of the inferior and posterior language regions and of the striatum has been reported in DLD; however, variability in both methodology and study findings limits interpretations. Imaging research within DSD, on the other hand, is scarce. The present study compared grey matter volume in children with DLD, DSD, and typically developing speech and language. Compared to typically developing controls, children with DLD had larger volume in the right cerebellum, possibly associated with the procedural learning deficits that have been proposed in DLD. Children with DSD showed larger volume in the left inferior occipital lobe compared to controls, which may indicate a compensatory role of the visual processing regions due to sub-optimal auditory-perceptual processes. Overall, these findings suggest that different neural systems may be involved in the specific deficits related to DLD and DSD.
Collapse
Affiliation(s)
- Lauren Pigdon
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia.,Turner Institute for Brain and Mental Health, Monash University, 18 Innovation Walk, Clayton, VIC, 3800, Australia
| | - Catherine Willmott
- Turner Institute for Brain and Mental Health, Monash University, 18 Innovation Walk, Clayton, VIC, 3800, Australia.,Monash-Epworth Rehabilitation Research Centre, Monash University, 18 Innovation Walk, Clayton, VIC, 3800, Australia
| | - Sheena Reilly
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia.,Menzies Health Institute Queensland, Griffith University, G40 Level 8.86, Mount Gravatt, QLD, 4222, Australia
| | - Gina Conti-Ramsden
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia.,The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
| | - Christian Gaser
- Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
| | - Alan Connelly
- Florey Institute of Neuroscience and Mental Health, 245 Burgundy Street, Heidelberg, VIC, 3084, Australia.,University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
| | - Angela T Morgan
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia. .,University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia. .,Royal Children's Hospital, 50 Flemington Rd, Parkville, VIC, 3052, Australia.
| |
Collapse
|
30
|
Li Y, Wang F, Chen Y, Cichocki A, Sejnowski T. The Effects of Audiovisual Inputs on Solving the Cocktail Party Problem in the Human Brain: An fMRI Study. Cereb Cortex 2019; 28:3623-3637. [PMID: 29029039 DOI: 10.1093/cercor/bhx235] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2017] [Indexed: 11/13/2022] Open
Abstract
At cocktail parties, our brains often simultaneously receive visual and auditory information. Although the cocktail party problem has been widely investigated under auditory-only settings, the effects of audiovisual inputs have not. This study explored the effects of audiovisual inputs in a simulated cocktail party. In our fMRI experiment, each congruent audiovisual stimulus was a synthesis of 2 facial movie clips, each of which could be classified into 1 of 2 emotion categories (crying and laughing). Visual-only (faces) and auditory-only stimuli (voices) were created by extracting the visual and auditory contents from the synthesized audiovisual stimuli. Subjects were instructed to selectively attend to 1 of the 2 objects contained in each stimulus and to judge its emotion category in the visual-only, auditory-only, and audiovisual conditions. The neural representations of the emotion features were assessed by calculating decoding accuracy and brain pattern-related reproducibility index based on the fMRI data. We compared the audiovisual condition with the visual-only and auditory-only conditions and found that audiovisual inputs enhanced the neural representations of emotion features of the attended objects instead of the unattended objects. This enhancement might partially explain the benefits of audiovisual inputs for the brain to solve the cocktail party problem.
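Decoding accuracy in such analyses is usually the cross-validated performance of a classifier trained on trial-wise activity patterns. A minimal sketch with simulated patterns (the linear SVM, trial counts, and voxel counts are illustrative assumptions, not the classifier reported in the paper):

# Illustrative sketch: cross-validated decoding of emotion category
# (crying vs. laughing) from voxel activity patterns.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
n_trials, n_voxels = 80, 200
labels = np.repeat([0, 1], n_trials // 2)            # 0 = crying, 1 = laughing
patterns = rng.standard_normal((n_trials, n_voxels))
patterns[labels == 1, :20] += 0.5                    # weak category signal in 20 voxels

acc = cross_val_score(LinearSVC(dual=False), patterns, labels, cv=5)
print(f"cross-validated decoding accuracy: {acc.mean():.2f} (chance = 0.50)")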
Collapse
Affiliation(s)
- Yuanqing Li
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China.,Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
| | - Fangyi Wang
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China.,Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
| | - Yongbin Chen
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China.,Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
| | - Andrzej Cichocki
- Riken Brain Science Institute, Wako shi, Japan.,Skolkovo Institute of Science and Technology (SKOTECH), Moscow, Russia
| | - Terrence Sejnowski
- Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| |
Collapse
|
31
|
Tietze FA, Hundertmark L, Roy M, Zerr M, Sinke C, Wiswede D, Walter M, Münte TF, Szycik GR. Auditory Deficits in Audiovisual Speech Perception in Adult Asperger's Syndrome: fMRI Study. Front Psychol 2019; 10:2286. [PMID: 31649597 PMCID: PMC6795762 DOI: 10.3389/fpsyg.2019.02286] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2019] [Accepted: 09/24/2019] [Indexed: 01/23/2023] Open
Abstract
Audiovisual (AV) integration deficits have been proposed to underlie difficulties in speech perception in Asperger’s syndrome (AS). It is not known whether these AV deficits reflect alterations at the level of unisensory processing or at the level of conjoint multisensory processing. Functional magnetic resonance imaging (fMRI) was performed in 16 adult subjects with AS and 16 healthy controls (HC) matched for age, gender, and verbal IQ while they were exposed to disyllabic AV-congruent and AV-incongruent nouns. A simple semantic categorization task was used to ensure subjects’ attention to the stimuli. The left auditory cortex (BA41) showed stronger activation in HC than in subjects with AS, with no interaction regarding AV congruency. This suggests that alterations in auditory processing in unimodal low-level areas underlie AV speech perception deficits in AS. Whether this signals a difficulty in the deployment of attention remains to be demonstrated.
Collapse
Affiliation(s)
- Fabian-Alexander Tietze
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hanover, Germany
| | - Laura Hundertmark
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hanover, Germany
| | - Mandy Roy
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hanover, Germany.,Asklepios Clinic North - Ochsenzoll, Hamburg, Germany
| | - Michael Zerr
- Department of Psychosomatic Medicine and Psychotherapy, Hannover Medical School, Hanover, Germany
| | - Christopher Sinke
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hanover, Germany
| | - Daniel Wiswede
- Institute of Psychology II, University of Lübeck, Lübeck, Germany.,Department of Neurology, University of Lübeck, Lübeck, Germany
| | - Martin Walter
- Department of Psychiatry and Psychotherapy, University of Tübingen, Tübingen, Germany
| | - Thomas F Münte
- Institute of Psychology II, University of Lübeck, Lübeck, Germany.,Department of Neurology, University of Lübeck, Lübeck, Germany
| | - Gregor R Szycik
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hanover, Germany
| |
Collapse
|
32
|
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. [PMID: 31013361 DOI: 10.1111/ejn.14425] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2018] [Revised: 03/25/2019] [Accepted: 04/17/2019] [Indexed: 11/30/2022]
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) congruent AV, facing the attended speaker while ignoring another speaker represented by the audio-only stream; and (b) incongruent AV (eavesdropping), attending to the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we could successfully decode attention to congruent audiovisual speech and could also decode attention when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition, we found alpha power to be a reliable measure of attention to the visual speech: using parieto-occipital alpha power, we could distinguish whether subjects were attending to or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that the attended speech can be determined successfully with only six near-ear electrodes. This work extends the current framework for decoding attention to speech to more naturalistic scenarios, and in doing so provides additional neural measures which may be incorporated to improve decoding accuracy.
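Stimulus-reconstruction (backward-model) decoding of this kind maps time-lagged EEG onto the speech envelope with regularized linear regression and then compares the reconstruction with each talker's envelope. A minimal sketch on simulated signals (the ridge solver, lag window, and channel count are illustrative assumptions, not the authors' decoder):

# Illustrative sketch: envelope reconstruction from lagged EEG via ridge regression,
# then attention decoding by comparing correlations with each talker's envelope.
import numpy as np

def lag_matrix(eeg, n_lags):
    # eeg: (n_samples, n_channels) -> (n_samples, n_channels * n_lags)
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * c:(lag + 1) * c] = eeg[:n - lag]
    return X

def ridge_fit(X, y, lam=1e2):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(6)
fs, n = 64, 64 * 120                        # 2 minutes at 64 Hz
env_attended = np.abs(rng.standard_normal(n))
env_ignored = np.abs(rng.standard_normal(n))
eeg = np.column_stack([env_attended + rng.standard_normal(n) for _ in range(6)])  # 6 channels

X = lag_matrix(eeg, n_lags=16)              # roughly 0-250 ms of lags
w = ridge_fit(X[: n // 2], env_attended[: n // 2])   # train on the first half
recon = X[n // 2:] @ w                                # reconstruct on the held-out half
r_att = np.corrcoef(recon, env_attended[n // 2:])[0, 1]
r_ign = np.corrcoef(recon, env_ignored[n // 2:])[0, 1]
print("decoded attended talker:", "A" if r_att > r_ign else "B")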
Collapse
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland.,Department of Biomedical Engineering, University of Rochester, Rochester, New York.,Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
| |
Collapse
|
33
|
Van der Stoep N, Van der Stigchel S, Van Engelen RC, Biesbroek JM, Nijboer TCW. Impairments in Multisensory Integration after Stroke. J Cogn Neurosci 2019; 31:885-899. [PMID: 30883294 DOI: 10.1162/jocn_a_01389] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
The integration of information from multiple senses leads to a plethora of behavioral benefits, most prominently faster and more accurate detection, localization, and identification of events in the environment. Although previous studies of multisensory integration (MSI) in humans have provided insights into the neural underpinnings of MSI, studies of MSI at a behavioral level in individuals with brain damage are scarce. Here, a well-known psychophysical paradigm (the redundant target paradigm) was employed to quantify MSI in a group of stroke patients. The relation between MSI and lesion location was analyzed using lesion subtraction analysis. Twenty-one patients with ischemic infarctions and 14 healthy control participants responded to auditory, visual, and audiovisual targets in the left and right visual hemifield. Responses to audiovisual targets were faster than to unisensory targets. This could be due to MSI or statistical facilitation. Comparing the audiovisual RTs to the winner of a race between unisensory signals allowed us to determine whether participants could integrate auditory and visual information. The results indicated that (1) 33% of the patients showed an impairment in MSI; (2) patients with MSI impairment had left hemisphere and brainstem/cerebellar lesions; and (3) the left caudate, left pallidum, left putamen, left thalamus, left insula, left postcentral and precentral gyrus, left central opercular cortex, left amygdala, and left OFC were more often damaged in patients with MSI impairments. These results are the first to demonstrate the impact of brain damage on MSI in stroke patients using a well-established psychophysical paradigm.
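The comparison with "the winner of a race between unisensory signals" is usually formalized as Miller's race-model inequality: if no integration occurs, the cumulative RT distribution for audiovisual targets cannot exceed the sum of the two unisensory distributions at any time point. A minimal sketch of that test on simulated RTs (all numbers are illustrative assumptions):

# Illustrative sketch: test of the race model inequality (Miller, 1982)
# using empirical cumulative RT distributions.
import numpy as np

def ecdf(rts, t):
    rts = np.sort(rts)
    return np.searchsorted(rts, t, side='right') / len(rts)

rng = np.random.default_rng(7)
rt_aud = rng.normal(450, 60, 200)        # unisensory auditory RTs (ms)
rt_vis = rng.normal(430, 60, 200)        # unisensory visual RTs (ms)
rt_av = rng.normal(380, 50, 200)         # redundant audiovisual RTs (ms)

t_grid = np.linspace(250, 600, 100)
race_bound = np.minimum(1.0, ecdf(rt_aud, t_grid) + ecdf(rt_vis, t_grid))
violation = ecdf(rt_av, t_grid) - race_bound
# Positive values mean AV responses are faster than any race between the
# unisensory channels can explain, i.e., evidence for integration.
print(f"maximum race-model violation: {violation.max():.3f}")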
Collapse
Affiliation(s)
| | | | | | | | - Tanja C W Nijboer
- Helmholtz Institute, Utrecht University.,Brain Center Rudolph Magnus, University Medical Center, Utrecht University.,Center for Brain Rehabilitation Medicine, Utrecht Medical Center, Utrecht University
| |
Collapse
|
34
|
Elkhetali AS, Fleming LL, Vaden RJ, Nenert R, Mendle JE, Visscher KM. Background connectivity between frontal and sensory cortex depends on task state, independent of stimulus modality. Neuroimage 2018; 184:790-800. [PMID: 30237034 DOI: 10.1016/j.neuroimage.2018.09.040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2018] [Revised: 09/13/2018] [Accepted: 09/15/2018] [Indexed: 10/28/2022] Open
Abstract
The human brain has the ability to process identical information differently depending on the task. In order to perform a given task, the brain must select and react to the appropriate stimuli while ignoring other irrelevant stimuli. The dynamic nature of environmental stimuli and behavioral intentions requires an equally dynamic set of responses within the brain. Collectively, these responses act to set up and maintain states needed to perform a given task. However, the mechanisms that allow for setting up and maintaining a task state are not fully understood. Prior evidence suggests that one possible mechanism for maintaining a task state may be through altering 'background connectivity,' connectivity that exists independently of the trials of a task. Although previous studies have suggested that background connectivity contributes to a task state, these studies have typically not controlled for stimulus characteristics, or have focused primarily on relationships among areas involved with visual sensory processing. In the present study we examined background connectivity during tasks involving both visual and auditory stimuli. We examined the connectivity profiles of both visual and auditory sensory cortex that allow for selection of task-relevant stimuli, demonstrating the existence of a potentially universal pattern of background connectivity underlying attention to a stimulus. Participants were presented with simultaneous auditory and visual stimuli and were instructed to respond to only one, while ignoring the other. Using functional MRI, we observed task-based modulation of the background connectivity profile for both the auditory and visual cortex to certain brain regions. There was an increase in background connectivity between the task-relevant sensory cortex and control areas in the frontal cortex. This increase in synchrony when receiving the task-relevant stimulus as compared to the task irrelevant stimulus may be maintaining paths for passing information within the cortex. These task-based modulations of connectivity occur independently of stimuli and could be one way the brain sets up and maintains a task state.
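Background connectivity is commonly computed by regressing the task-evoked response out of each region's time series and correlating the residuals, so that trial-driven co-activation does not inflate the estimate. A minimal sketch on simulated data (the double-gamma HRF, block timing, and region names are illustrative assumptions):

# Illustrative sketch: "background connectivity" as the correlation between
# regional residuals after removing the task-evoked response.
import numpy as np
from scipy.stats import gamma

tr, n_vols = 2.0, 300
t = np.arange(0, 30, tr)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)      # simple double-gamma HRF
hrf /= hrf.max()

onsets = np.zeros(n_vols)
onsets[::30] = 1                                      # illustrative block onsets
task_reg = np.convolve(onsets, hrf)[:n_vols]          # task-evoked regressor

rng = np.random.default_rng(8)
shared = rng.standard_normal(n_vols)                  # ongoing signal shared between regions
frontal = 1.0 * task_reg + 0.6 * shared + rng.standard_normal(n_vols)
auditory = 0.8 * task_reg + 0.6 * shared + rng.standard_normal(n_vols)

def residuals(y, reg):
    X = np.column_stack([np.ones(len(reg)), reg])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

background_r = np.corrcoef(residuals(frontal, task_reg),
                           residuals(auditory, task_reg))[0, 1]
print(f"background (task-regressed) connectivity: {background_r:.3f}")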
Collapse
Affiliation(s)
- Abdurahman S Elkhetali
- University of Utah School of Medicine Department of Neurology, Salt Lake City, UT, 84132, USA
| | - Leland L Fleming
- University of Alabama at Birmingham School of Medicine Department of Neurobiology, Birmingham, AL, 35294, USA
| | - Ryan J Vaden
- University of Alabama at Birmingham School of Medicine Department of Neurobiology, Birmingham, AL, 35294, USA
| | - Rodolphe Nenert
- Department of Neurology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, 35294, USA
| | - Jane E Mendle
- Department of Human Development, Cornell University, Ithaca, NY, 14853, USA
| | - Kristina M Visscher
- University of Alabama at Birmingham School of Medicine Department of Neurobiology, Birmingham, AL, 35294, USA.
| |
Collapse
|
35
|
Neural Prediction Errors Distinguish Perception and Misperception of Speech. J Neurosci 2018; 38:6076-6089. [PMID: 29891730 DOI: 10.1523/jneurosci.3258-17.2018] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2017] [Revised: 03/08/2018] [Accepted: 03/28/2018] [Indexed: 11/21/2022] Open
Abstract
Humans use prior expectations to improve perception, especially of sensory signals that are degraded or ambiguous. However, if sensory input deviates from prior expectations, then correct perception depends on adjusting or rejecting prior expectations. Failure to adjust or reject the prior leads to perceptual illusions, especially if there is partial overlap (and thus partial mismatch) between expectations and input. With speech, "slips of the ear" occur when expectations lead to misperception. For instance, an entomologist might be more likely to hear "The ants are my friends" for "The answer, my friend" (in the Bob Dylan song Blowin' in the Wind). Here, we contrast two mechanisms by which prior expectations may lead to misperception of degraded speech. First, clear representations of the common sounds in the prior and input (i.e., expected sounds) may lead to incorrect confirmation of the prior. Second, insufficient representations of sounds that deviate between prior and input (i.e., prediction errors) could lead to deception. We used crossmodal predictions from written words that partially match degraded speech to compare neural responses when male and female human listeners were deceived into accepting the prior or correctly rejected it. Combined behavioral and multivariate representational similarity analysis of fMRI data shows that veridical perception of degraded speech is signaled by representations of prediction error in the left superior temporal sulcus. Instead of using top-down processes to support perception of expected sensory input, our findings suggest that the strength of neural prediction error representations distinguishes correct perception and misperception. SIGNIFICANCE STATEMENT: Misperceiving spoken words is an everyday experience, with outcomes that range from shared amusement to serious miscommunication. For hearing-impaired individuals, frequent misperception can lead to social withdrawal and isolation, with severe consequences for wellbeing. In this work, we specify the neural mechanisms by which prior expectations, which are so often helpful for perception, can lead to misperception of degraded sensory signals. Most descriptive theories of illusory perception explain misperception as arising from a clear sensory representation of features or sounds that are in common between prior expectations and sensory input. Our work instead provides support for a complementary proposal: that misperception occurs when there is an insufficient sensory representation of the deviation between expectations and sensory signals.
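Representational similarity analysis of the kind used here compares a neural representational dissimilarity matrix (RDM), built from pairwise distances between condition-specific activity patterns, with a model RDM. A minimal sketch on simulated patterns (the correlation-distance metric and Spearman comparison are common choices used here as assumptions, not necessarily the authors' exact settings):

# Illustrative sketch: representational similarity analysis (RSA) comparing a
# neural RDM (correlation distance between condition patterns) to a model RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(9)
n_conditions, n_voxels = 12, 150
patterns = rng.standard_normal((n_conditions, n_voxels))   # condition x voxel patterns

neural_rdm = pdist(patterns, metric='correlation')   # vectorized upper triangle of the RDM
model_rdm = rng.random(neural_rdm.shape)              # stand-in for a prediction-error model RDM

rho, p = spearmanr(neural_rdm, model_rdm)
print(f"model-neural RDM correlation: rho = {rho:.2f}, p = {p:.3f}")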
Collapse
|
36
|
Bernstein LE. Response Errors in Females' and Males' Sentence Lipreading Necessitate Structurally Different Models for Predicting Lipreading Accuracy. LANGUAGE LEARNING 2018; 68:127-158. [PMID: 31485084 PMCID: PMC6724546 DOI: 10.1111/lang.12281] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Lipreaders recognize words with phonetically impoverished stimuli, an ability that is generally poor in normal-hearing adults. Individual sentence lipreading trials from 341 young adults were modeled to predict words and phonemes correct in terms of measures of phoneme response dissimilarity (PRD), number of inserted incorrect response phonemes, lipreader gender, and a measure of speech perception in noise. Interactions with lipreaders' gender necessitated structurally different models of males' and females' lipreading. Overall, female lipreaders are more accurate, their ability to recognize words with impoverished or degraded input is consistent across visual and auditory modalities, and they amplify their correct responding through top-down insertion of text. Males' responses suggest that individuals with poorer auditory speech perception in noise amplify their responses by shifting towards including text in their response that is more perceptually discrepant from the stimulus. Gender differences merit attention in future studies that use visual speech stimuli.
Collapse
Affiliation(s)
- Lynne E Bernstein
- Department of Speech, Language, and Hearing Science, George Washington University, 2121 I St NW, Washington, DC 20052
| |
Collapse
|
37
|
Neural networks supporting audiovisual integration for speech: A large-scale lesion study. Cortex 2018; 103:360-371. [PMID: 29705718 DOI: 10.1016/j.cortex.2018.03.030] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 12/05/2017] [Accepted: 03/23/2018] [Indexed: 10/17/2022]
Abstract
Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration.
Collapse
|
38
|
Abstract
Behaviorally, it is well established that human observers integrate signals near-optimally weighted in proportion to their reliabilities as predicted by maximum likelihood estimation. Yet, despite abundant behavioral evidence, it is unclear how the human brain accomplishes this feat. In a spatial ventriloquist paradigm, participants were presented with auditory, visual, and audiovisual signals and reported the location of the auditory or the visual signal. Combining psychophysics, multivariate functional MRI (fMRI) decoding, and models of maximum likelihood estimation (MLE), we characterized the computational operations underlying audiovisual integration at distinct cortical levels. We estimated observers' behavioral weights by fitting psychometric functions to participants' localization responses. Likewise, we estimated the neural weights by fitting neurometric functions to spatial locations decoded from regional fMRI activation patterns. Our results demonstrate that low-level auditory and visual areas encode predominantly the spatial location of the signal component of a region's preferred auditory (or visual) modality. By contrast, intraparietal sulcus forms spatial representations by integrating auditory and visual signals weighted by their reliabilities. Critically, the neural and behavioral weights and the variance of the spatial representations depended not only on the sensory reliabilities as predicted by the MLE model but also on participants' modality-specific attention and report (i.e., visual vs. auditory). These results suggest that audiovisual integration is not exclusively determined by bottom-up sensory reliabilities. Instead, modality-specific attention and report can flexibly modulate how intraparietal sulcus integrates sensory signals into spatial representations to guide behavioral responses (e.g., localization and orienting).
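Under maximum likelihood estimation, each cue is weighted by its reliability (inverse variance) and the integrated estimate has lower variance than either cue alone. A small sketch of those standard formulas (the numbers are arbitrary examples):

# Illustrative sketch: maximum-likelihood (reliability-weighted) cue integration.
import numpy as np

def mle_integrate(x_aud, sigma_aud, x_vis, sigma_vis):
    r_aud, r_vis = 1 / sigma_aud**2, 1 / sigma_vis**2   # cue reliabilities
    w_aud = r_aud / (r_aud + r_vis)                      # auditory weight
    w_vis = 1 - w_aud                                    # visual weight
    x_hat = w_aud * x_aud + w_vis * x_vis                # integrated location estimate
    sigma_hat = np.sqrt(1 / (r_aud + r_vis))             # predicted (reduced) uncertainty
    return x_hat, sigma_hat, w_aud, w_vis

x_hat, sigma_hat, w_a, w_v = mle_integrate(x_aud=5.0, sigma_aud=4.0, x_vis=0.0, sigma_vis=1.0)
print(f"weights: auditory {w_a:.2f}, visual {w_v:.2f}; "
      f"estimate {x_hat:.2f} deg, sigma {sigma_hat:.2f} deg")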
Collapse
|
39
|
Individual differences and the effect of face configuration information in the McGurk effect. Exp Brain Res 2018; 236:973-984. [PMID: 29383400 DOI: 10.1007/s00221-018-5188-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2017] [Accepted: 01/23/2018] [Indexed: 10/18/2022]
Abstract
The McGurk effect, which denotes the influence of visual information on audiovisual speech perception, is less frequently observed in individuals with autism spectrum disorder (ASD) compared to those without it; the reason for this remains unclear. Several studies have suggested that facial configuration context might play a role in this difference. More specifically, people with ASD show a local processing bias for faces-that is, they process global face information to a lesser extent. This study examined the role of facial configuration context in the McGurk effect in 46 healthy students. Adopting an analogue approach using the Autism-Spectrum Quotient (AQ), we sought to determine whether this facial configuration context is crucial to previously observed reductions in the McGurk effect in people with ASD. Lip-reading and audiovisual syllable identification tasks were assessed via presentation of upright normal, inverted normal, upright Thatcher-type, and inverted Thatcher-type faces. When the Thatcher-type face was presented, perceivers were found to be sensitive to the misoriented facial characteristics, causing them to perceive a weaker McGurk effect than when the normal face was presented (this is known as the McThatcher effect). Additionally, the McGurk effect was weaker in individuals with high AQ scores than in those with low AQ scores in the incongruent audiovisual condition, regardless of their ability to read lips or process facial configuration contexts. Our findings, therefore, do not support the assumption that individuals with ASD show a weaker McGurk effect due to a difficulty in processing facial configuration context.
Collapse
|
40
|
Kumar GV, Kumar N, Roy D, Banerjee A. Segregation and Integration of Cortical Information Processing Underlying Cross-Modal Perception. Multisens Res 2018; 31:481-500. [DOI: 10.1163/22134808-00002574] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2016] [Accepted: 04/17/2017] [Indexed: 11/19/2022]
Abstract
Visual cues from the speaker’s face influence the perception of speech. An example of this influence is demonstrated by the McGurk-effect where illusory (cross-modal) sounds are perceived following presentation of incongruent audio–visual (AV) stimuli. Previous studies report the engagement of specific cortical modules that are spatially distributed during cross-modal perception. However, the limits of the underlying representational space and the cortical network mechanisms remain unclear. In this combined psychophysical and electroencephalography (EEG) study, the participants reported their perception while listening to a set of synchronous and asynchronous incongruent AV stimuli. We identified the neural representation of subjective cross-modal perception at different organizational levels — at specific locations in sensor space and at the level of the large-scale brain network estimated from between-sensor interactions. We identified an enhanced positivity in the event-related potential peak around 300 ms following stimulus onset associated with cross-modal perception. At the spectral level, cross-modal perception involved an overall decrease in power at the frontal and temporal regions at multiple frequency bands and at all AV lags, along with an increased power at the occipital scalp region for synchronous AV stimuli. At the level of large-scale neuronal networks, enhanced functional connectivity at the gamma band involving frontal regions serves as a marker of AV integration. Thus, we report in one single study that segregation of information processing at individual brain locations and integration of information over candidate brain networks underlie multisensory speech perception.
Collapse
Affiliation(s)
- G. Vinodh Kumar
- Cognitive Brain Lab, National Brain Research Centre, NH 8, Manesar, Gurgaon 122051, India
| | - Neeraj Kumar
- Cognitive Brain Lab, National Brain Research Centre, NH 8, Manesar, Gurgaon 122051, India
| | - Dipanjan Roy
- Centre of Behavioural and Cognitive Sciences, University of Allahabad, Allahabad 211002, India
| | - Arpan Banerjee
- Cognitive Brain Lab, National Brain Research Centre, NH 8, Manesar, Gurgaon 122051, India
| |
Collapse
|
41
|
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597 DOI: 10.1163/22134808-00002565] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Accepted: 03/09/2017] [Indexed: 11/19/2022]
Abstract
Since its discovery 40 years ago, the McGurk illusion has been usually cited as a prototypical paradigmatic case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This questions the suitability of this illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be really cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Collapse
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
| | - Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
| | - Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
| |
Collapse
|
42
|
Starke J, Ball F, Heinze HJ, Noesselt T. The spatio-temporal profile of multisensory integration. Eur J Neurosci 2017; 51:1210-1223. [PMID: 29057531 DOI: 10.1111/ejn.13753] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2017] [Revised: 10/13/2017] [Accepted: 10/16/2017] [Indexed: 12/29/2022]
Abstract
Task-irrelevant visual stimuli can enhance auditory perception. However, while there is some neurophysiological evidence for mechanisms that underlie the phenomenon, the neural basis of visually induced effects on auditory perception remains unknown. Combining fMRI and EEG with psychophysical measurements in two independent studies, we identified the neural underpinnings and temporal dynamics of visually induced auditory enhancement. Lower- and higher-intensity sounds were paired with a non-informative visual stimulus, while participants performed an auditory detection task. Behaviourally, visual co-stimulation enhanced auditory sensitivity. Using fMRI, enhanced BOLD signals were observed in primary auditory cortex for low-intensity audiovisual stimuli which scaled with subject-specific enhancement in perceptual sensitivity. Concordantly, a modulation of event-related potentials could already be observed over frontal electrodes at an early latency (30-80 ms), which again scaled with subject-specific behavioural benefits. Later modulations starting around 280 ms, that is in the time range of the P3, did not fit this pattern of brain-behaviour correspondence. Hence, the latency of the corresponding fMRI-EEG brain-behaviour modulation points at an early interplay of visual and auditory signals in low-level auditory cortex, potentially mediated by crosstalk at the level of the thalamus. However, fMRI signals in primary auditory cortex, auditory thalamus and the P50 for higher-intensity auditory stimuli were also elevated by visual co-stimulation (in the absence of any behavioural effect) suggesting a general, intensity-independent integration mechanism. We propose that this automatic interaction occurs at the level of the thalamus and might signify a first step of audiovisual interplay necessary for visually induced perceptual enhancement of auditory perception.
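Auditory sensitivity in a detection task of this kind is commonly summarized as d' computed from hit and false-alarm rates. A small sketch (the trial counts are arbitrary illustrations; the correction for extreme rates is one common convention):

# Illustrative sketch: detection sensitivity (d') from hit and false-alarm rates.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # Small correction avoids infinite z-scores for perfect hit or false-alarm rates.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

print(f"audio-only:     d' = {d_prime(62, 38, 20, 80):.2f}")
print(f"audio + visual: d' = {d_prime(74, 26, 21, 79):.2f}")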
Collapse
Affiliation(s)
- Johanna Starke
- Department of Biological Psychology, Faculty of Natural Science, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany.,Department of Neurology, Faculty of Medicine, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
| | - Felix Ball
- Department of Biological Psychology, Faculty of Natural Science, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany.,Department of Neurology, Faculty of Medicine, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany.,Center for Behavioural Brain Sciences, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
| | - Hans-Jochen Heinze
- Department of Neurology, Faculty of Medicine, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany.,Center for Behavioural Brain Sciences, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
| | - Toemme Noesselt
- Department of Biological Psychology, Faculty of Natural Science, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany.,Center for Behavioural Brain Sciences, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
| |
Collapse
|
43
|
Festa EK, Katz AP, Ott BR, Tremont G, Heindel WC. Dissociable Effects of Aging and Mild Cognitive Impairment on Bottom-Up Audiovisual Integration. J Alzheimers Dis 2017; 59:155-167. [DOI: 10.3233/jad-161062] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Elena K. Festa
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA
| | - Andrew P. Katz
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA
| | - Brian R. Ott
- Department of Neurology, Alpert Medical School, Brown University, Providence, RI, USA
- Department of Neurology, Rhode Island Hospital, Providence, RI, USA
| | - Geoffrey Tremont
- Department of Psychiatry and Human Behavior, Alpert Medical School, Brown University, Providence, RI, USA
- Department of Psychiatry, Rhode Island Hospital, Providence, RI, USA
| | - William C. Heindel
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, Providence, RI, USA
| |
Collapse
|
44
|
Stevenson RA, Baum SH, Segers M, Ferber S, Barense MD, Wallace MT. Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception. Autism Res 2017; 10:1280-1290. [PMID: 28339177 PMCID: PMC5513806 DOI: 10.1002/aur.1776] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Revised: 01/19/2017] [Accepted: 02/06/2017] [Indexed: 11/11/2022]
Abstract
Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Ryan A. Stevenson
- Department of Psychology, Western University, London, ON, Canada
- Brain and Mind Institute, Western University, London, ON, Canada
| | - Sarah H. Baum
- Department of Psychology, University of Washington, Seattle, WA, USA
| | | | - Susanne Ferber
- Department of Psychology, University of Toronto, Toronto, ON, Canada
- Rotman Research Institute, Toronto, ON, Canada
| | - Morgan D. Barense
- Department of Psychology, University of Toronto, Toronto, ON, Canada
- Rotman Research Institute, Toronto, ON, Canada
| | - Mark T. Wallace
- Vanderbilt Brain Institute, Nashville, TN, USA
- Vanderbilt Kennedy Center, Nashville, TN, USA
- Vanderbilt University, Nashville, TN, USA
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
| |
Collapse
|
45
|
Giordano BL, Ince RAA, Gross J, Schyns PG, Panzeri S, Kayser C. Contributions of local speech encoding and functional connectivity to audio-visual speech perception. eLife 2017; 6. [PMID: 28590903 PMCID: PMC5462535 DOI: 10.7554/elife.24763] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Accepted: 05/07/2017] [Indexed: 11/13/2022] Open
Abstract
Seeing a speaker's face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker's face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.

When listening to someone in a noisy environment, such as a cocktail party, we can understand the speaker more easily if we can also see his or her face. Movements of the lips and tongue convey additional information that helps the listener's brain separate out syllables, words and sentences. However, exactly where in the brain this effect occurs and how it works remain unclear. To find out, Giordano et al. scanned the brains of healthy volunteers as they watched clips of people speaking. The clarity of the speech varied between clips. Furthermore, in some of the clips the lip movements of the speaker corresponded to the speech in question, whereas in others the lip movements were nonsense babble. As expected, the volunteers performed better on a word recognition task when the speech was clear and when the lip movements agreed with the spoken dialogue. Watching the video clips stimulated rhythmic activity in multiple regions of the volunteers' brains, including areas that process sound and areas that plan movements. Speech is itself rhythmic, and the volunteers' brain activity synchronized with the rhythms of the speech they were listening to. Seeing the speaker's face increased this degree of synchrony. However, it also made it easier for sound-processing regions within the listeners' brains to transfer information to one another. Notably, only the latter effect predicted improved performance on the word recognition task. This suggests that seeing a person's face makes it easier to understand his or her speech by boosting communication between brain regions, rather than through effects on individual areas. Further work is required to determine where and how the brain encodes lip movements and speech sounds. The next challenge will be to identify where these two sets of information interact, and how the brain merges them together to generate the impression of specific words.
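One common way to quantify the kind of speech entrainment described above is the coherence between the speech amplitude envelope and a neural signal in the delta/theta range. The sketch below is a minimal illustration with synthetic signals; the sampling rate, frequency band, and signal construction are assumptions for demonstration and do not reproduce the authors' MEG pipeline.

```python
import numpy as np
from scipy.signal import coherence

fs = 200.0                       # sampling rate in Hz (illustrative)
t = np.arange(0, 60, 1 / fs)     # 60 s of signal
rng = np.random.default_rng(0)

# Synthetic 4 Hz "speech envelope" and a neural signal partially entrained to it.
envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
neural = 0.6 * envelope + 0.4 * rng.standard_normal(t.size)

# Magnitude-squared coherence in 4 s windows.
f, cxy = coherence(envelope, neural, fs=fs, nperseg=int(4 * fs))

band = (f >= 2) & (f <= 8)       # delta/theta range often linked to speech tracking
print(f"Mean 2-8 Hz speech-brain coherence: {cxy[band].mean():.2f}")
```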
Collapse
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université - Centre National de la Recherche Scientifique, Marseille, France
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Stefano Panzeri
- Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, Rovereto, Italy
| | - Christoph Kayser
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| |
Collapse
|
46
|
Sight and sound persistently out of synch: stable individual differences in audiovisual synchronisation revealed by implicit measures of lip-voice integration. Sci Rep 2017; 7:46413. [PMID: 28429784 PMCID: PMC5399466 DOI: 10.1038/srep46413] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2016] [Accepted: 03/17/2017] [Indexed: 11/08/2022] Open
Abstract
Are sight and sound out of synch? Signs that they are have been dismissed for over two centuries as an artefact of attentional and response bias, to which traditional subjective methods are prone. To avoid such biases, we measured performance on objective tasks that depend implicitly on achieving good lip-synch. We measured the McGurk effect (in which incongruent lip-voice pairs evoke illusory phonemes), and also identification of degraded speech, while manipulating audiovisual asynchrony. Peak performance was found at an average auditory lag of ~100 ms, but this varied widely between individuals. Participants' individual optimal asynchronies showed trait-like stability when the same task was re-tested one week later, but measures based on different tasks did not correlate. This discounts the possible influence of common biasing factors, suggesting instead that our different tasks probe different brain networks, each subject to their own intrinsic auditory and visual processing latencies. Our findings call for renewed interest in the biological causes and cognitive consequences of individual sensory asynchronies, leading potentially to fresh insights into the neural representation of sensory timing. A concrete implication is that speech comprehension might be enhanced by first measuring each individual's optimal asynchrony and then applying a compensatory auditory delay.
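A participant's individually optimal asynchrony of the sort described above could be estimated by fitting a peaked function to performance measured across a range of stimulus-onset asynchronies (SOAs) and reading off the location of the peak. The sketch below uses a Gaussian profile and hypothetical scores purely for illustration; it is not the authors' fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, peak, mu, sigma, baseline):
    """Performance as a function of audiovisual asynchrony (SOA), peaking at mu."""
    return baseline + peak * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

# Hypothetical intelligibility scores per auditory lag (ms); positive = audio lags video.
soas = np.array([-300, -200, -100, 0, 100, 200, 300], dtype=float)
scores = np.array([0.35, 0.45, 0.55, 0.70, 0.78, 0.60, 0.40])

popt, _ = curve_fit(gaussian, soas, scores, p0=[0.4, 100.0, 150.0, 0.3])
print(f"Estimated optimal auditory lag: {popt[1]:.0f} ms")
# Test-retest stability could then be assessed by correlating each participant's
# fitted mu across sessions (e.g., with numpy.corrcoef).
```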
Collapse
|
47
|
Venezia JH, Vaden KI, Rong F, Maddox D, Saberi K, Hickok G. Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus. Front Hum Neurosci 2017; 11:174. [PMID: 28439236 PMCID: PMC5383672 DOI: 10.3389/fnhum.2017.00174] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2016] [Accepted: 03/24/2017] [Indexed: 11/30/2022] Open
Abstract
The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
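As a rough illustration of the multi-voxel pattern analysis mentioned above, the sketch below classifies a held-out activation pattern as speech or nonspeech by correlating it with condition templates. The voxel patterns are random placeholders, and the correlation-template approach is only one simple stand-in for MVPA, not the specific analysis used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 50

# Hypothetical mean activation patterns (templates) for each condition in one ROI.
speech_template = rng.standard_normal(n_voxels)
nonspeech_template = rng.standard_normal(n_voxels)

# A held-out trial pattern: the speech template plus noise.
test_pattern = speech_template + 0.5 * rng.standard_normal(n_voxels)

def classify(pattern, templates):
    """Assign the label whose template correlates most strongly with the pattern."""
    corrs = {label: np.corrcoef(pattern, tmpl)[0, 1] for label, tmpl in templates.items()}
    return max(corrs, key=corrs.get), corrs

label, corrs = classify(test_pattern, {"speech": speech_template,
                                        "nonspeech": nonspeech_template})
print(f"Predicted condition: {label} (correlations: {corrs})")
```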
Collapse
Affiliation(s)
| | - Kenneth I Vaden
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, SC, USA
| | - Feng Rong
- Department of Cognitive Sciences, Center for Cognitive Neuroscience and Engineering, University of California, Irvine, CA, USA
| | - Dale Maddox
- Department of Cognitive Sciences, Center for Cognitive Neuroscience and Engineering, University of California, Irvine, CA, USA
| | - Kourosh Saberi
- Department of Cognitive Sciences, Center for Cognitive Neuroscience and Engineering, University of California, Irvine, CA, USA
| | - Gregory Hickok
- Department of Cognitive Sciences, Center for Cognitive Neuroscience and Engineering, University of California, Irvine, CA, USA
| |
Collapse
|
48
|
Audio-visual speech perception in adult readers with dyslexia: an fMRI study. Brain Imaging Behav 2017; 12:357-368. [DOI: 10.1007/s11682-017-9694-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
49
|
Ozker M, Schepers IM, Magnotti JF, Yoshor D, Beauchamp MS. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. J Cogn Neurosci 2017; 29:1044-1060. [PMID: 28253074 DOI: 10.1162/jocn_a_01110] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
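The Bayesian prediction tested here — that integrating independent auditory and visual estimates should reduce response variability — follows from reliability-weighted (maximum-likelihood) cue combination, in which the combined variance is the inverse of the summed inverse variances. A minimal numerical sketch with illustrative, not measured, variances:

```python
# Illustrative single-cue variances (arbitrary units), not values from the study.
var_a, var_v = 4.0, 9.0

# Reliability (inverse-variance) weights for optimal combination.
w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
w_v = 1 - w_a

# Predicted variance of the combined audiovisual estimate.
var_av = 1 / (1 / var_a + 1 / var_v)

print(f"weights: auditory={w_a:.2f}, visual={w_v:.2f}")
print(f"combined variance {var_av:.2f} < best single-cue variance {min(var_a, var_v):.2f}")
```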
Collapse
Affiliation(s)
- Muge Ozker
- University of Texas Graduate School of Biomedical Sciences at Houston
- Baylor College of Medicine
| | | | | | | | | |
Collapse
|
50
|
Yu L, Rao A, Zhang Y, Burton PC, Rishiq D, Abrams H. Neuromodulatory Effects of Auditory Training and Hearing Aid Use on Audiovisual Speech Perception in Elderly Individuals. Front Aging Neurosci 2017; 9:30. [PMID: 28270763 PMCID: PMC5318380 DOI: 10.3389/fnagi.2017.00030] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2016] [Accepted: 02/06/2017] [Indexed: 11/18/2022] Open
Abstract
Although audiovisual (AV) training has been shown to improve overall speech perception in hearing-impaired listeners, there has been a lack of direct brain imaging data to help elucidate the neural networks and neural plasticity associated with hearing aid (HA) use and auditory training targeting speechreading. For this purpose, the current clinical case study reports functional magnetic resonance imaging (fMRI) data from two hearing-impaired patients who were first-time HA users. During the study period, both patients used HAs for 8 weeks; only one received a training program named ReadMyQuips™ (RMQ) targeting speechreading during the second half of the study period for 4 weeks. Identical fMRI tests were administered at pre-fitting and at the end of the 8 weeks. Regions of interest (ROI) including auditory cortex and visual cortex for uni-sensory processing, and superior temporal sulcus (STS) for AV integration, were identified for each person through an independent functional localizer task. The results showed experience-dependent changes involving ROIs of auditory cortex, STS and functional connectivity between uni-sensory ROIs and STS from pretest to posttest in both cases. These data provide initial evidence for the malleable experience-driven cortical functionality for AV speech perception in elderly hearing-impaired people and call for further studies with a much larger subject sample and systematic control to fill in the knowledge gap to understand brain plasticity associated with auditory rehabilitation in the aging population.
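Functional connectivity between unisensory ROIs and STS, as assessed here before and after hearing-aid use, is often operationalized as the correlation between ROI time series. The sketch below uses synthetic time series and an arbitrary increase in shared signal at post-test; it illustrates the measure only and is not the study's analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n_timepoints = 200

def roi_connectivity(ts_a, ts_b):
    """Functional connectivity as the Pearson correlation of two ROI time series."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

# Synthetic auditory-cortex and STS time series before and after the intervention;
# the post-test series share more signal, mimicking increased coupling.
shared_pre, shared_post = rng.standard_normal(n_timepoints), rng.standard_normal(n_timepoints)
auditory_pre = shared_pre + 1.0 * rng.standard_normal(n_timepoints)
sts_pre = shared_pre + 1.0 * rng.standard_normal(n_timepoints)
auditory_post = shared_post + 0.5 * rng.standard_normal(n_timepoints)
sts_post = shared_post + 0.5 * rng.standard_normal(n_timepoints)

print(f"pre-test connectivity:  {roi_connectivity(auditory_pre, sts_pre):.2f}")
print(f"post-test connectivity: {roi_connectivity(auditory_post, sts_post):.2f}")
```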
Collapse
Affiliation(s)
- Luodi Yu
- Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral Development, University of Minnesota, Minneapolis, MN, USA
| | - Aparna Rao
- Department of Speech and Hearing Sciences, Arizona State University, Tempe, AZ, USA
| | - Yang Zhang
- Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral Development, University of Minnesota, Minneapolis, MN, USA
| | - Philip C Burton
- Office of the Associate Dean for Research, College of Liberal Arts, University of Minnesota, Minneapolis, MN, USA
| | - Dania Rishiq
- Department of Speech Pathology and Audiology, University of South Alabama, Mobile, AL, USA
| | - Harvey Abrams
- Department of Speech Pathology and Audiology, University of South Alabama, Mobile, AL, USA
| |
Collapse
|