1. Lin T, Rana M, Liu P, Polk R, Heemskerk A, Weisberg SM, Bowers D, Sitaram R, Ebner NC. Real-Time fMRI Neurofeedback Training of Selective Attention in Older Adults. Brain Sci 2024; 14:931. [PMID: 39335425] [DOI: 10.3390/brainsci14090931]
Abstract
BACKGROUND: Selective attention declines with age, due to age-related functional changes in dorsal anterior cingulate cortex (dACC). Real-time functional magnetic resonance imaging (rtfMRI) neurofeedback has been used in young adults to train volitional control of brain activity, including in dACC.
METHODS: For the first time, this study used rtfMRI neurofeedback to train 19 young and 27 older adults in volitional up- or down-regulation of bilateral dACC during a selective attention task.
RESULTS: Older participants in the up-regulation condition (experimental group) showed greater reward points and dACC BOLD signal across training sessions, reflective of neurofeedback training success; and faster reaction time and better response accuracy, suggesting behavioral benefits on selective attention. These effects were not observed for older participants in the down-regulation condition (inverse condition control group), supporting specificity of volitional dACC up-regulation training in older adults. These effects were, unexpectedly, also not observed for young participants in the up-regulation condition (age control group), perhaps due to a lack of motivation to continue the training.
CONCLUSIONS: These findings provide promising first evidence of functional plasticity in dACC in late life via rtfMRI neurofeedback up-regulation training, enhancing selective attention, and demonstrate proof of concept of rtfMRI neurofeedback training in cognitive aging.
Affiliation(s)
- Tian Lin
- Department of Psychology, University of Florida, Gainesville, FL 32611, USA
- Mohit Rana
- Institute of Biological and Medical Engineering, Department of Psychiatry and Section of Neuroscience, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
- Peiwei Liu
- Department of Psychology, University of Florida, Gainesville, FL 32611, USA
- Rebecca Polk
- Department of Psychology, University of Florida, Gainesville, FL 32611, USA
- Amber Heemskerk
- Department of Psychology, University of Florida, Gainesville, FL 32611, USA
- Steven M Weisberg
- Department of Psychology, University of Florida, Gainesville, FL 32611, USA
- Dawn Bowers
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL 32610, USA
- Natalie C Ebner
- Department of Psychology, University of Florida, Gainesville, FL 32611, USA
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, FL 32610, USA

2. Mackey CA, O'Connell MN, Hackett TA, Schroeder CE, Kajikawa Y. Laminar organization of visual responses in core and parabelt auditory cortex. Cereb Cortex 2024; 34:bhae373. [PMID: 39300609] [DOI: 10.1093/cercor/bhae373]
Abstract
Audiovisual (AV) interaction has been shown in many studies of auditory cortex. However, the underlying processes and circuits are unclear because few studies have used methods that delineate the timing and laminar distribution of net excitatory and inhibitory processes within areas, much less across cortical levels. This study examined laminar profiles of neuronal activity in auditory core (AC) and parabelt (PB) cortices recorded from macaques during active discrimination of conspecific faces and vocalizations. We found modulation of multi-unit activity (MUA) in response to isolated visual stimulation, characterized by a brief deep MUA spike, putatively in white matter, followed by mid-layer MUA suppression in core auditory cortex; the later suppressive event had clear current source density concomitants, while the earlier MUA spike did not. We observed a similar facilitation-suppression sequence in the PB, with later onset latency. In combined AV stimulation, there was moderate reduction of responses to sound during the visual-evoked MUA suppression interval in both AC and PB. These data suggest a common sequence of afferent spikes, followed by synaptic inhibition; however, differences in timing and laminar location may reflect distinct visual projections to AC and PB.
Affiliation(s)
- Chase A Mackey
- Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, 140 Old Orangeburg Rd, Orangeburg, NY 10962, United States
- Monica N O'Connell
- Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, 140 Old Orangeburg Rd, Orangeburg, NY 10962, United States
- Department of Psychiatry, New York University School of Medicine, 145 E 32nd St., New York, NY 10016, United States
- Troy A Hackett
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1211 Medical Center Dr., Nashville, TN 37212, United States
- Charles E Schroeder
- Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, 140 Old Orangeburg Rd, Orangeburg, NY 10962, United States
- Departments of Psychiatry and Neurology, Columbia University College of Physicians, 630 W 168th St, New York, NY 10032, United States
- Yoshinao Kajikawa
- Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, 140 Old Orangeburg Rd, Orangeburg, NY 10962, United States
- Department of Psychiatry, New York University School of Medicine, 145 E 32nd St., New York, NY 10016, United States

3. Hmamouche Y, Ochs M, Prévot L, Chaminade T. Interpretable prediction of brain activity during conversations from multimodal behavioral signals. PLoS One 2024; 19:e0284342. [PMID: 38512831] [PMCID: PMC10956754] [DOI: 10.1371/journal.pone.0284342]
Abstract
We present an analytical framework aimed at predicting the local brain activity in uncontrolled experimental conditions based on multimodal recordings of participants' behavior, and its application to a corpus of participants having conversations with another human or a conversational humanoid robot. The framework consists in extracting high-level features from the raw behavioral recordings and applying a dynamic prediction of binarized fMRI-recorded local brain activity using these behavioral features. The objective is to identify behavioral features required for this prediction, and their relative weights, depending on the brain area under investigation and the experimental condition. In order to validate our framework, we use a corpus of uncontrolled conversations of participants with a human or a robotic agent, focusing on brain regions involved in speech processing, and more generally in social interactions. The framework not only predicts local brain activity significantly better than random, it also quantifies the weights of behavioral features required for this prediction, depending on the brain area under investigation and on the nature of the conversational partner. In the left Superior Temporal Sulcus, perceived speech is the most important behavioral feature for predicting brain activity, regardless of the agent, while several features, which differ between the human and robot interlocutors, contribute to the prediction in regions involved in social cognition, such as the TemporoParietal Junction. This framework therefore allows us to study how multiple behavioral signals from different modalities are integrated in individual brain regions during complex social interactions.
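The prediction step described above, predicting binarized local brain activity from behavioral features and then reading off the relative weight of each feature, can be approximated with a per-region classifier. The sketch below is a minimal illustration on toy data, not the authors' pipeline; the feature names and the logistic-regression choice are assumptions for demonstration.

```python
# Minimal sketch: predict binarized ROI activity from multimodal behavioral
# features and inspect feature weights (assumed stand-in for the paper's pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_volumes = 200
feature_names = ["perceived_speech", "produced_speech", "gaze_on_face", "smiling"]  # hypothetical
X = rng.normal(size=(n_volumes, len(feature_names)))      # behavioral features per fMRI volume
y = ((X[:, 0] * 0.8 + rng.normal(size=n_volumes)) > 0).astype(int)  # binarized BOLD in one ROI (toy)

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=5).mean()              # is prediction better than chance?
clf.fit(X, y)

print(f"cross-validated accuracy: {acc:.2f}")
for name, w in zip(feature_names, clf.coef_[0]):
    print(f"{name:>18}: weight = {w:+.2f}")                # relative importance of each feature
```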
Affiliation(s)
- Youssef Hmamouche
- International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnique, Rabat, Morocco
- Magalie Ochs
- LIS UMR 7020, CNRS, Aix Marseille Université, Université de Toulon, Marseille, France
- Laurent Prévot
- LPL UMR 7309, CNRS, Aix Marseille Université, Marseille, France

4. Lee IS, Kang JH, Kim J. Auditory influence on stickiness perception: an fMRI study of multisensory integration. Neuroreport 2024; 35:269-276. [PMID: 38305131] [PMCID: PMC10852036] [DOI: 10.1097/wnr.0000000000002003]
Abstract
This study explored how the human brain perceives stickiness through tactile and auditory channels, especially when presented with congruent or incongruent intensity cues. In our behavioral and functional MRI (fMRI) experiments, we presented participants with adhesive tape stimuli at two different intensities. The congruent condition involved providing stickiness stimuli with matching intensity cues in both auditory and tactile channels, whereas the incongruent condition involved cues of different intensities. Behavioral results showed that participants were able to distinguish between the congruent and incongruent conditions with high accuracy. Through fMRI searchlight analysis, we tested which brain regions could distinguish between congruent and incongruent conditions, and as a result, we identified the superior temporal gyrus, known primarily for auditory processing. Interestingly, we did not observe any significant activation in regions associated with somatosensory or motor functions. This indicates that the brain dedicates more attention to auditory cues than to tactile cues, possibly due to the unfamiliarity of conveying the sensation of stickiness through sound. Our results could provide new perspectives on the complexities of multisensory integration, highlighting the subtle yet significant role of auditory processing in understanding tactile properties such as stickiness.
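A searchlight analysis like the one described runs a cross-validated classifier on the voxels inside a small sphere centered on every voxel, yielding an accuracy map for congruent versus incongruent trials. The sketch below illustrates the idea on toy data; the radius, classifier, and grid size are assumptions, not the study's parameters.

```python
# Minimal searchlight sketch: classify congruent vs. incongruent trials from
# voxels within a sphere around each voxel (toy data; parameters are assumptions).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
shape = (10, 10, 10)                          # toy brain grid
n_trials = 60
data = rng.normal(size=(n_trials,) + shape)   # trial-wise activity patterns
labels = np.repeat([0, 1], n_trials // 2)     # 0 = congruent, 1 = incongruent
coords = np.array(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")).reshape(3, -1).T

radius = 2.0
scores = np.zeros(shape)
flat = data.reshape(n_trials, -1)
for i, center in enumerate(coords):
    sphere = np.linalg.norm(coords - center, axis=1) <= radius
    X = flat[:, sphere]                        # features = voxels inside the sphere
    scores.flat[i] = cross_val_score(SVC(kernel="linear"), X, labels, cv=5).mean()

print("peak searchlight accuracy:", scores.max())  # map is then compared against chance (0.5)
```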
Affiliation(s)
- In-Seon Lee
- College of Korean Medicine, Kyung Hee University, Seoul
- Jae-Hwan Kang
- Digital Health Research Division, Korea Institute of Oriental Medicine
- Aging Convergence Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon
- Junsuk Kim
- School of Information Convergence, Kwangwoon University, Seoul, South Korea

5. Ross LA, Molholm S, Butler JS, Del Bene VA, Brima T, Foxe JJ. Neural correlates of audiovisual narrative speech perception in children and adults on the autism spectrum: A functional magnetic resonance imaging study. Autism Res 2024; 17:280-310. [PMID: 38334251] [DOI: 10.1002/aur.3104]
Abstract
Autistic individuals show substantially reduced benefit from observing visual articulations during audiovisual speech perception, a multisensory integration deficit that is particularly relevant to social communication. This has mostly been studied using simple syllabic or word-level stimuli, and it remains unclear how altered lower-level multisensory integration translates to the processing of more complex natural multisensory stimulus environments in autism. Here, functional neuroimaging was used to compare neural correlates of audiovisual gain (AV-gain) in 41 autistic individuals with those of 41 age-matched non-autistic controls when presented with a complex audiovisual narrative. Participants were presented with continuous narration of a story in auditory-alone, visual-alone, and both synchronous and asynchronous audiovisual speech conditions. We hypothesized that previously identified differences in audiovisual speech processing in autism would be characterized by activation differences in brain regions well known to be associated with audiovisual enhancement in neurotypicals. However, our results did not provide evidence for altered processing of the auditory-alone, visual-alone, or audiovisual conditions, or of AV-gain, in regions associated with the respective task when comparing activation patterns between groups. Instead, we found that autistic individuals responded with higher activations in mostly frontal regions where the activation to the experimental conditions was below baseline (de-activations) in the control group. These frontal effects were observed in both unisensory and audiovisual conditions, suggesting that these altered activations were not specific to multisensory processing but reflective of more general mechanisms such as an altered disengagement of Default Mode Network processes during the observation of the language stimulus across conditions.
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- School of Mathematics and Statistics, Technological University Dublin, City Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- Heersink School of Medicine, Department of Neurology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Tufikameni Brima
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA

6. Dorsi J, Lacey S, Sathian K. Multisensory and lexical information in speech perception. Front Hum Neurosci 2024; 17:1331129. [PMID: 38259332] [PMCID: PMC10800662] [DOI: 10.3389/fnhum.2023.1331129]
Abstract
Both multisensory and lexical information are known to influence the perception of speech. However, an open question remains: is either source more fundamental to perceiving speech? In this perspective, we review the literature and argue that multisensory information plays a more fundamental role in speech perception than lexical information. Three sets of findings support this conclusion: first, reaction times and electroencephalographic signal latencies indicate that the effects of multisensory information on speech processing seem to occur earlier than the effects of lexical information. Second, non-auditory sensory input influences the perception of features that differentiate phonetic categories; thus, multisensory information determines what lexical information is ultimately processed. Finally, there is evidence that multisensory information helps form some lexical information as part of a phenomenon known as sound symbolism. These findings support a framework of speech perception that, while acknowledging the influential roles of both multisensory and lexical information, holds that multisensory information is more fundamental to the process.
Affiliation(s)
- Josh Dorsi
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Simon Lacey
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Department of Neural and Behavioral Sciences, Penn State College of Medicine, Hershey, PA, United States
- Department of Psychology, Penn State Colleges of Medicine and Liberal Arts, Hershey, PA, United States
- K. Sathian
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Department of Neural and Behavioral Sciences, Penn State College of Medicine, Hershey, PA, United States
- Department of Psychology, Penn State Colleges of Medicine and Liberal Arts, Hershey, PA, United States

7. Fullerton AM, Vickers DA, Luke R, Billing AN, McAlpine D, Hernandez-Perez H, Peelle JE, Monaghan JJM, McMahon CM. Cross-modal functional connectivity supports speech understanding in cochlear implant users. Cereb Cortex 2023; 33:3350-3371. [PMID: 35989307] [PMCID: PMC10068270] [DOI: 10.1093/cercor/bhac277]
Abstract
Sensory deprivation can lead to cross-modal cortical changes, whereby sensory brain regions deprived of input may be recruited to perform atypical function. Enhanced cross-modal responses to visual stimuli observed in auditory cortex of postlingually deaf cochlear implant (CI) users are hypothesized to reflect increased activation of cortical language regions, but it is unclear if this cross-modal activity is "adaptive" or "mal-adaptive" for speech understanding. To determine if increased activation of language regions is correlated with better speech understanding in CI users, we assessed task-related activation and functional connectivity of auditory and visual cortices to auditory and visual speech and non-speech stimuli in CI users (n = 14) and normal-hearing listeners (n = 17) and used functional near-infrared spectroscopy to measure hemodynamic responses. We used visually presented speech and non-speech to investigate neural processes related to linguistic content and observed that CI users show beneficial cross-modal effects. Specifically, an increase in connectivity between the left auditory and visual cortices (presumed primary sites of cortical language processing) was positively correlated with CI users' abilities to understand speech in background noise. Cross-modal activity in auditory cortex of postlingually deaf CI users may reflect adaptive activity of a distributed, multimodal speech network, recruited to enhance speech understanding.
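The key brain-behavior analysis here, relating auditory-visual cortical connectivity to speech-in-noise performance across participants, can be sketched as below. The correlation-based connectivity measure and the toy data are assumptions for illustration, not the authors' exact fNIRS pipeline.

```python
# Sketch: per-subject functional connectivity between auditory and visual regions,
# correlated with speech-in-noise scores across subjects (toy data, assumed measures).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_subjects, n_timepoints = 14, 300
speech_in_noise = rng.normal(size=n_subjects)          # behavioral score per subject

connectivity = np.empty(n_subjects)
for s in range(n_subjects):
    aud = rng.normal(size=n_timepoints)                # hemodynamic time series, auditory ROI
    vis = 0.3 * aud + rng.normal(size=n_timepoints)    # hemodynamic time series, visual ROI
    connectivity[s] = np.corrcoef(aud, vis)[0, 1]      # simple connectivity estimate

r, p = pearsonr(connectivity, speech_in_noise)         # brain-behavior correlation
print(f"connectivity vs. speech understanding: r = {r:.2f}, p = {p:.3f}")
```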
Affiliation(s)
- Amanda M Fullerton
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Deborah A Vickers
- Cambridge Hearing Group, Sound Lab, Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
- Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, United Kingdom
- Robert Luke
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Addison N Billing
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom
- DOT-HUB, Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, United Kingdom
- David McAlpine
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Heivet Hernandez-Perez
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO 63110, United States
- Jessica J M Monaghan
- National Acoustic Laboratories, Australian Hearing Hub, Sydney 2109, Australia
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- Catherine M McMahon
- Department of Linguistics and Macquarie University Hearing, Australian Hearing Hub, Macquarie University, Sydney 2109, Australia
- HEAR Centre, Macquarie University, Sydney 2109, Australia

8. Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023; 13:e2869. [PMID: 36579557] [PMCID: PMC9927859] [DOI: 10.1002/brb3.2869]
Abstract
INTRODUCTION: Few of us are skilled lipreaders while most struggle with the task. Neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood.
METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses.
RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech particularly activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading, probably paralleling the limited understanding obtained via lipreading. Importantly, a higher lipreading test score and subjective estimate of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex.
CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results might suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
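The inter-subject correlation (ISC) analysis described above compares BOLD time courses across participants voxel by voxel. A leave-one-out version can be sketched as follows on toy data; the leave-one-out scheme is one common ISC variant and is assumed here, not necessarily the authors' exact implementation.

```python
# Leave-one-out inter-subject correlation (ISC) sketch on toy voxel time courses.
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_timepoints, n_voxels = 20, 240, 500
shared = rng.normal(size=(n_timepoints, n_voxels))           # stimulus-driven signal
bold = shared + rng.normal(scale=2.0, size=(n_subjects, n_timepoints, n_voxels))

def corr_columns(a, b):
    """Pearson correlation between matching columns of two (time x voxel) arrays."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

isc = np.empty((n_subjects, n_voxels))
for s in range(n_subjects):
    others = bold[np.arange(n_subjects) != s].mean(0)         # average of remaining subjects
    isc[s] = corr_columns(bold[s], others)                    # this subject vs. the group

print("mean ISC across subjects and voxels:", isc.mean())
# The same map would be computed per condition (lipreading, listening, reading) and compared.
```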
Affiliation(s)
- Satu Saalasti
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany
- Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Aalto Studios - MAGICS, Aalto University, Espoo, Finland

9. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. [PMID: 36586857] [PMCID: PMC9894660] [DOI: 10.1121/10.0015262]
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey
- PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle
- Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA

10. Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022; 263:119598. [PMID: 36049699] [DOI: 10.1016/j.neuroimage.2022.119598]
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus, with the goal of identifying regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as parts of the broader language network, including the semantic system. To this end, we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network, as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA
- Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA

11. Peelle JE, Spehar B, Jones MS, McConkey S, Myerson J, Hale S, Sommers MS, Tye-Murray N. Increased Connectivity among Sensory and Motor Regions during Visual and Audiovisual Speech Perception. J Neurosci 2022; 42:435-442. [PMID: 34815317] [PMCID: PMC8802926] [DOI: 10.1523/jneurosci.0114-21.2021]
Abstract
In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS.

SIGNIFICANCE STATEMENT: In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
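The psychophysiological interaction (PPI) analysis mentioned above can be sketched as a GLM that includes the seed time course, the task regressor, and their product. The simplified regressors and toy data below are illustrative assumptions; a real PPI pipeline additionally deconvolves to the neural level and convolves with an HRF.

```python
# Simplified PPI sketch: does seed-target coupling change between audiovisual
# and unimodal blocks? (Toy data; omits deconvolution and HRF modeling.)
import numpy as np

rng = np.random.default_rng(4)
n_scans = 400
task = ((np.arange(n_scans) // 20) % 2).astype(float)      # 1 = audiovisual block, 0 = unimodal block
seed = rng.normal(size=n_scans)                             # time course from an auditory (or visual) seed
ppi = (seed - seed.mean()) * (task - task.mean())           # interaction regressor

# Toy target voxel: coupled to the seed more strongly during AV blocks.
target = 0.2 * seed + 0.5 * ppi + rng.normal(size=n_scans)

X = np.column_stack([np.ones(n_scans), task, seed, ppi])    # design matrix
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(f"PPI beta (change in coupling during AV blocks): {beta[3]:.2f}")
```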
Affiliation(s)
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Brent Spehar
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Michael S Jones
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Sarah McConkey
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Joel Myerson
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Sandra Hale
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Nancy Tye-Murray
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110

12. Tseng RY, Wang TW, Fu SW, Lee CY, Tsao Y. A Study of Joint Effect on Denoising Techniques and Visual Cues to Improve Speech Intelligibility in Cochlear Implant Simulation. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.3017042]

13. Alzaher M, Vannson N, Deguine O, Marx M, Barone P, Strelnikov K. Brain plasticity and hearing disorders. Rev Neurol (Paris) 2021; 177:1121-1132. [PMID: 34657730] [DOI: 10.1016/j.neurol.2021.09.004]
Abstract
Permanently changed sensory stimulation can modify functional connectivity patterns in the healthy brain and in pathology. In pathology, these adaptive modifications of the brain are referred to as compensation, and the subsequent configurations of functional connectivity are called compensatory plasticity. The variability and extent of auditory deficits due to impairments in the hearing system determine the related brain reorganization and rehabilitation. In this review, we consider cross-modal and intra-modal brain plasticity related to bilateral and unilateral hearing loss and their restoration using cochlear implantation. Cross-modal brain plasticity may have both beneficial and detrimental effects on hearing disorders. It has a beneficial effect when it serves to improve a patient's adaptation to the visuo-auditory environment. However, the occupation of the auditory cortex by visual functions may be a negative factor for the restoration of hearing with cochlear implants. With regard to intra-modal plasticity, the loss of interhemispheric asymmetry in asymmetric hearing loss is deleterious for auditory spatial localization. Research on brain plasticity in hearing disorders can advance our understanding of brain plasticity and improve the rehabilitation of patients using prognostic, evidence-based approaches from cognitive neuroscience combined with post-rehabilitation objective biomarkers of this plasticity utilizing neuroimaging.
Affiliation(s)
- M Alzaher
- Université de Toulouse, UPS, centre de recherche cerveau et cognition, Toulouse, France
- CNRS, CerCo, France
- N Vannson
- Université de Toulouse, UPS, centre de recherche cerveau et cognition, Toulouse, France
- CNRS, CerCo, France
- O Deguine
- Université de Toulouse, UPS, centre de recherche cerveau et cognition, Toulouse, France
- CNRS, CerCo, France
- Faculté de médecine de Purpan, CHU Toulouse, université de Toulouse 3, France
- M Marx
- Université de Toulouse, UPS, centre de recherche cerveau et cognition, Toulouse, France
- CNRS, CerCo, France
- Faculté de médecine de Purpan, CHU Toulouse, université de Toulouse 3, France
- P Barone
- Université de Toulouse, UPS, centre de recherche cerveau et cognition, Toulouse, France
- CNRS, CerCo, France
- K Strelnikov
- Faculté de médecine de Purpan, CHU Toulouse, université de Toulouse 3, France

14. Karthik G, Plass J, Beltz AM, Liu Z, Grabowecky M, Suzuki S, Stacey WC, Wasade VS, Towle VL, Tao JX, Wu S, Issa NP, Brang D. Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex. Eur J Neurosci 2021; 54:7301-7317. [PMID: 34587350] [DOI: 10.1111/ejn.15482]
Abstract
Speech perception is a central component of social communication. Although principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues. Visual speech modulates activity in cortical areas subserving auditory speech perception including the superior temporal gyrus (STG). However, it is unknown whether visual modulation of auditory processing is a unitary phenomenon or, rather, consists of multiple functionally distinct processes. To explore this question, we examined neural responses to audiovisual speech measured from intracranially implanted electrodes in 21 patients with epilepsy. We found that visual speech modulated auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differed across frequency bands. In the theta band, visual speech suppressed the auditory response from before auditory speech onset to after auditory speech onset (-93 to 500 ms) most strongly in the posterior STG. In the beta band, suppression was seen in the anterior STG from -311 to -195 ms before auditory speech onset and in the middle STG from -195 to 235 ms after speech onset. In high gamma, visual speech enhanced the auditory response from -45 to 24 ms only in the posterior STG. We interpret the visual-induced changes prior to speech onset as reflecting crossmodal prediction of speech signals. In contrast, modulations after sound onset may reflect a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.
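Band-limited modulations like the theta, beta, and high-gamma effects reported here are typically quantified from the analytic amplitude of band-pass-filtered signals. The sketch below shows that generic step on toy data; the frequency bands follow the abstract, while the sampling rate, filter order, and analysis window are assumptions.

```python
# Sketch: band-limited amplitude envelopes around auditory speech onset,
# compared between audiovisual and auditory-only trials (toy iEEG data).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000                                    # sampling rate (Hz), assumed
t = np.arange(-0.5, 0.5, 1 / fs)             # time relative to auditory speech onset
rng = np.random.default_rng(5)
trials_av = rng.normal(size=(50, t.size))    # audiovisual trials for one STG electrode
trials_a = rng.normal(size=(50, t.size))     # auditory-only trials

def band_envelope(x, low, high):
    b, a = butter(4, [low, high], btype="bandpass", fs=fs)
    return np.abs(hilbert(filtfilt(b, a, x, axis=-1), axis=-1))

for name, (low, high) in {"theta": (4, 8), "beta": (13, 30), "high gamma": (70, 150)}.items():
    env_av = band_envelope(trials_av, low, high).mean(0)
    env_a = band_envelope(trials_a, low, high).mean(0)
    diff = (env_av - env_a)[(t >= -0.1) & (t <= 0.2)].mean()   # AV minus A in a peri-onset window
    print(f"{name:>10}: mean AV - A envelope difference = {diff:+.3f}")
```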
Affiliation(s)
- G Karthik
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- Adriene M Beltz
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- Zhongming Liu
- Department of Biomedical Engineering and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, Illinois, USA
- Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, Illinois, USA
- William C Stacey
- Department of Neurology and Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Vibhangini S Wasade
- Department of Neurology, Henry Ford Hospital, Detroit, Michigan, USA
- Department of Neurology, Wayne State University School of Medicine, Detroit, Michigan, USA
- Vernon L Towle
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- James X Tao
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- Shasha Wu
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- Naoum P Issa
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA

15. O'Sullivan AE, Crosse MJ, Di Liberto GM, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. [PMID: 33824190] [PMCID: PMC8197638] [DOI: 10.1523/jneurosci.0906-20.2021]
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.

SIGNIFICANCE STATEMENT: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
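Quantifying stimulus-feature encoding with canonical correlation analysis (CCA), as described above, can be sketched with scikit-learn. The toy data and the held-out correlation score below are simplifications and assumptions, not the authors' exact pipeline; comparing AV against A+V would repeat the same fit on summed unimodal responses.

```python
# Sketch: CCA-based encoding of stimulus features (e.g., spectrogram bands) in EEG.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)
n_samples, n_channels, n_features = 2000, 32, 16
stimulus = rng.normal(size=(n_samples, n_features))            # spectrogram / phonetic features
mixing = rng.normal(size=(n_features, n_channels))
eeg = (stimulus @ mixing) * 0.3 + rng.normal(size=(n_samples, n_channels))  # toy stimulus-driven EEG

split = n_samples // 2
cca = CCA(n_components=2)
cca.fit(stimulus[:split], eeg[:split])                          # fit on training half
u, v = cca.transform(stimulus[split:], eeg[split:])             # project held-out half
score = np.corrcoef(u[:, 0], v[:, 0])[0, 1]                     # first canonical correlation
print(f"held-out canonical correlation (encoding strength): {score:.2f}")
```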
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
- X, The Moonshot Factory, Mountain View, CA and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627

16. Visual Benefit in Lexical Tone Perception in Mandarin: An Event-related Potential Study. Neuroscience 2021; 466:196-204. [PMID: 33887386] [DOI: 10.1016/j.neuroscience.2021.04.007]
Abstract
Congruent visual information enhances auditory speech perception. This visual benefit has been widely observed in perception of consonants and vowels, and linked to reduced amplitudes and latencies of auditory N1 and P2 event-related potential (ERP) components when visual information was present. However, it remains unclear whether lexical tone perception in Mandarin also shows this visual benefit. This question is theoretically important given the low visual saliency of lexical tones. The current study compared the N1/P2 reduction during perception of Mandarin lexical tones and consonants in a discrimination task. Results showed amplitude reductions in N1/P2 and a latency reduction in N1 for audiovisual lexical tone perception. These findings suggest that lexical tone perception is also helped by visual information, as found for consonants. Furthermore, this visual benefit in N1 for lexical tone perception was delayed relative to consonants.

17. Csonka M, Mardmomen N, Webster PJ, Brefczynski-Lewis JA, Frum C, Lewis JW. Meta-Analyses Support a Taxonomic Model for Representations of Different Categories of Audio-Visual Interaction Events in the Human Brain. Cereb Cortex Commun 2021; 2:tgab002. [PMID: 33718874] [PMCID: PMC7941256] [DOI: 10.1093/texcom/tgab002]
Abstract
Our ability to perceive meaningful action events involving objects, people, and other animate agents is characterized in part by an interplay of visual and auditory sensory processing and their cross-modal interactions. However, this multisensory ability can be altered or dysfunctional in some hearing and sighted individuals, and in some clinical populations. The present meta-analysis sought to test current hypotheses regarding neurobiological architectures that may mediate audio-visual multisensory processing. Reported coordinates from 82 neuroimaging studies (137 experiments) that revealed some form of audio-visual interaction in discrete brain regions were compiled, converted to a common coordinate space, and then organized along specific categorical dimensions to generate activation likelihood estimate (ALE) brain maps and various contrasts of those derived maps. The results revealed brain regions (cortical "hubs") preferentially involved in multisensory processing along different stimulus category dimensions, including 1) living versus nonliving audio-visual events, 2) audio-visual events involving vocalizations versus actions by living sources, 3) emotionally valent events, and 4) dynamic-visual versus static-visual audio-visual stimuli. These meta-analysis results are discussed in the context of neurocomputational theories of semantic knowledge representations and perception, and the brain volumes of interest are available for download to facilitate data interpretation for future neuroimaging studies.
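The core of an activation likelihood estimation (ALE) analysis like this one is to model each reported focus as a Gaussian probability blob and combine blobs across experiments into a voxel-wise likelihood map. The toy one-dimensional sketch below illustrates that idea only; the kernel width and foci are invented for demonstration, and real ALE uses 3-D coordinates, sample-size-dependent kernels, and permutation-based thresholding.

```python
# Toy 1-D activation likelihood estimation (ALE) sketch: Gaussian kernels around
# reported foci, combined across experiments as a probabilistic union.
import numpy as np

grid = np.arange(0, 100.0)                     # 1-D stand-in for brain voxels (mm)
sigma = 6.0                                    # kernel width, an assumption
experiments = [[30, 62], [28], [33, 60, 90]]   # invented foci per experiment

def modeled_activation(foci):
    """Per-experiment map: probability that at least one focus lies near each voxel."""
    p = np.zeros_like(grid)
    for f in foci:
        p = 1 - (1 - p) * (1 - np.exp(-0.5 * ((grid - f) / sigma) ** 2))
    return p

ma_maps = np.array([modeled_activation(f) for f in experiments])
ale = 1 - np.prod(1 - ma_maps, axis=0)         # union across experiments = ALE values
print("voxel with highest ALE value:", grid[np.argmax(ale)])
# In practice, ALE maps are thresholded against a null distribution built from random foci.
```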
Affiliation(s)
- Matt Csonka
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Nadia Mardmomen
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Paula J Webster
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Julie A Brefczynski-Lewis
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- Chris Frum
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA
- James W Lewis
- Department of Neuroscience, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, WV 26506, USA

18. Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. [PMID: 32727820] [PMCID: PMC7470920] [DOI: 10.1523/jneurosci.0279-20.2020]
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed for the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.

SIGNIFICANCE STATEMENT: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
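The deconvolution idea described here, jittering the audiovisual onset asynchrony so that overlapping unisensory responses can be separated by regression, can be sketched with an ordinary least-squares finite-impulse-response (FIR) model. The response shapes, timing, and sampling rate below are assumptions for a toy demonstration, not the study's estimates.

```python
# Sketch of event-related deconvolution: recover separate auditory and visual
# response time courses from trials where the two onsets are jittered (toy data).
import numpy as np

rng = np.random.default_rng(7)
fs, n_samples, n_lags = 100, 6000, 60              # 100 Hz, 60 s recording, 600 ms FIR window
true_aud = np.exp(-np.arange(n_lags) / 10.0)        # assumed auditory response shape
true_vis = 0.4 * np.sin(np.arange(n_lags) / 15.0)   # assumed visual response shape

aud_onsets = np.arange(100, n_samples - n_lags, 150)
vis_onsets = aud_onsets - rng.integers(10, 50, size=aud_onsets.size)   # jittered visual lead

def fir_design(onsets):
    X = np.zeros((n_samples, n_lags))
    for t0 in onsets:
        for lag in range(n_lags):
            X[t0 + lag, lag] += 1
    return X

X = np.hstack([fir_design(aud_onsets), fir_design(vis_onsets)])
signal = X @ np.concatenate([true_aud, true_vis]) + rng.normal(scale=0.5, size=n_samples)

beta, *_ = np.linalg.lstsq(X, signal, rcond=None)   # deconvolved response estimates
aud_hat, vis_hat = beta[:n_lags], beta[n_lags:]
print("recovered auditory peak (samples after onset):", int(np.argmax(aud_hat)))
print("recovered visual peak (samples after onset):", int(np.argmax(np.abs(vis_hat))))
```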

19. Gao C, Weber CE, Shinkareva SV. The brain basis of audiovisual affective processing: Evidence from a coordinate-based activation likelihood estimation meta-analysis. Cortex 2019; 120:66-77. [DOI: 10.1016/j.cortex.2019.05.016]

20. Multisensory Enhancement of Odor Object Processing in Primary Olfactory Cortex. Neuroscience 2019; 418:254-265. [DOI: 10.1016/j.neuroscience.2019.08.040]

21. Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261] [PMCID: PMC6687434] [DOI: 10.7554/elife.48116]
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b) these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Collapse
Affiliation(s)
- Patrick J Karas
- Department of NeurosurgeryBaylor College of MedicineHoustonUnited States
| | - John F Magnotti
- Department of NeurosurgeryBaylor College of MedicineHoustonUnited States
| | - Brian A Metzger
- Department of NeurosurgeryBaylor College of MedicineHoustonUnited States
| | - Lin L Zhu
- Department of NeurosurgeryBaylor College of MedicineHoustonUnited States
| | - Kristen B Smith
- Department of NeurosurgeryBaylor College of MedicineHoustonUnited States
| | - Daniel Yoshor
- Department of NeurosurgeryBaylor College of MedicineHoustonUnited States
| | | |
Collapse
|
22
|
A functional MRI investigation of crossmodal interference in an audiovisual Stroop task. PLoS One 2019; 14:e0210736. [PMID: 30645634 PMCID: PMC6333399 DOI: 10.1371/journal.pone.0210736] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2018] [Accepted: 01/01/2019] [Indexed: 01/08/2023] Open
Abstract
The visual color-word Stroop task is widely used in clinical and research settings as a measure of cognitive control. Numerous neuroimaging studies have used color-word Stroop tasks to investigate the neural resources supporting cognitive control, but to our knowledge all have used unimodal (typically visual) Stroop paradigms. Thus, it is possible that this classic measure of cognitive control is not capturing the resources involved in multisensory cognitive control. The audiovisual integration and crossmodal correspondence literatures identify regions sensitive to the congruency of auditory and visual stimuli, but it is unclear how these regions relate to the unimodal cognitive control literature. In this study, we aimed to identify brain regions engaged by crossmodal cognitive control during an audiovisual color-word Stroop task, and how they relate to previous unimodal Stroop and audiovisual integration findings. First, we replicated previous behavioral audiovisual Stroop findings in an fMRI-adapted audiovisual Stroop paradigm: incongruent visual information increased reaction times to the auditory stimulus, and congruent visual information decreased them. Second, we investigated the brain regions supporting cognitive control during an audiovisual color-word Stroop task using fMRI. Similar to unimodal cognitive control tasks, a left superior parietal region exhibited an interference effect of visual information on the auditory stimulus. This superior parietal region was also identified using a standard audiovisual integration localizing procedure, indicating that audiovisual integration resources are sensitive to cognitive control demands. Facilitation of the auditory stimulus by congruent visual information was found in posterior superior temporal cortex, including the posterior STS, which has been found to support audiovisual integration. The dorsal anterior cingulate cortex, often implicated in unimodal Stroop tasks, was not modulated by the audiovisual Stroop task. Overall, the findings indicate that an audiovisual color-word Stroop task engages resources that overlap with audiovisual integration and resources that overlap with, but are distinct from, those engaged by unimodal Stroop tasks.
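A minimal sketch of how the behavioral congruency effects described above could be computed from single-subject reaction times (illustrative only; the condition labels, the neutral baseline, and the example values are assumptions, not the study's design or data):

```python
import numpy as np

def crossmodal_stroop_effects(rt_ms, condition):
    """Median reaction time per congruency condition plus the overall congruency effect.
    If a neutral baseline is present, facilitation and interference are reported
    relative to it; otherwise only the congruent/incongruent contrast is returned."""
    rt_ms, condition = np.asarray(rt_ms, float), np.asarray(condition)
    medians = {str(c): float(np.median(rt_ms[condition == c])) for c in np.unique(condition)}
    effects = dict(medians)
    effects["congruency_effect"] = medians["incongruent"] - medians["congruent"]
    if "neutral" in medians:
        effects["facilitation"] = medians["neutral"] - medians["congruent"]
        effects["interference"] = medians["incongruent"] - medians["neutral"]
    return effects

# Hypothetical single-subject data:
rts = [612, 655, 540, 701, 598, 630]
labels = ["congruent", "incongruent", "congruent", "incongruent", "neutral", "neutral"]
print(crossmodal_stroop_effects(rts, labels))
```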
Collapse
|
23
|
Abbott NT, Shahin AJ. Cross-modal phonetic encoding facilitates the McGurk illusion and phonemic restoration. J Neurophysiol 2018; 120:2988-3000. [PMID: 30303762 DOI: 10.1152/jn.00262.2018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
In spoken language, audiovisual (AV) perception occurs when the visual modality influences encoding of acoustic features (e.g., phonetic representations) at the auditory cortex. We examined how visual speech (mouth movements) transforms phonetic representations, indexed by changes to the N1 auditory evoked potential (AEP). EEG was acquired while human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables, /ba/ and /wa/, presented in auditory-only or AV congruent or incongruent contexts or in a context in which the consonants were replaced by white noise (noise replaced). Subjects reported whether they heard "ba" or "wa." We hypothesized that the auditory N1 amplitude during illusory perception (caused by incongruent AV input, as in the McGurk illusion, or white noise-replaced consonants in CV utterances) should shift to reflect the auditory N1 characteristics of the phonemes conveyed visually (by mouth movements) as opposed to acoustically. Indeed, the N1 AEP became larger and occurred earlier when listeners experienced illusory "ba" (video /ba/, audio /wa/, heard as "ba") and vice versa when they experienced illusory "wa" (video /wa/, audio /ba/, heard as "wa"), mirroring the N1 AEP characteristics for /ba/ and /wa/ observed in natural acoustic situations (e.g., auditory-only setting). This visually mediated N1 behavior was also observed for noise-replaced CVs. Taken together, the findings suggest that information relayed by the visual modality modifies phonetic representations at the auditory cortex and that similar neural mechanisms support the McGurk illusion and visually mediated phonemic restoration. NEW & NOTEWORTHY Using a variant of the McGurk illusion experimental design (using the syllables /ba/ and /wa/), we demonstrate that lipreading influences phonetic encoding at the auditory cortex. We show that the N1 auditory evoked potential morphology shifts to resemble the N1 morphology of the syllable conveyed visually. We also show similar N1 shifts when the consonants are replaced by white noise, suggesting that the McGurk illusion and the visually mediated phonemic restoration rely on common mechanisms.
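As an illustration of how an N1 shift of the kind reported here could be quantified, the sketch below epochs single-trial EEG around acoustic onset, baseline-corrects, and extracts the amplitude and latency of the most negative deflection in an early window. The epoch start, baseline, and 70-150 ms search window are assumptions for illustration, not the authors' exact parameters.

```python
import numpy as np

def n1_peak(epochs, fs, epoch_start=-0.2, baseline=(-0.1, 0.0), window=(0.07, 0.15)):
    """epochs: trials x samples array time-locked to acoustic onset.
    Returns N1 amplitude and latency from the baseline-corrected average ERP."""
    t = np.arange(epochs.shape[1]) / fs + epoch_start
    base = epochs[:, (t >= baseline[0]) & (t < baseline[1])].mean(axis=1, keepdims=True)
    erp = (epochs - base).mean(axis=0)                 # baseline-corrected grand average
    mask = (t >= window[0]) & (t <= window[1])
    idx = np.argmin(erp[mask])                         # N1 = most negative deflection
    return erp[mask][idx], t[mask][idx]

# Usage idea: compute n1_peak separately for illusory-"ba" and illusory-"wa" trials at a
# fronto-central channel and compare the resulting amplitudes and latencies.
```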
Collapse
Affiliation(s)
- Noelle T Abbott
- Center for Mind and Brain, University of California, Davis, California; San Diego State University-University of California, San Diego Joint Doctoral Program in Language and Communicative Disorders, San Diego, California
| | - Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, California; Department of Cognitive and Information Sciences, University of California, Merced, California
| |
Collapse
|
24
|
Devesse A, Dudek A, van Wieringen A, Wouters J. Speech intelligibility of virtual humans. Int J Audiol 2018; 57:908-916. [DOI: 10.1080/14992027.2018.1511922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Affiliation(s)
- Annelies Devesse
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Alexander Dudek
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Astrid van Wieringen
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Jan Wouters
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| |
Collapse
|
25
|
Neural networks supporting audiovisual integration for speech: A large-scale lesion study. Cortex 2018; 103:360-371. [PMID: 29705718 DOI: 10.1016/j.cortex.2018.03.030] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 12/05/2017] [Accepted: 03/23/2018] [Indexed: 10/17/2022]
Abstract
Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched (the McGurk-MacDonald effect). Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs. visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor-related frontal regions do not appear to play a role in AV speech integration.
Collapse
|
26
|
Atilgan H, Town SM, Wood KC, Jones GP, Maddox RK, Lee AKC, Bizley JK. Integration of Visual Information in Auditory Cortex Promotes Auditory Scene Analysis through Multisensory Binding. Neuron 2018; 97:640-655.e4. [PMID: 29395914 PMCID: PMC5814679 DOI: 10.1016/j.neuron.2017.12.034] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2017] [Revised: 10/28/2017] [Accepted: 12/22/2017] [Indexed: 12/29/2022]
Abstract
How and where in the brain audio-visual signals are bound to create multimodal objects remains unknown. One hypothesis is that temporal coherence between dynamic multisensory signals provides a mechanism for binding stimulus features across sensory modalities. Here, we report that when the luminance of a visual stimulus is temporally coherent with the amplitude fluctuations of one sound in a mixture, the representation of that sound is enhanced in auditory cortex. Critically, this enhancement extends to include both binding and non-binding features of the sound. We demonstrate that visual information conveyed from visual cortex via the phase of the local field potential is combined with auditory information within auditory cortex. These data provide evidence that early cross-sensory binding provides a bottom-up mechanism for the formation of cross-sensory objects and that one role for multisensory binding in auditory cortex is to support auditory scene analysis.
Highlights: visual stimuli can shape how auditory cortical neurons respond to sound mixtures; temporal coherence between senses enhances sound features of a bound multisensory object; visual stimuli elicit changes in the phase of the local field potential in auditory cortex; vision-induced phase effects are lost when visual cortex is reversibly silenced.
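A minimal sketch of how a temporally coherent audio-visual stimulus of the kind used here could be constructed: the luminance of the visual stimulus is yoked to the low-pass amplitude envelope of the target sound. The function names, the 10 Hz envelope cutoff, and the 60 Hz frame rate are illustrative assumptions, not the study's actual stimulus parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def amplitude_envelope(sound, fs, lp_hz=10.0):
    """Low-pass filtered Hilbert envelope of a sound waveform."""
    env = np.abs(hilbert(sound))
    b, a = butter(4, lp_hz / (fs / 2.0), btype="low")
    return filtfilt(b, a, env)

def coherent_luminance(target_sound, fs_audio, frame_rate=60):
    """Resample the target sound's envelope to the video frame rate and map it onto a
    normalized luminance time series, yielding a temporally coherent visual stream."""
    env = amplitude_envelope(target_sound, fs_audio)
    n_frames = int(len(target_sound) / fs_audio * frame_rate)
    frame_t = np.arange(n_frames) / frame_rate
    sample_t = np.arange(len(env)) / fs_audio
    lum = np.interp(frame_t, sample_t, env)
    return (lum - lum.min()) / (np.ptp(lum) + 1e-12)    # scale to the 0-1 luminance range

# A non-coherent control stream can be generated the same way from the distractor sound,
# or by time-reversing the envelope of the target sound.
```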
Collapse
Affiliation(s)
- Huriye Atilgan
- The Ear Institute, University College London, London, UK
| | - Stephen M Town
- The Ear Institute, University College London, London, UK
| | | | - Gareth P Jones
- The Ear Institute, University College London, London, UK
| | - Ross K Maddox
- Department of Biomedical Engineering and Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; Institute for Learning and Brain Sciences and Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | - Adrian K C Lee
- Institute for Learning and Brain Sciences and Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
| | | |
Collapse
|
27
|
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017; 38:1835-1849. [PMID: 29263241 DOI: 10.1523/jneurosci.1566-17.2017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2017] [Revised: 11/17/2017] [Accepted: 12/08/2017] [Indexed: 11/21/2022] Open
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex.
SIGNIFICANCE STATEMENT: The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
Collapse
|
28
|
Wolf D, Rekittke LM, Mittelberg I, Klasen M, Mathiak K. Perceived Conventionality in Co-speech Gestures Involves the Fronto-Temporal Language Network. Front Hum Neurosci 2017; 11:573. [PMID: 29249945 PMCID: PMC5714878 DOI: 10.3389/fnhum.2017.00573] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2017] [Accepted: 11/13/2017] [Indexed: 11/16/2022] Open
Abstract
Face-to-face communication is multimodal; it encompasses spoken words, facial expressions, gaze, and co-speech gestures. In contrast to linguistic symbols (e.g., spoken words or signs in sign language) relying on mostly explicit conventions, gestures vary in their degree of conventionality. Bodily signs may have a generally accepted or conventionalized meaning (e.g., a head shake) or less so (e.g., self-grooming). We hypothesized that subjective perception of conventionality in co-speech gestures relies on the classical language network, i.e., the left hemispheric inferior frontal gyrus (IFG, Broca's area) and the posterior superior temporal gyrus (pSTG, Wernicke's area), and studied 36 subjects watching video-recorded story retellings during a behavioral and a functional magnetic resonance imaging (fMRI) experiment. It is well documented that neural correlates of such naturalistic videos emerge as intersubject covariance (ISC) in fMRI even without involving a stimulus model (model-free analysis). The subjects attended either to perceived conventionality or to a control condition (any hand movements or gesture-speech relations). Such tasks modulate ISC in contributing neural structures, and thus we studied ISC changes in response to task demands in language networks. Indeed, the conventionality task significantly increased covariance of the button press time series and neuronal synchronization in the left IFG relative to the other tasks. In the left IFG, synchronous activity was observed during the conventionality task only. In contrast, the left pSTG exhibited correlated activation patterns during all conditions, with an increase in the conventionality task at the trend level only. Conceivably, the left IFG can be considered a core region for the processing of perceived conventionality in co-speech gestures, similar to spoken language. In general, the interpretation of conventionalized signs may rely on neural mechanisms that engage during language comprehension.
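For readers unfamiliar with intersubject covariance/correlation analysis, the sketch below shows one common leave-one-out formulation: each subject's regional time course is correlated with the average time course of all other subjects, and the resulting values can then be compared between tasks. This is a generic illustration under assumed array shapes, not the authors' specific pipeline.

```python
import numpy as np

def leave_one_out_isc(data):
    """data: subjects x timepoints array for one region of interest.
    Returns one ISC value per subject: the correlation between that subject's
    time course and the mean time course of all remaining subjects."""
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    iscs = []
    for s in range(z.shape[0]):
        others = np.delete(z, s, axis=0).mean(axis=0)
        iscs.append(np.corrcoef(z[s], others)[0, 1])
    return np.asarray(iscs)

# Task modulation can then be assessed by computing ISC separately for runs acquired under
# the conventionality task and under a control task and contrasting the two sets of values.
```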
Collapse
Affiliation(s)
- Dhana Wolf
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty, RWTH Aachen, Aachen, Germany; Natural Media Lab, Human Technology Centre, RWTH Aachen, Aachen, Germany; Center for Sign Language and Gesture (SignGes), RWTH Aachen, Aachen, Germany
| | - Linn-Marlen Rekittke
- Natural Media Lab, Human Technology Centre, RWTH Aachen, Aachen, Germany; Center for Sign Language and Gesture (SignGes), RWTH Aachen, Aachen, Germany
| | - Irene Mittelberg
- Natural Media Lab, Human Technology Centre, RWTH Aachen, Aachen, Germany; Center for Sign Language and Gesture (SignGes), RWTH Aachen, Aachen, Germany
| | - Martin Klasen
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty, RWTH Aachen, Aachen, Germany; JARA-Translational Brain Medicine, RWTH Aachen, Aachen, Germany
| | - Klaus Mathiak
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical Faculty, RWTH Aachen, Aachen, Germany; Center for Sign Language and Gesture (SignGes), RWTH Aachen, Aachen, Germany; JARA-Translational Brain Medicine, RWTH Aachen, Aachen, Germany
| |
Collapse
|
29
|
Irwin J, DiBlasi L. Audiovisual speech perception: A new approach and implications for clinical populations. LANGUAGE AND LINGUISTICS COMPASS 2017; 11:77-91. [PMID: 29520300 PMCID: PMC5839512 DOI: 10.1111/lnc3.12237] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Accepted: 01/25/2017] [Indexed: 06/01/2023]
Abstract
This selected overview of audiovisual (AV) speech perception examines the influence of visible articulatory information on what is heard. AV speech perception is thought to be a cross-cultural phenomenon that emerges early in typical language development; variables that influence it include properties of the visual and the auditory signal, attentional demands, and individual differences. A brief review of the existing neurobiological evidence on how visual information influences heard speech indicates potential loci, timing, and facilitatory effects of AV over auditory-only speech. The current literature on AV speech in certain clinical populations (individuals with an autism spectrum disorder, developmental language disorder, or hearing loss) reveals differences in processing that may inform interventions. Finally, a new method of assessing AV speech that does not require an obvious cross-category mismatch or auditory noise is presented as a novel approach for investigators.
Collapse
Affiliation(s)
- Julia Irwin
- LEARN Center, Haskins Laboratories Inc., USA
| | | |
Collapse
|
30
|
Rosenblum LD, Dorsi J, Dias JW. The Impact and Status of Carol Fowler's Supramodal Theory of Multisensory Speech Perception. ECOLOGICAL PSYCHOLOGY 2016. [DOI: 10.1080/10407413.2016.1230373] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
31
|
Functional organization of the language network in three- and six-year-old children. Neuropsychologia 2016; 98:24-33. [PMID: 27542319 PMCID: PMC5407357 DOI: 10.1016/j.neuropsychologia.2016.08.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2016] [Revised: 08/02/2016] [Accepted: 08/14/2016] [Indexed: 11/08/2022]
Abstract
The organization of the language network undergoes continuous changes during development as children learn to understand sentences. In the present study, functional magnetic resonance imaging and behavioral measures were utilized to investigate functional activation and functional connectivity (FC) in three-year-old (3yo) and six-year-old (6yo) children during sentence comprehension. Transitive German sentences with case marking, varying in word order (subject-initial and object-initial), were presented auditorily. From each age group, we selected children who processed the subject-initial sentences with above-chance accuracy to ensure that we were tapping real comprehension. Both age groups showed a main effect of word order in the left posterior superior temporal gyrus (pSTG), with greater activation for object-initial compared to subject-initial sentences. However, age differences were observed in the FC between the left pSTG and the left inferior frontal gyrus (IFG). The 6yo group showed stronger FC between the left pSTG and Brodmann area (BA) 44 of the left IFG compared to the 3yo group. For the 3yo group, in turn, the FC between left pSTG and left BA 45 was stronger than with left BA 44. Our study demonstrates that while task-related activation was comparable, the small behavioral differences between age groups were reflected in the underlying functional organization, revealing the ongoing development of the neural language network.
Highlights: we examined functional connectivity of sentence processing in 3- and 6-year-olds; performance-matched age groups activated left pSTG for processing complex syntax; 6-year-olds had stronger connectivity between left BA44 and pSTG than 3-year-olds; 3-year-olds had greater connectivity between left BA45 and pSTG than between BA44 and pSTG; functional connectivity results could be related to behavioral performance.
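The core FC measure in studies like this is typically a correlation between region-of-interest time courses, Fisher z-transformed so it can be averaged and compared across subjects and groups. The sketch below illustrates that computation under assumed inputs (extracted pSTG, BA44, and BA45 time series per subject); it is not the authors' actual analysis code.

```python
import numpy as np

def roi_connectivity(ts_a, ts_b):
    """Fisher z-transformed Pearson correlation between two ROI time series."""
    r = np.corrcoef(ts_a, ts_b)[0, 1]
    return np.arctanh(r)

def group_fc(subject_timeseries, seed="pSTG", target="BA44"):
    """subject_timeseries: list of dicts mapping ROI name -> 1-D time course.
    Returns one FC value per subject for the seed-target pair."""
    return np.array([roi_connectivity(s[seed], s[target]) for s in subject_timeseries])

# Group comparison idea: compute group_fc(...) for the 6yo and 3yo samples separately and
# compare the two distributions (e.g., with an independent-samples test).
```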
Collapse
|
32
|
Shinozaki J, Hiroe N, Sato MA, Nagamine T, Sekiyama K. Impact of language on functional connectivity for audiovisual speech integration. Sci Rep 2016; 6:31388. [PMID: 27510407 PMCID: PMC4980767 DOI: 10.1038/srep31388] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2016] [Accepted: 07/19/2016] [Indexed: 11/17/2022] Open
Abstract
Visual information about lip and facial movements plays a role in audiovisual (AV) speech perception. Although this has been widely confirmed, previous behavioural studies have shown interlanguage differences, that is, native Japanese speakers do not integrate auditory and visual speech as closely as native English speakers. To elucidate the neural basis of such interlanguage differences, 22 native English speakers and 24 native Japanese speakers were examined in behavioural or functional Magnetic Resonance Imaging (fMRI) experiments while monosyllabic speech was presented under AV, auditory-only, or visual-only conditions for speech identification. Behavioural results indicated that the English speakers identified visual speech more quickly than the Japanese speakers, and that the temporal facilitation effect of congruent visual speech was significant in the English speakers but not in the Japanese speakers. Using fMRI data, we examined the functional connectivity among brain regions important for auditory-visual interplay. The results indicated that the English speakers had significantly stronger connectivity between the visual motion area MT and Heschl’s gyrus compared with the Japanese speakers, which may subserve lower-level visual influences on speech perception in English speakers in a multisensory environment. These results suggested that linguistic experience strongly affects neural connectivity involved in AV speech integration.
Collapse
Affiliation(s)
- Jun Shinozaki
- Department of Systems Neuroscience, School of Medicine, Sapporo Medical University, Sapporo, Japan
| | - Nobuo Hiroe
- ATR Neural Information Analysis Laboratories, Seika-cho, Japan
| | - Masa-Aki Sato
- ATR Neural Information Analysis Laboratories, Seika-cho, Japan
| | - Takashi Nagamine
- Department of Systems Neuroscience, School of Medicine, Sapporo Medical University, Sapporo, Japan
| | - Kaoru Sekiyama
- Division of Cognitive Psychology, Faculty of Letters, Kumamoto University, Kumamoto, Japan
| |
Collapse
|
33
|
A dynamical framework to relate perceptual variability with multisensory information processing. Sci Rep 2016; 6:31280. [PMID: 27502974 PMCID: PMC4977493 DOI: 10.1038/srep31280] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2016] [Accepted: 07/15/2016] [Indexed: 11/29/2022] Open
Abstract
Multisensory processing involves the participation of individual sensory streams, e.g., vision and audition, to facilitate perception of environmental stimuli. An experimental realization of the underlying complexity is captured by the “McGurk effect”: incongruent auditory and visual vocalization stimuli eliciting perception of illusory speech sounds. Further studies have established that the time delay between the onsets of the auditory and visual signals (AV lag) and perturbations in the unisensory streams are key variables that modulate perception. However, as of now only a few quantitative theoretical frameworks have been proposed to understand the interplay among these psychophysical variables or the neural systems-level interactions that govern perceptual variability. Here, we propose a dynamic systems model consisting of the basic ingredients of any multisensory processing: two unisensory and one multisensory sub-system (nodes), as reported by several researchers. The nodes are connected such that biophysically inspired coupling parameters and time delays become key parameters of this network. We observed that zero AV lag results in maximum synchronization of the constituent nodes and that the degree of synchronization decreases for non-zero lags. The attractor states of this network can thus be interpreted as facilitators that stabilize specific perceptual experiences. The dynamic model thereby presents a quantitative framework for understanding multisensory information processing.
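The sketch below is a deliberately simplified, hypothetical stand-in for the kind of three-node network described here (two unisensory nodes plus one multisensory node, a transmission delay, and an AV onset lag); it is not the authors' model or parameterization. It only illustrates the qualitative claim that network synchrony, measured here as the mean pairwise correlation of node activity, is highest at zero AV lag and falls off as the lag grows.

```python
import numpy as np

def node_response(onset_s, n, fs=1000.0, tau=0.08):
    """Transient response of a unisensory node: an alpha-function bump starting at onset_s."""
    t = np.arange(n) / fs - onset_s
    r = np.zeros(n)
    pos = t >= 0
    r[pos] = (t[pos] / tau) * np.exp(1.0 - t[pos] / tau)
    return r

def network_synchrony(av_lag_s, fs=1000.0, dur_s=1.5, coupling=0.5, delay_s=0.01):
    """Visual (V) and auditory (A) nodes are driven by inputs whose onsets differ by av_lag_s;
    the multisensory node (M) sums both after a short transmission delay.
    Synchrony = mean pairwise correlation of the three node time courses."""
    n = int(dur_s * fs)
    v = node_response(0.2, n, fs)
    a = node_response(0.2 + av_lag_s, n, fs)
    d = int(delay_s * fs)
    m = np.zeros(n)
    m[d:] = coupling * (v[:n - d] + a[:n - d])
    c = np.corrcoef(np.vstack([v, a, m]))
    return (c[0, 1] + c[0, 2] + c[1, 2]) / 3.0

for lag in (0.0, 0.1, 0.3, 0.45):        # seconds of AV onset asynchrony
    print(f"AV lag {lag:.2f} s -> synchrony {network_synchrony(lag):.3f}")
```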
Collapse
|
34
|
Venezia JH, Fillmore P, Matchin W, Isenberg AL, Hickok G, Fridriksson J. Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech. Neuroimage 2016; 126:196-207. [PMID: 26608242 PMCID: PMC4733636 DOI: 10.1016/j.neuroimage.2015.11.038] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2015] [Revised: 11/09/2015] [Accepted: 11/15/2015] [Indexed: 11/22/2022] Open
Abstract
Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development.
Collapse
Affiliation(s)
- Jonathan H Venezia
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA 92697, United States.
| | - Paul Fillmore
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX 76798, United States
| | - William Matchin
- Department of Linguistics, University of Maryland, College Park, MD 20742, United States
| | - A Lisette Isenberg
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA 92697, United States
| | - Gregory Hickok
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA 92697, United States
| | - Julius Fridriksson
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, SC 29208, United States
| |
Collapse
|
35
|
Rhone AE, Nourski KV, Oya H, Kawasaki H, Howard MA, McMurray B. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex. LANGUAGE, COGNITION AND NEUROSCIENCE 2015; 31:284-302. [PMID: 27182530 PMCID: PMC4865257 DOI: 10.1080/23273798.2015.1101145] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
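Event-related band power of the kind analyzed here is commonly computed by band-pass filtering the recording, taking the squared Hilbert envelope, epoching around stimulus onsets, and normalizing to a pre-stimulus baseline. The sketch below illustrates that generic recipe for a high-gamma band; the filter order, band edges, epoch, and baseline windows are assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def event_related_band_power(x, fs, onsets, band=(70.0, 150.0),
                             epoch=(-0.3, 0.8), baseline=(-0.3, -0.1)):
    """x: 1-D recording from one electrode; onsets: event onsets in samples.
    Returns the epoch time vector and a trials x time matrix of band power in dB
    relative to the pre-stimulus baseline."""
    b, a = butter(4, [band[0] / (fs / 2.0), band[1] / (fs / 2.0)], btype="band")
    power = np.abs(hilbert(filtfilt(b, a, x))) ** 2
    t = np.arange(int(epoch[0] * fs), int(epoch[1] * fs))
    trials = np.array([power[o + t] for o in onsets
                       if o + t[0] >= 0 and o + t[-1] < len(x)])
    tvec = t / fs
    base = trials[:, (tvec >= baseline[0]) & (tvec < baseline[1])].mean(axis=1, keepdims=True)
    return tvec, 10.0 * np.log10(trials / base)

# Content specificity can then be assessed by averaging the trials x time matrix separately
# for speech and non-speech trials and comparing the pre-onset portions of the two averages.
```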
Collapse
|
36
|
Lüttke CS, Ekman M, van Gerven MAJ, de Lange FP. Preference for Audiovisual Speech Congruency in Superior Temporal Cortex. J Cogn Neurosci 2015; 28:1-7. [PMID: 26351991 DOI: 10.1162/jocn_a_00874] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Auditory speech perception can be altered by concurrent visual information. The superior temporal cortex is an important combining site for this integration process. This area was previously found to be sensitive to audiovisual congruency. However, the direction of this congruency effect (i.e., stronger or weaker activity for congruent compared to incongruent stimulation) has been more equivocal. Here, we used fMRI to look at the neural responses of human participants during the McGurk illusion--in which auditory /aba/ and visual /aga/ inputs are fused to perceived /ada/--in a large homogenous sample of participants who consistently experienced this illusion. This enabled us to compare the neuronal responses during congruent audiovisual stimulation with incongruent audiovisual stimulation leading to the McGurk illusion while avoiding the possible confounding factor of sensory surprise that can occur when McGurk stimuli are only occasionally perceived. We found larger activity for congruent audiovisual stimuli than for incongruent (McGurk) stimuli in bilateral superior temporal cortex, extending into the primary auditory cortex. This finding suggests that superior temporal cortex prefers when auditory and visual input support the same representation.
Collapse
|
37
|
Riedel P, Ragert P, Schelinski S, Kiebel SJ, von Kriegstein K. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition. Cortex 2015; 68:86-99. [DOI: 10.1016/j.cortex.2014.11.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2014] [Revised: 10/24/2014] [Accepted: 11/25/2014] [Indexed: 12/31/2022]
|
38
|
Prediction and constraint in audiovisual speech perception. Cortex 2015; 68:169-81. [PMID: 25890390 DOI: 10.1016/j.cortex.2015.03.006] [Citation(s) in RCA: 128] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2014] [Revised: 01/28/2015] [Accepted: 03/08/2015] [Indexed: 11/23/2022]
Abstract
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms.
Collapse
|
39
|
Bernstein LE, Liebenthal E. Neural pathways for visual speech perception. Front Neurosci 2014; 8:386. [PMID: 25520611 PMCID: PMC4248808 DOI: 10.3389/fnins.2014.00386] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2014] [Accepted: 11/10/2014] [Indexed: 12/03/2022] Open
Abstract
This paper examines two questions: what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Collapse
Affiliation(s)
- Lynne E Bernstein
- Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| | - Einat Liebenthal
- Department of Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA
| |
Collapse
|
40
|
Kim J, Davis C. How visual timing and form information affect speech and non-speech processing. BRAIN AND LANGUAGE 2014; 137:86-90. [PMID: 25190328 DOI: 10.1016/j.bandl.2014.07.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2014] [Revised: 07/13/2014] [Accepted: 07/17/2014] [Indexed: 06/03/2023]
Abstract
Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving face stimuli were used: full-face versions (both spoken form and timing information available) and modified face versions (only timing information provided by peri-oral motion available). The results showed that the peri-oral timing information facilitated response time for speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing condition; this effect only occurred for speech stimuli. We propose the timing effect was due to cross-modal phase resetting; the form effect to cross-modal priming.
Collapse
Affiliation(s)
- Jeesun Kim
- The MARCS Institute, University of Western Sydney, Australia.
| | - Chris Davis
- The MARCS Institute, University of Western Sydney, Australia.
| |
Collapse
|
41
|
Erickson LC, Heeg E, Rauschecker JP, Turkeltaub PE. An ALE meta-analysis on the audiovisual integration of speech signals. Hum Brain Mapp 2014; 35:5587-605. [PMID: 24996043 DOI: 10.1002/hbm.22572] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2013] [Revised: 05/28/2014] [Accepted: 06/24/2014] [Indexed: 11/09/2022] Open
Abstract
The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals.
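Activation likelihood estimation itself has a simple core: each reported focus is modeled as a small Gaussian blob of spatial uncertainty, blobs are combined within an experiment into a modeled-activation map, and the maps are combined across experiments as a probabilistic union. The sketch below shows that core on a coarse toy grid with hypothetical foci; it omits the smoothing-kernel calibration, null-distribution, and thresholding steps of a real ALE analysis.

```python
import numpy as np

def gaussian_blob(grid, focus_mm, fwhm_mm):
    """Probability map for one reported focus, modeled as an isotropic Gaussian
    (spatial uncertainty), normalized to sum to 1 over the grid."""
    sigma = fwhm_mm / 2.3548
    d2 = ((grid - focus_mm) ** 2).sum(axis=-1)
    p = np.exp(-d2 / (2.0 * sigma ** 2))
    return p / p.sum()

def ale_map(experiments, grid, fwhm_mm=10.0):
    """experiments: list of experiments, each a list of (x, y, z) foci in mm.
    Within an experiment, blobs combine as a probabilistic union into a
    modeled-activation (MA) map; the ALE map is the union of MA maps across experiments."""
    not_active = np.ones(grid.shape[:-1])
    for foci in experiments:
        ma_complement = np.ones(grid.shape[:-1])
        for focus in foci:
            ma_complement *= 1.0 - gaussian_blob(grid, np.asarray(focus, float), fwhm_mm)
        not_active *= ma_complement          # ma_complement = 1 - MA for this experiment
    return 1.0 - not_active

# Toy example: a coarse 2 mm grid over a small volume and two hypothetical experiments.
ax = np.arange(-40.0, 42.0, 2.0)
grid = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1)
ale = ale_map([[(-10, 0, 5), (12, -4, 8)], [(-8, 2, 6)]], grid)
print(ale.max(), np.unravel_index(ale.argmax(), ale.shape))
```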
Collapse
Affiliation(s)
- Laura C Erickson
- Department of Neurology, Georgetown University Medical Center, Washington, District of Columbia; Department of Neuroscience, Georgetown University Medical Center, Washington, District of Columbia
| | | | | | | |
Collapse
|
42
|
Schepers IM, Yoshor D, Beauchamp MS. Electrocorticography Reveals Enhanced Visual Cortex Responses to Visual Speech. Cereb Cortex 2014; 25:4103-10. [PMID: 24904069 DOI: 10.1093/cercor/bhu127] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Human speech contains both auditory and visual components, processed by their respective sensory cortices. We test a simple model in which task-relevant speech information is enhanced during cortical processing. Visual speech is most important when the auditory component is uninformative. Therefore, the model predicts that visual cortex responses should be enhanced to visual-only (V) speech compared with audiovisual (AV) speech. We recorded neuronal activity as patients perceived auditory-only (A), V, and AV speech. Visual cortex showed strong increases in high-gamma band power and strong decreases in alpha-band power to V and AV speech. Consistent with the model prediction, gamma-band increases and alpha-band decreases were stronger for V speech. The model predicts that the uninformative nature of the auditory component (not simply its absence) is the critical factor, a prediction we tested in a second experiment in which visual speech was paired with auditory white noise. As predicted, visual speech with auditory noise showed enhanced visual cortex responses relative to AV speech. An examination of the anatomical locus of the effects showed that all visual areas, including primary visual cortex, showed enhanced responses. Visual cortex responses to speech are enhanced under circumstances when visual information is most important for comprehension.
Collapse
Affiliation(s)
- Inga M Schepers
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA. Current address: Department of Psychology, Oldenburg University, Oldenburg, Germany
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Michael S Beauchamp
- Department of Neurobiology and Anatomy, University of Texas Medical School at Houston, Houston, TX, USA
| |
Collapse
|
43
|
Mitchel AD, Christiansen MH, Weiss DJ. Multimodal integration in statistical learning: evidence from the McGurk illusion. Front Psychol 2014; 5:407. [PMID: 24904449 PMCID: PMC4032898 DOI: 10.3389/fpsyg.2014.00407] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2014] [Accepted: 04/18/2014] [Indexed: 11/16/2022] Open
Abstract
Recent advances in the field of statistical learning have established that learners are able to track regularities of multimodal stimuli, yet it is unknown whether the statistical computations are performed on integrated representations or on separate, unimodal representations. In the present study, we investigated the ability of adults to integrate audio and visual input during statistical learning. We presented learners with a speech stream synchronized with a video of a speaker’s face. In the critical condition, the visual (e.g., /gi/) and auditory (e.g., /mi/) signals were occasionally incongruent, which we predicted would produce the McGurk illusion, resulting in the perception of an audiovisual syllable (e.g., /ni/). In this way, we used the McGurk illusion to manipulate the underlying statistical structure of the speech streams, such that perception of these illusory syllables facilitated participants’ ability to segment the speech stream. Our results therefore demonstrate that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning. We interpret our findings as support for modality-interactive accounts of statistical learning.
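The segmentation cue at stake in this kind of statistical-learning study is the transitional probability between adjacent syllables, which is high within words and drops at word boundaries. The sketch below computes those probabilities over a toy stream in which one within-word syllable is the fused (McGurk) percept rather than the auditory token; the syllable inventory and word set are hypothetical, not the study's materials.

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """TP(B | A) = count(A followed by B) / count(A), for adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables[:-1], syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}

# Toy stream built from three tri-syllabic "words"; the middle syllable of one word is the
# fused percept "ni" (video /gi/ + audio /mi/) rather than the auditory syllable "mi".
words = [["pa", "bi", "ku"], ["go", "la", "tu"], ["da", "ni", "ro"]]
random.seed(1)
stream = [syl for _ in range(100) for word in random.sample(words, len(words)) for syl in word]
tps = transitional_probabilities(stream)
print(tps[("da", "ni")])                 # within-word TP: high (here 1.0)
print(tps.get(("ku", "go"), 0.0))        # across-boundary TP: markedly lower
```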
Collapse
Affiliation(s)
- Aaron D Mitchel
- Department of Psychology and Program in Neuroscience, Bucknell University, Lewisburg, PA, USA
| | - Morten H Christiansen
- Department of Psychology, Cornell University, Ithaca, NY, USA; Department of Language and Communication, University of Southern Denmark, Odense, Denmark; Haskins Laboratories, New Haven, CT, USA
| | - Daniel J Weiss
- Department of Psychology and Program in Linguistics, Pennsylvania State University, University Park, PA, USA
| |
Collapse
|
44
|
Szycik G, Ye Z, Mohammadi B, Dillo W, te Wildt B, Samii A, Frieling H, Bleich S, Münte T. Maladaptive connectivity of Broca’s area in schizophrenia during audiovisual speech perception: An fMRI study. Neuroscience 2013; 253:274-82. [DOI: 10.1016/j.neuroscience.2013.08.041] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2013] [Revised: 08/18/2013] [Accepted: 08/22/2013] [Indexed: 11/16/2022]
|