1. Ahn E, Majumdar A, Lee TG, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS to the Left pSTS. Multisens Res 2024;37:341-363. PMID: 39191410; PMCID: PMC11388023; DOI: 10.1163/22134808-bja10129.
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept that differs from the auditory and visual components, known as the McGurk effect. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect rely on largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily disrupt processing within this region while subjects were presented with either congruent or incongruent (McGurk) audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation had no effect on the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
Affiliation(s)
- EunSeon Ahn, Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Areti Majumdar, Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Taraz G Lee, Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- David Brang, Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA

2. Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024;45:e26653. PMID: 38488460; DOI: 10.1002/hbm.26653.
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme presented simultaneously with a facial articulation of a /ga/ (i.e., a viseme) is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk than for congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, typically involved in cognitive control processes. Crucially, in line with Bayesian theories, these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
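
The uncertainty measure used here, response entropy, is the Shannon entropy of each observer's distribution of categorization responses to a given stimulus. A minimal sketch of the computation follows; the trial counts below are hypothetical, chosen only for illustration, and are not taken from the study:

```python
import numpy as np

def response_entropy(responses):
    """Shannon entropy (bits) of a response distribution: H = -sum(p * log2 p).
    Higher entropy = greater perceptual uncertainty (more variable labeling
    of the same stimulus across trials)."""
    _, counts = np.unique(responses, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical trial-by-trial syllable reports for one observer:
mcgurk = ["da"] * 12 + ["ba"] * 5 + ["ga"] * 3   # mixed percepts
congruent = ["ba"] * 19 + ["da"] * 1             # near-uniform percept
print(response_entropy(mcgurk))     # ~1.35 bits
print(response_entropy(congruent))  # ~0.29 bits
```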
Affiliation(s)
- Chenjie Dong, Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China; Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney, Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang, Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China

3. Ahn E, Majumdar A, Lee T, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS. bioRxiv [Preprint] 2023:2023.11.27.568892. PMID: 38077093; PMCID: PMC10705272; DOI: 10.1101/2023.11.27.568892.
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept known as the McGurk effect. This illusion has been widely used to study audiovisual speech integration, illustrating that auditory and visual cues are combined in the brain to generate a single coherent percept. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect reflect largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily impair processing while subjects were presented with either incongruent (McGurk) or congruent audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS significantly reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation did not affect the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
Affiliation(s)
- EunSeon Ahn, Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Areti Majumdar, Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Taraz Lee, Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- David Brang, Department of Psychology, University of Michigan, Ann Arbor, MI 48109

4. Krason A, Vigliocco G, Mailend ML, Stoll H, Varley R, Buxbaum LJ. Benefit of visual speech information for word comprehension in post-stroke aphasia. Cortex 2023;165:86-100. PMID: 37271014; PMCID: PMC10850036; DOI: 10.1016/j.cortex.2023.04.011.
Abstract
Aphasia is a language disorder that often involves speech comprehension impairments affecting communication. In face-to-face settings, speech is accompanied by mouth and facial movements, but little is known about the extent to which they benefit aphasic comprehension. This study investigated the benefit of visual information accompanying speech for word comprehension in people with aphasia (PWA) and the neuroanatomic substrates of any benefit. Thirty-six PWA and 13 matched neurotypical control participants performed a picture-word verification task in which they indicated whether a picture of an animate/inanimate object matched a subsequent word produced by an actress in a video. Stimuli were either audiovisual (with visible mouth and facial movements) or auditory-only (still picture of a silhouette), with the audio being clear (unedited) or degraded (6-band noise-vocoding). We found that visual speech information was more beneficial for neurotypical participants than for PWA, and more beneficial for both groups when speech was degraded. A multivariate lesion-symptom mapping analysis for the degraded speech condition showed that lesions to the superior temporal gyrus, underlying insula, primary and secondary somatosensory cortices, and inferior frontal gyrus were associated with a reduced benefit of audiovisual compared to auditory-only speech, suggesting that the integrity of these fronto-temporo-parietal regions may facilitate cross-modal mapping. These findings provide initial insights into the impact of audiovisual information on comprehension in aphasia and the brain regions mediating any benefit.
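
For readers unfamiliar with the degradation method: a noise vocoder splits speech into frequency bands and replaces each band's spectral fine structure with noise shaped by that band's amplitude envelope. The sketch below is a generic 6-band implementation under assumed parameters (logarithmic band spacing, Hilbert envelopes, a 100-8000 Hz analysis range), not the authors' exact pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=6, lo=100.0, hi=8000.0):
    """Generic n-band noise vocoder: bandpass the signal, take each band's
    amplitude envelope, and use it to modulate band-matched noise. Preserves
    the temporal envelope while discarding spectral detail. Requires fs > 2*hi."""
    edges = np.geomspace(lo, hi, n_bands + 1)  # log-spaced band edges (assumed)
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))             # band envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))  # band-limited noise
        out += env * noise
    return out / np.max(np.abs(out))  # peak-normalize
```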
Affiliation(s)
- Anna Krason, Experimental Psychology, University College London, UK; Moss Rehabilitation Research Institute, Elkins Park, PA, USA
- Gabriella Vigliocco, Experimental Psychology, University College London, UK; Moss Rehabilitation Research Institute, Elkins Park, PA, USA
- Marja-Liisa Mailend, Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Department of Special Education, University of Tartu, Tartu Linn, Estonia
- Harrison Stoll, Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Applied Cognitive and Brain Science, Drexel University, Philadelphia, PA, USA
- Laurel J Buxbaum, Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Department of Rehabilitation Medicine, Thomas Jefferson University, Philadelphia, PA, USA

5. Pepper JL, Nuttall HE. Age-Related Changes to Multisensory Integration and Audiovisual Speech Perception. Brain Sci 2023;13:1126. PMID: 37626483; PMCID: PMC10452685; DOI: 10.3390/brainsci13081126.
Abstract
Multisensory integration is essential for the quick and accurate perception of our environment, particularly in everyday tasks like speech perception. Research has highlighted the importance of investigating bottom-up and top-down contributions to multisensory integration and how these change as a function of ageing. Specifically, perceptual factors like the temporal binding window and cognitive factors like attention and inhibition appear to be fundamental in the integration of visual and auditory information, an integration that may become less efficient as we age. These factors have been linked to brain areas like the superior temporal sulcus, with neural oscillations in the alpha-band frequency also being implicated in multisensory processing. Age-related changes in multisensory integration may have significant consequences for the well-being of our increasingly ageing population, affecting their ability to communicate with others and safely move through their environment; it is crucial that the evidence surrounding this subject continues to be carefully investigated. This review will discuss research into age-related changes in the perceptual and cognitive mechanisms of multisensory integration and the impact that these changes have on speech perception and fall risk. The role of oscillatory alpha activity is of particular interest, as it may be key in the modulation of multisensory integration.
Affiliation(s)
- Helen E. Nuttall, Department of Psychology, Lancaster University, Bailrigg LA1 4YF, UK

6. Di Pietro SV, Karipidis II, Pleisch G, Brem S. Neurodevelopmental trajectories of letter and speech sound processing from preschool to the end of elementary school. Dev Cogn Neurosci 2023;61:101255. PMID: 37196374; DOI: 10.1016/j.dcn.2023.101255.
Abstract
Learning to read alphabetic languages starts with learning letter-speech-sound associations. How this process changes brain function during development is still largely unknown. We followed 102 children with varying reading skills in a mixed-longitudinal/cross-sectional design from the prereading stage to the end of elementary school over five time points (n = 46 with two or more time points, of which n = 16 fully longitudinal) to investigate the neural trajectories of letter and speech sound processing using fMRI. Children were presented with letters and speech sounds visually, auditorily, and audiovisually in kindergarten (6.7yo), at the middle (7.3yo) and end of first grade (7.6yo), and in second (8.4yo) and fifth grades (11.5yo). Activation of the ventral occipitotemporal cortex for visual and audiovisual processing followed a complex trajectory, with two peaks in first and fifth grades. The superior temporal gyrus (STG) showed an inverted U-shaped trajectory for audiovisual letter processing, a development that in poor readers was attenuated in the middle STG and absent in the posterior STG. Finally, the trajectories for letter-speech-sound integration were modulated by reading skills and showed differing directionality in the congruency effect depending on the time point. This unprecedented study captures the development of letter processing across elementary school and its neural trajectories in children with varying reading skills.
Affiliation(s)
- S V Di Pietro, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Switzerland; URPP Adaptive Brain Circuits in Development and Learning (AdaBD), University of Zurich, Zurich, Switzerland
- I I Karipidis, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Switzerland
- G Pleisch, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland
- S Brem, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Switzerland; URPP Adaptive Brain Circuits in Development and Learning (AdaBD), University of Zurich, Zurich, Switzerland

7. Quinones JF, Hildebrandt A, Pavan T, Thiel CM, Heep A. Preterm birth and neonatal white matter microstructure in in-vivo reconstructed fiber tracts among audiovisual integration brain regions. Dev Cogn Neurosci 2023;60:101202. PMID: 36731359; PMCID: PMC9894786; DOI: 10.1016/j.dcn.2023.101202.
Abstract
Individuals born preterm are at risk of developing a variety of sequelae. Audiovisual integration (AVI) has received little attention despite its facilitating role in the development of socio-cognitive abilities. The present study assessed the association between prematurity and in-vivo reconstructed fiber bundles among brain regions relevant for AVI. We retrieved data from 63 preterm neonates enrolled in the Developing Human Connectome Project (http://www.developingconnectome.org/) and matched them with 63 term-born neonates from the same study by means of propensity score matching. We performed probabilistic tractography and DTI and NODDI analyses on the traced fibers. We found that specific DTI and NODDI metrics are significantly associated with prematurity in neonates matched for postmenstrual age at scan. We investigated the spatial overlap and developmental order of the reconstructed tractograms between preterm and full-term neonates. Permutation-based analysis revealed significant group-level differences in Dice similarity coefficients and developmental order between preterm and full-term neonates. In contrast, no group differences in the amount of interindividual variability of DTI and NODDI metrics were observed. We conclude that microstructural detriment in the reconstructed fiber bundles, along with developmental and morphological differences, is likely to contribute to disadvantages in AVI in preterm individuals.
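
Propensity score matching of this kind pairs each preterm neonate with the term-born neonate most similar on background covariates. Below is a minimal greedy 1:1 sketch; the covariates named and the greedy nearest-neighbor variant are illustrative assumptions, and the study's actual procedure may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_controls(X_preterm, X_term):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, i.e.,
    the modeled probability of being preterm given covariates (e.g.,
    postmenstrual age at scan, sex). Assumes len(X_term) >= len(X_preterm)."""
    X = np.vstack([X_preterm, X_term])
    y = np.r_[np.ones(len(X_preterm)), np.zeros(len(X_term))]
    ps = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    ps_p, ps_t = ps[:len(X_preterm)], ps[len(X_preterm):]
    unused = set(range(len(X_term)))
    pairs = []
    for i, p in enumerate(ps_p):
        j = min(unused, key=lambda k: abs(ps_t[k] - p))  # closest unused control
        pairs.append((i, j))
        unused.remove(j)
    return pairs  # list of (preterm_index, matched_term_index)
```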
Affiliation(s)
- Juan F Quinones, Psychological Methods and Statistics, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Andrea Hildebrandt, Psychological Methods and Statistics, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany; Research Center Neurosensory Science, Carl von Ossietzky Universität Oldenburg, Germany
- Tommaso Pavan, Department of Radiology, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland
- Christiane M Thiel, Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany; Research Center Neurosensory Science, Carl von Ossietzky Universität Oldenburg, Germany; Biological Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Axel Heep, Research Center Neurosensory Science, Carl von Ossietzky Universität Oldenburg, Germany; Klinik für Neonatologie, Intensivmedizin und Kinderkardiologie, Oldenburg, Germany

8. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022;152:3216. PMID: 36586857; PMCID: PMC9894660; DOI: 10.1121/10.0015262.
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen, Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey, PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers, Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle, Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA

9. Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022;263:119598. PMID: 36049699; DOI: 10.1016/j.neuroimage.2022.119598.
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus. Its goal was to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as parts of the broader language network, including the semantic system. To this end we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network, as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliation(s)
- Lars A Ross, The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- Sophie Molholm, The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler, The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene, The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe, The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA

10. Butera IM, Larson ED, DeFreese AJ, Lee AK, Gifford RH, Wallace MT. Functional localization of audiovisual speech using near infrared spectroscopy. Brain Topogr 2022;35:416-430. PMID: 35821542; PMCID: PMC9334437; DOI: 10.1007/s10548-022-00904-1.
Abstract
Visual cues are especially vital for hearing-impaired individuals such as cochlear implant (CI) users to understand speech in noise. Functional near infrared spectroscopy (fNIRS) is a light-based imaging technology that is ideally suited for measuring the brain activity of CI users due to its compatibility with both the ferromagnetic and electrical components of these implants. In a preliminary step toward better elucidating the behavioral and neural correlates of audiovisual (AV) speech integration in CI users, we designed a speech-in-noise task and measured the extent to which 24 normal-hearing individuals could integrate the audio of spoken monosyllabic words with the corresponding visual signals of a female speaker. In our behavioral task, we found that audiovisual pairings provided average improvements of 103% and 197% over auditory-alone listening conditions at -6 and -9 dB signal-to-noise ratios in multi-talker background noise. In an fNIRS task using similar stimuli, we measured activity during auditory-only listening, visual-only lipreading, and AV listening conditions. We identified cortical activity in all three conditions over regions of middle and superior temporal cortex typically associated with speech processing and audiovisual integration. In addition, three channels active during the lipreading condition showed correlations (uncorrected) with behavioral measures of audiovisual gain as well as with the McGurk effect. Further work focusing on the regions of interest identified in this study could test how AV speech integration may differ for CI users who rely on this mechanism for daily communication.
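
The reported improvements appear to be relative gains of audiovisual over auditory-alone accuracy, i.e., 100 * (AV - A) / A. A worked example of that arithmetic follows; the accuracy values are hypothetical, chosen only to reproduce gains of the reported size, since the abstract does not give the raw scores:

```python
def relative_av_gain(av_accuracy, a_accuracy):
    """Percent improvement of audiovisual (AV) over auditory-alone (A)
    word recognition: 100 * (AV - A) / A."""
    return 100.0 * (av_accuracy - a_accuracy) / a_accuracy

# Hypothetical proportions correct:
print(relative_av_gain(0.61, 0.30))   # ~103%, like the -6 dB SNR condition
print(relative_av_gain(0.50, 0.168))  # ~197%, like the -9 dB SNR condition
```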
Affiliation(s)
- Iliza M Butera, Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Eric D Larson, Institute for Learning & Brain Sciences, University of Washington, Seattle, Washington, USA
- Andrea J DeFreese, Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Adrian Kc Lee, Institute for Learning & Brain Sciences, University of Washington, Seattle, Washington, USA; Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, USA
- René H Gifford, Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Mark T Wallace, Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA; Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA; Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA

11. Bernstein LE, Jordan N, Auer ET, Eberhardt SP. Lipreading: A Review of Its Continuing Importance for Speech Recognition With an Acquired Hearing Loss and Possibilities for Effective Training. Am J Audiol 2022;31:453-469. PMID: 35316072; PMCID: PMC9524756; DOI: 10.1044/2021_aja-21-00112.
Abstract
PURPOSE: The goal of this review article is to reinvigorate interest in lipreading and lipreading training for adults with acquired hearing loss. Most adults benefit from being able to see the talker when speech is degraded; however, the effect size is related to their lipreading ability, which is typically poor in adults who have experienced normal hearing through most of their lives. Lipreading training has been viewed as a possible avenue for rehabilitation of adults with an acquired hearing loss, but most training approaches have not been particularly successful. Here, we describe lipreading and theoretically motivated approaches to its training, as well as examples of successful training paradigms. We discuss some extensions to auditory-only (AO) and audiovisual (AV) speech recognition.
METHOD: Visual speech perception and word recognition are described. Traditional and contemporary views of training and perceptual learning are outlined. We focus on the roles of external and internal feedback and the training task in perceptual learning, and we describe results of lipreading training experiments.
RESULTS: Lipreading is commonly characterized as limited to viseme perception. However, evidence demonstrates subvisemic perception of visual phonetic information. Lipreading words also relies on lexical constraints, not unlike auditory spoken word recognition. Lipreading has been shown to be difficult to improve through training, but under specific feedback and task conditions, training can be successful, and learning can generalize to untrained materials, including AV sentence stimuli in noise. The results on lipreading have implications for AO and AV training and for use of acoustically processed speech in face-to-face communication.
CONCLUSION: Given its importance for speech recognition with a hearing loss, we suggest that the research and clinical communities integrate lipreading in their efforts to improve speech recognition in adults with acquired hearing loss.
Affiliation(s)
- Lynne E. Bernstein, Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
- Nicole Jordan, Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
- Edward T. Auer, Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
- Silvio P. Eberhardt, Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC

12. Michon M, Zamorano-Abramson J, Aboitiz F. Faces and Voices Processing in Human and Primate Brains: Rhythmic and Multimodal Mechanisms Underlying the Evolution and Development of Speech. Front Psychol 2022;13:829083. PMID: 35432052; PMCID: PMC9007199; DOI: 10.3389/fpsyg.2022.829083.
Abstract
While influential works since the 1970s have widely assumed that imitation is an innate skill in both human and non-human primate neonates, recent empirical studies and meta-analyses have challenged this view, indicating other forms of reward-based learning as relevant factors in the development of social behavior. The translation of visual input into matching motor output that underlies imitation abilities instead seems to develop along with social interactions and sensorimotor experience during infancy and childhood. Recently, a new visual stream has been identified in both human and non-human primate brains, updating the dual visual stream model. This third pathway is thought to be specialized for dynamic aspects of social perception, such as eye gaze and facial expression, and, crucially, for audio-visual integration of speech. Here, we review empirical studies addressing an understudied but crucial aspect of speech and communication, namely the processing of visual orofacial cues (i.e., the perception of a speaker's lips and tongue movements) and its integration with vocal auditory cues. Throughout this review, we offer new insights from our understanding of speech as the product of the evolution and development of a rhythmic and multimodal organization of sensorimotor brain networks, supporting volitional motor control of the upper vocal tract and audio-visual voices-faces integration.
Affiliation(s)
- Maëva Michon, Laboratory for Cognitive and Evolutionary Neuroscience, Department of Psychiatry, Faculty of Medicine, Interdisciplinary Center for Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile; Centro de Estudios en Neurociencia Humana y Neuropsicología, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
- José Zamorano-Abramson, Centro de Investigación en Complejidad Social, Facultad de Gobierno, Universidad del Desarrollo, Santiago, Chile
- Francisco Aboitiz, Laboratory for Cognitive and Evolutionary Neuroscience, Department of Psychiatry, Faculty of Medicine, Interdisciplinary Center for Neuroscience, Pontificia Universidad Católica de Chile, Santiago, Chile

13. Fiber tracing and microstructural characterization among audiovisual integration brain regions in neonates compared with young adults. Neuroimage 2022;254:119141. PMID: 35342006; DOI: 10.1016/j.neuroimage.2022.119141.
Abstract
Audiovisual integration (AVI) has been associated with cognitive-processing and behavioral advantages, as well as with various socio-cognitive disorders. While some studies have identified brain regions instantiating this ability shortly after birth, little is known about the structural pathways connecting them. The goal of the present study was to reconstruct fiber tracts linking AVI regions in the newborn in-vivo brain and assess their adult-likeness by comparing them with analogous fiber tracts of young adults. We performed probabilistic tractography and compared connective probabilities between a sample of term-born neonates (N = 311; the Developing Human Connectome Project, dHCP, http://www.developingconnectome.org) and young adults (N = 311; the Human Connectome Project, https://www.humanconnectome.org/) by means of a classification algorithm. Furthermore, we computed Dice coefficients to assess between-group spatial similarity of the reconstructed fibers and used diffusion metrics to characterize neonates' AVI brain network in terms of microstructural properties, interhemispheric differences, and the association with perinatal covariates and biological sex. Overall, our results indicate that the AVI fiber bundles were successfully reconstructed in a vast majority of neonates, similarly to adults. Connective-probability distributional similarities and spatial overlaps of AVI fibers between the two groups differed across the reconstructed fibers. There was a rank-order correspondence of the fibers' connective strengths across the groups. Additionally, the study revealed patterns of diffusion metrics in line with early white-matter developmental trajectories and a developmental advantage for females. Altogether, these findings deliver evidence of meaningful structural connections among AVI regions in the newborn in-vivo brain.
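
The Dice coefficient quantifies spatial overlap between two binary masks, here voxel masks of reconstructed fiber bundles. A minimal sketch with toy masks (the example arrays are illustrative, not study data):

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient, 2|A intersect B| / (|A| + |B|):
    0 = no overlap, 1 = identical masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Toy voxel masks for the "same" tract reconstructed in two groups:
neonate_mask = np.array([1, 1, 1, 0, 0], dtype=bool)
adult_mask   = np.array([0, 1, 1, 1, 0], dtype=bool)
print(dice_coefficient(neonate_mask, adult_mask))  # 2*2 / (3+3) = 0.667
```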

14. Sandhya, Vinay, Manchaiah V. Perception of Incongruent Audiovisual Speech: Distribution of Modality-Specific Responses. Am J Audiol 2021;30:968-979. PMID: 34499528; DOI: 10.1044/2021_aja-20-00213.
Abstract
PURPOSE: Multimodal sensory integration in audiovisual (AV) speech perception is a naturally occurring phenomenon. Modality-specific responses such as auditory left, auditory right, and visual responses to dichotic incongruent AV speech stimuli help in understanding AV speech processing through each input modality. The distribution of activity in frontal motor areas involved in speech production has been shown to correlate with how subjects perceive the same syllable differently or perceive different syllables. This study investigated the distribution of modality-specific responses to dichotic incongruent AV speech stimuli by simultaneously presenting consonant-vowel (CV) syllables with different places of articulation to the participant's left and right ears and visually.
DESIGN: A dichotic experimental design was adopted. Six stop CV syllables /pa/, /ta/, /ka/, /ba/, /da/, and /ga/ were assembled to create dichotic incongruent AV speech material. Participants included 40 native speakers of Norwegian (20 women, M age = 22.6 years, SD = 2.43 years; 20 men, M age = 23.7 years, SD = 2.08 years).
RESULTS: Under dichotic listening conditions, velar CV syllables resulted in the highest scores in the respective ears, which may be explained by the stimulus dominance of velar consonants shown in previous studies. However, this study, with dichotic auditory stimuli accompanied by an incongruent video segment, demonstrated that a visually distinct video segment possibly draws attention in some participants, thereby reducing overall recognition of the dominant syllable. Furthermore, the findings suggest that response times to incongruent AV stimuli may be shorter in females than in males.
CONCLUSION: The identification of the left audio, right audio, and visual segments in dichotic incongruent AV stimuli depends on place of articulation, stimulus dominance, and voice onset time of the CV syllables.
Affiliation(s)
- Sandhya, Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Vinay, Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Manchaiah V, Department of Speech and Hearing Sciences, Lamar University, Beaumont, TX

15. Karipidis II, Pleisch G, Di Pietro SV, Fraga-González G, Brem S. Developmental Trajectories of Letter and Speech Sound Integration During Reading Acquisition. Front Psychol 2021;12:750491. PMID: 34867636; PMCID: PMC8636811; DOI: 10.3389/fpsyg.2021.750491.
Abstract
Reading acquisition in alphabetic languages starts with learning the associations between speech sounds and letters. This learning process is related to crucial developmental changes of brain regions that serve visual, auditory, multisensory integration, and higher cognitive processes. Here, we studied the development of audiovisual processing and integration of letter-speech sound pairs with an audiovisual target detection functional MRI paradigm. Using a longitudinal approach, we tested children with varying reading outcomes before the start of reading acquisition (T1, 6.5 yo), in first grade (T2, 7.5 yo), and in second grade (T3, 8.5 yo). Early audiovisual integration effects were characterized by higher activation for incongruent than congruent letter-speech sound pairs in the inferior frontal gyrus and ventral occipitotemporal cortex. Audiovisual processing in the left superior temporal gyrus (STG) significantly increased from the prereading (T1) to early reading stages (T2, T3). Region of interest analyses revealed that activation in the left STG, inferior frontal gyrus, and ventral occipitotemporal cortex increased in children with typical reading fluency skills, while poor readers did not show the same development in these regions. The incongruency effect in parts of the STG and insular cortex bilaterally at T1 was significantly associated with reading fluency skills at T3. These findings provide new insights into the development of the brain circuitry involved in audiovisual processing of letters, the building blocks of words, and reveal early markers of audiovisual integration that may be predictive of reading outcomes.
Affiliation(s)
- Iliana I Karipidis, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Zurich, Switzerland; Center for Interdisciplinary Brain Sciences Research, Stanford University School of Medicine, Stanford, CA, United States
- Georgette Pleisch, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Zurich, Switzerland
- Sarah V Di Pietro, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Gorka Fraga-González, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Zurich, Switzerland
- Silvia Brem, Department of Child and Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland; MR-Center of the University Hospital of Psychiatry Zurich, University of Zurich, Zurich, Switzerland

16. Kaganovich N, Christ S. Event-related potentials evidence for long-term audiovisual representations of phonemes in adults. Eur J Neurosci 2021;54:7860-7875. PMID: 34750895; PMCID: PMC8815308; DOI: 10.1111/ejn.15519.
Abstract
The presence of long-term auditory representations for phonemes has been well-established. However, since speech perception is typically audiovisual, we hypothesized that long-term phoneme representations may also contain information on speakers' mouth shape during articulation. We used an audiovisual oddball paradigm in which, on each trial, participants saw a face and heard one of two vowels. One vowel occurred frequently (standard), while another occurred rarely (deviant). In one condition (neutral), the face had a closed, non-articulating mouth. In the other condition (audiovisual violation), the mouth shape matched the frequent vowel. Although in both conditions stimuli were audiovisual, we hypothesized that identical auditory changes would be perceived differently by participants. Namely, in the neutral condition, deviants violated only the audiovisual pattern specific to each block. By contrast, in the audiovisual violation condition, deviants additionally violated long-term representations for how a speaker's mouth looks during articulation. We compared the amplitude of mismatch negativity (MMN) and P3 components elicited by deviants in the two conditions. The MMN extended posteriorly over temporal and occipital sites even though deviants contained no visual changes, suggesting that deviants were perceived as interruptions in audiovisual, rather than auditory only, sequences. As predicted, deviants elicited larger MMN and P3 in the audiovisual violation compared to the neutral condition. The results suggest that long-term representations of phonemes are indeed audiovisual.
Affiliation(s)
- Natalya Kaganovich, Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA; Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, USA
- Sharon Christ, Department of Human Development and Family Studies, Purdue University, West Lafayette, Indiana, USA; Department of Statistics, Purdue University, West Lafayette, Indiana, USA

17. Karthik G, Plass J, Beltz AM, Liu Z, Grabowecky M, Suzuki S, Stacey WC, Wasade VS, Towle VL, Tao JX, Wu S, Issa NP, Brang D. Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex. Eur J Neurosci 2021;54:7301-7317. PMID: 34587350; DOI: 10.1111/ejn.15482.
Abstract
Speech perception is a central component of social communication. Although principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues. Visual speech modulates activity in cortical areas subserving auditory speech perception including the superior temporal gyrus (STG). However, it is unknown whether visual modulation of auditory processing is a unitary phenomenon or, rather, consists of multiple functionally distinct processes. To explore this question, we examined neural responses to audiovisual speech measured from intracranially implanted electrodes in 21 patients with epilepsy. We found that visual speech modulated auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differed across frequency bands. In the theta band, visual speech suppressed the auditory response from before auditory speech onset to after auditory speech onset (-93 to 500 ms) most strongly in the posterior STG. In the beta band, suppression was seen in the anterior STG from -311 to -195 ms before auditory speech onset and in the middle STG from -195 to 235 ms after speech onset. In high gamma, visual speech enhanced the auditory response from -45 to 24 ms only in the posterior STG. We interpret the visual-induced changes prior to speech onset as reflecting crossmodal prediction of speech signals. In contrast, modulations after sound onset may reflect a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.
Affiliation(s)
- G Karthik, Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- John Plass, Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- Adriene M Beltz, Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- Zhongming Liu, Department of Biomedical Engineering and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Marcia Grabowecky, Department of Psychology, Northwestern University, Evanston, Illinois, USA
- Satoru Suzuki, Department of Psychology, Northwestern University, Evanston, Illinois, USA
- William C Stacey, Department of Neurology and Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Vibhangini S Wasade, Department of Neurology, Henry Ford Hospital, Detroit, Michigan, USA; Department of Neurology, Wayne State University School of Medicine, Detroit, Michigan, USA
- Vernon L Towle, Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- James X Tao, Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- Shasha Wu, Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- Naoum P Issa, Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- David Brang, Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA

18. Kaganovich N, Schumaker J, Christ S. Impaired Audiovisual Representation of Phonemes in Children with Developmental Language Disorder. Brain Sci 2021;11:507. PMID: 33923647; PMCID: PMC8073635; DOI: 10.3390/brainsci11040507.
Abstract
We examined whether children with developmental language disorder (DLD) differed from their peers with typical development (TD) in the degree to which they encode information about a talker’s mouth shape into long-term phonemic representations. Children watched a talker’s face and listened to rare changes from [i] to [u] or the reverse. In the neutral condition, the talker’s face had a closed mouth throughout. In the audiovisual violation condition, the mouth shape always matched the frequent vowel, even when the rare vowel was played. We hypothesized that in the neutral condition no long-term audiovisual memory traces for speech sounds would be activated. Therefore, the neural response elicited by deviants would reflect only a violation of the observed audiovisual sequence. In contrast, we expected that in the audiovisual violation condition, a long-term memory trace for the speech sound/lip configuration typical for the frequent vowel would be activated. In this condition then, the neural response elicited by rare sound changes would reflect a violation of not only observed audiovisual patterns but also of a long-term memory representation for how a given vowel looks when articulated. Children pressed a response button whenever they saw a talker’s face assume a silly expression. We found that in children with TD, rare auditory changes produced a significant mismatch negativity (MMN) event-related potential (ERP) component over the posterior scalp in the audiovisual violation condition but not in the neutral condition. In children with DLD, no MMN was present in either condition. Rare vowel changes elicited a significant P3 in both groups and conditions, indicating that all children noticed auditory changes. Our results suggest that children with TD, but not children with DLD, incorporate visual information into long-term phonemic representations and detect violations in audiovisual phonemic congruency even when they perform a task that is unrelated to phonemic processing.
Affiliation(s)
- Natalya Kaganovich, Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, USA; Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, USA (corresponding author)
- Jennifer Schumaker, Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, USA
- Sharon Christ, Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907-2066, USA; Department of Human Development and Family Studies, Purdue University, 1202 West State Street, West Lafayette, IN 47907-2055, USA

19. Walker GM, Rollo PS, Tandon N, Hickok G. Effect of Bilateral Opercular Syndrome on Speech Perception. Neurobiol Lang (Camb) 2021;2:335-353. PMID: 37213256; PMCID: PMC10158595; DOI: 10.1162/nol_a_00037.
Abstract
Speech perception ability and structural neuroimaging were investigated in two cases of bilateral opercular syndrome. Due to bilateral ablation of the motor control center for the lower face and surrounds, these rare cases provide an opportunity to evaluate the necessity of cortical motor representations for speech perception, a cornerstone of some neurocomputational theories of language processing. Speech perception, including audiovisual integration (i.e., the McGurk effect), was mostly unaffected in these cases, although verbal short-term memory impairment hindered performance on several tasks that are traditionally used to evaluate speech perception. The results suggest that the role of the cortical motor system in speech perception is context-dependent and supplementary, not inherent or necessary.
Affiliation(s)
- Grant M. Walker, Department of Cognitive Sciences, University of California, Irvine (corresponding author)
- Nitin Tandon, Department of Neurosurgery, University of Texas Medical School at Houston
- Gregory Hickok, Department of Cognitive Sciences, University of California, Irvine; Department of Language Science, University of California, Irvine

20. Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure. Sci Rep 2020;10:18009. PMID: 33093570; PMCID: PMC7583249; DOI: 10.1038/s41598-020-75201-7.
Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time demanding than fusion. Using time-/source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from inferior frontal gyrus (IFG) activity and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.
21
Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife 2020; 9:e56972. [PMID: 32831168] [PMCID: PMC7470824] [DOI: 10.7554/elife.56972] [Received: 03/16/2020] [Accepted: 08/18/2020]
Abstract
Visual speech carried by lip movements is an integral part of communication. Yet it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from the representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than is often assumed.
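The study's two-step logic (decode word identity, then ask where that representation predicts comprehension) can be sketched for a single region. Everything below, including the toy data, the per-region feature matrix, and the use of decoder evidence as the trial-wise measure, is an assumption for illustration, not the published full-brain MEG analysis.

```python
# Sketch of the two-step logic: (1) decode word identity from one region's
# activity, (2) ask whether the decoder's trial-wise evidence predicts
# comprehension. Shapes, names, and toy data are assumptions.
import numpy as np
from scipy.stats import pointbiserialr
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n_trials, n_features = 200, 40
X_region = rng.standard_normal((n_trials, n_features))  # one region's MEG features
word_id = rng.integers(0, 2, n_trials)                  # two word identities (toy case)
understood = rng.integers(0, 2, n_trials)               # trial-wise comprehension (0/1)

# Step 1: cross-validated probability assigned to the true word identity.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X_region,
                          word_id, cv=5, method="predict_proba")
evidence = proba[np.arange(n_trials), word_id]

# Step 2: does decoder evidence in this region track comprehension?
r, p = pointbiserialr(understood, evidence)
print(f"evidence-comprehension correlation: r={r:.2f}, p={p:.3f}")
```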
Affiliation(s)
- Anne Keitel, Psychology, University of Dundee, Dundee, United Kingdom; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom; Institute for Biomagnetism and Biosignalanalysis, University of Münster, Münster, Germany
- Christoph Kayser, Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld, Germany
22
Michaelis K, Erickson LC, Fama ME, Skipper-Kallal LM, Xing S, Lacey EH, Anbari Z, Norato G, Rauschecker JP, Turkeltaub PE. Effects of age and left hemisphere lesions on audiovisual integration of speech. Brain and Language 2020; 206:104812. [PMID: 32447050] [PMCID: PMC7379161] [DOI: 10.1016/j.bandl.2020.104812] [Received: 02/21/2019] [Revised: 04/02/2020] [Accepted: 05/04/2020]
Abstract
Neuroimaging studies have implicated left temporal lobe regions in audiovisual integration of speech and inferior parietal regions in temporal binding of incoming signals. However, it remains unclear which regions are necessary for audiovisual integration, especially when the auditory and visual signals are offset in time. Aging also influences integration, but the nature of this influence is unresolved. We used a McGurk task to test audiovisual integration and sensitivity to the timing of audiovisual signals in two older adult groups: left hemisphere stroke survivors and controls. We observed a positive relationship between age and audiovisual speech integration in both groups, and an interaction indicating that lesions reduce sensitivity to timing offsets between signals. Lesion-symptom mapping demonstrated that damage to the left supramarginal gyrus and planum temporale reduces temporal acuity in audiovisual speech perception. This suggests that a process mediated by these structures identifies asynchronous audiovisual signals that should not be integrated.
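The sensitivity to timing offsets measured here is often summarized as a temporal binding window. A minimal sketch of that summary, with invented response rates: fit a Gaussian to the proportion of McGurk (fused) responses across stimulus-onset asynchronies (SOAs) and read off its width; a wider fitted window means lower temporal acuity. How the authors parameterized acuity is not specified in this abstract, so this is one plausible convention.

```python
# Temporal-acuity sketch: fit a Gaussian "binding window" to the proportion
# of fused (McGurk) responses as a function of audiovisual asynchrony.
# The data points below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

soa_ms = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400])
p_fused = np.array([0.10, 0.25, 0.55, 0.80, 0.85, 0.75, 0.50, 0.20, 0.08])

def binding_window(soa, peak, mu, sigma):
    """Gaussian model of fusion probability across asynchronies."""
    return peak * np.exp(-((soa - mu) ** 2) / (2 * sigma ** 2))

(peak, mu, sigma), _ = curve_fit(binding_window, soa_ms, p_fused,
                                 p0=[0.8, 0.0, 150.0])
print(f"window centre {mu:.0f} ms, width (SD) {sigma:.0f} ms")
```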
Affiliation(s)
- Kelly Michaelis, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Laura C Erickson, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Neuroscience Department, Georgetown University Medical Center, Washington DC, USA
- Mackenzie E Fama, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Department of Speech-Language Pathology & Audiology, Towson University, Towson, MD, USA
- Laura M Skipper-Kallal, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Shihui Xing, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Department of Neurology, First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Elizabeth H Lacey, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Research Division, MedStar National Rehabilitation Hospital, Washington DC, USA
- Zainab Anbari, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Gina Norato, Clinical Trials Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Josef P Rauschecker, Neuroscience Department, Georgetown University Medical Center, Washington DC, USA
- Peter E Turkeltaub, Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Research Division, MedStar National Rehabilitation Hospital, Washington DC, USA
23
Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:328. [PMID: 32481538] [PMCID: PMC7348766] [DOI: 10.3390/brainsci10060328] [Received: 04/23/2020] [Revised: 05/15/2020] [Accepted: 05/25/2020]
Abstract
The McGurk effect, in which an incongruent pairing of a visual /ga/ with an acoustic /ba/ creates the fused illusory percept /da/, is a cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed (auditory /ga/ with visual /ba/), yielding the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex, in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of the visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions: fusion, with the visual aspect of the stimulus changing, and combination, with the auditory aspect of the stimulus changing, as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early versus late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created, but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
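The MMN quantification implied here follows a standard recipe: average deviant and standard epochs, subtract, and test the difference in an a-priori window. A minimal sketch with simulated single-channel epochs; the sampling rate, window bounds, and data are assumptions, not the study's parameters.

```python
# MMN sketch: the mismatch negativity is the deviant-minus-standard
# difference wave, summarized in a predefined time window.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
sfreq = 500                                          # Hz (assumed)
times = np.arange(-0.1, 0.5, 1 / sfreq)              # -100 to 500 ms epochs
standards = rng.standard_normal((400, times.size))   # trials x samples, one channel
deviants = rng.standard_normal((80, times.size))

diff_wave = deviants.mean(axis=0) - standards.mean(axis=0)

# Mean amplitude in a typical early MMN window (e.g., 150-250 ms post-onset);
# a later window would be tested the same way for combination deviants.
win = (times >= 0.150) & (times <= 0.250)
t, p = ttest_ind(deviants[:, win].mean(axis=1), standards[:, win].mean(axis=1))
print(f"MMN window amplitude: {diff_wave[win].mean():.3f} µV, t={t:.2f}, p={p:.3f}")
```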
Affiliation(s)
- Melissa Randazzo, Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA (correspondence; Tel.: +1-516-877-4769)
- Ryan Priefer, Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith, Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler, Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery, Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud, Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
24
Rennig J, Wegner-Clemens K, Beauchamp MS. Face viewing behavior predicts multisensory gain during speech perception. Psychon Bull Rev 2020; 27:70-77. [PMID: 31845209] [PMCID: PMC7004844] [DOI: 10.3758/s13423-019-01665-y]
Abstract
Visual information from the face of an interlocutor complements auditory information from their voice, enhancing intelligibility. However, there are large individual differences in the ability to comprehend noisy audiovisual speech. Another axis of individual variability is the extent to which humans fixate the mouth or the eyes of a viewed face. We speculated that across a lifetime of face viewing, individuals who prefer to fixate the mouth of a viewed face might accumulate stronger associations between visual and auditory speech, resulting in improved comprehension of noisy audiovisual speech. To test this idea, we assessed interindividual variability in two tasks. Participants (n = 102) varied greatly in their ability to understand noisy audiovisual sentences (accuracy from 2% to 58%) and in the time they spent fixating the mouth of a talker enunciating clear audiovisual syllables (3% to 98% of total time). These two variables were positively correlated: each 10% increase in time spent fixating the mouth corresponded to a 5.6% increase in multisensory gain. This finding demonstrates an unexpected link, mediated by histories of visual exposure, between two fundamental human abilities: processing faces and understanding speech.
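The reported relationship (each 10% increase in mouth fixation corresponding to a 5.6% increase in multisensory gain) is the slope of a simple regression. The sketch below simulates that analysis; the headroom-normalized definition of multisensory gain is one common convention and is an assumption here, not necessarily the paper's exact formula.

```python
# Regression sketch: multisensory gain as a function of mouth fixation.
# All data are simulated; only the analysis structure is illustrated.
import numpy as np
from scipy.stats import linregress

def multisensory_gain(acc_av, acc_aud):
    # One common definition: AV improvement relative to the headroom
    # above auditory-only accuracy. Whether the study normalized this
    # way is an assumption of this sketch.
    return (acc_av - acc_aud) / (1.0 - acc_aud)

rng = np.random.default_rng(3)
n = 102
mouth_fix = rng.uniform(0.03, 0.98, n)                 # fraction of time on the mouth
acc_aud = rng.uniform(0.02, 0.40, n)                   # auditory-only accuracy
acc_av = np.clip(acc_aud + 0.4 * mouth_fix * (1 - acc_aud)
                 + rng.normal(0, 0.05, n), 0, 1)       # toy audiovisual accuracy

fit = linregress(mouth_fix, multisensory_gain(acc_av, acc_aud))
print(f"slope={fit.slope:.2f}, r={fit.rvalue:.2f}, p={fit.pvalue:.3g}")
```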
Affiliation(s)
- Johannes Rennig, Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, 1 Baylor Plaza Suite S104, Houston, TX 77030, USA
- Kira Wegner-Clemens, Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, 1 Baylor Plaza Suite S104, Houston, TX 77030, USA
- Michael S Beauchamp, Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, 1 Baylor Plaza Suite S104, Houston, TX 77030, USA
25
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261] [PMCID: PMC6687434] [DOI: 10.7554/elife.48116] [Received: 05/02/2019] [Accepted: 07/17/2019]
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
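The suppression result can be summarized with a normalized index per electrode: how far the audiovisual response falls below the auditory-only response. The sketch below uses invented amplitudes and a generic fractional-reduction index; it is illustrative, not the study's exact measure.

```python
# Suppression sketch: quantify how much visual speech reduces the auditory
# response at each pSTG electrode, separately for words with and without a
# visual head start. All response amplitudes are invented.
import numpy as np

rng = np.random.default_rng(4)
n_electrodes = 12
resp_aud = rng.uniform(2.0, 5.0, n_electrodes)                    # auditory-only
resp_av_head = resp_aud * rng.uniform(0.4, 0.7, n_electrodes)     # AV, head start
resp_av_nohead = resp_aud * rng.uniform(0.7, 0.95, n_electrodes)  # AV, no head start

def suppression_index(aud, av):
    """Fractional reduction of the auditory response by visual speech."""
    return (aud - av) / aud

print("head start:   ", suppression_index(resp_aud, resp_av_head).mean().round(2))
print("no head start:", suppression_index(resp_aud, resp_av_nohead).mean().round(2))
```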
Affiliation(s)
- Patrick J Karas, Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- John F Magnotti, Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Brian A Metzger, Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Lin L Zhu, Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Kristen B Smith, Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Daniel Yoshor, Department of Neurosurgery, Baylor College of Medicine, Houston, United States
26
Basirat A, Allart É, Brunellière A, Martin Y. Audiovisual speech segmentation in post-stroke aphasia: a pilot study. Top Stroke Rehabil 2019; 26:588-594. [PMID: 31369358] [DOI: 10.1080/10749357.2019.1643566]
Abstract
Background: Stroke may cause sentence comprehension disorders. Speech segmentation, i.e. the ability to detect word boundaries while listening to continuous speech, is an initial step allowing the successful identification of words and the accurate understanding of meaning within sentences. It has received little attention in people with post-stroke aphasia (PWA). Objectives: Our goal was to study speech segmentation in PWA and examine the potential benefit of seeing the speakers' articulatory gestures while segmenting sentences. Methods: Fourteen PWA and twelve healthy controls participated in this pilot study. Performance was measured with a word-monitoring task. In the auditory-only modality, participants were presented with auditory-only stimuli, while in the audiovisual modality, visual speech cues (i.e. the speaker's articulatory gestures) accompanied the auditory input. The proportion of correct responses was calculated for each participant and each modality. Visual enhancement was then calculated in order to estimate the potential benefit of seeing the speaker's articulatory gestures. Results: In both the auditory-only and audiovisual modalities, PWA performed significantly less well than controls, who had 100% correct performance in both modalities. The performance of PWA was correlated with their phonological ability. Six PWA used the visual cues. Group-level analysis of the PWA did not show any reliable difference between the auditory-only and audiovisual modalities (median visual enhancement = 7% [Q1 to Q3: -5 to 39]). Conclusion: Our findings show that a speech segmentation disorder may exist in PWA. This points to the importance of assessing and training speech segmentation after stroke. Further studies should investigate the characteristics of PWA who use visual speech cues during sentence processing.
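Visual enhancement as described here is a per-participant contrast between audiovisual and auditory-only accuracy. A minimal sketch with invented scores, assuming the simple percentage-point difference (the paper may have used a different normalization):

```python
# Visual-enhancement sketch: compare word-monitoring accuracy with and
# without visible articulatory gestures. The per-participant scores are
# invented; the AV-minus-A difference is one plausible reading of the
# paper's measure, stated here as an assumption.
import numpy as np

acc_auditory = np.array([0.55, 0.70, 0.62, 0.48, 0.81, 0.66])   # toy PWA scores
acc_audiovisual = np.array([0.60, 0.68, 0.75, 0.55, 0.80, 0.79])

visual_enhancement = (acc_audiovisual - acc_auditory) * 100     # percentage points
q1, med, q3 = np.percentile(visual_enhancement, [25, 50, 75])
print(f"median VE = {med:.0f}% [Q1 to Q3: {q1:.0f} to {q3:.0f}]")
```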
Affiliation(s)
- Anahita Basirat, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Univ. Lille, CNRS, CHU Lille, Lille, France
- Étienne Allart, Neurorehabilitation Unit, Lille University Medical Center, Lille, France; Inserm U1171, Degenerative and Vascular Cognitive Disorders, University of Lille, Lille, France
- Angèle Brunellière, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Univ. Lille, CNRS, CHU Lille, Lille, France
27
Audiovisual Lexical Retrieval Deficits Following Left Hemisphere Stroke. Brain Sci 2018; 8:206. [PMID: 30486517] [PMCID: PMC6316523] [DOI: 10.3390/brainsci8120206] [Received: 10/19/2018] [Revised: 11/18/2018] [Accepted: 11/27/2018]
Abstract
Binding the sensory features of what we hear and see allows the formation of a coherent percept through which semantics can be accessed. Previous work on object naming has focused on visual confrontation naming, with limited research on nonverbal auditory or multisensory processing. To investigate the neural substrates and sensory effects of lexical retrieval, we evaluated healthy adults (n = 118) and left hemisphere stroke patients (LHD, n = 42) on naming manipulable objects across auditory (sound), visual (picture), and multisensory (audiovisual) conditions. LHD patients were divided into groups with cortical, cortical-subcortical, or subcortical lesions (CO, CO-SC, SC), and specific lesion locations were investigated in a predictive model. Subjects produced lower accuracy in auditory naming relative to the other conditions. Controls demonstrated greater naming accuracy and faster reaction times across all conditions compared to LHD patients. Naming across conditions was most severely impaired in CO patients. Both auditory and visual naming accuracy were affected by temporal lobe involvement, although auditory naming was sensitive to lesions extending subcortically. Only controls demonstrated significant improvement over visual naming with the addition of auditory cues (i.e., the multisensory condition). The results support overlapping neural networks for the visual and auditory modalities related to semantic integration in lexical retrieval and temporal lobe involvement, while multisensory integration was affected by both occipital and temporal lobe lesions. The findings support modality specificity in naming and suggest that auditory naming is mediated by a distributed cortical-subcortical network overlapping with networks mediating the spatiotemporal aspects of skilled movements that produce sound.
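The predictive model mentioned here can be sketched as a regression of auditory naming accuracy on lesion-site indicators. The predictors, effect sizes, and data below are simulated assumptions chosen only to echo the reported direction of effects (temporal lobe and subcortical involvement reducing auditory naming), not the study's fitted model.

```python
# Predictive-model sketch: auditory naming accuracy as a function of
# binary lesion-site indicators. Entirely simulated for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n_patients = 42
temporal = rng.integers(0, 2, n_patients)       # temporal lobe involved (0/1)
subcortical = rng.integers(0, 2, n_patients)    # subcortical extension (0/1)
acc_auditory = (0.75 - 0.15 * temporal - 0.10 * subcortical
                + rng.normal(0, 0.05, n_patients)).clip(0, 1)

X = np.column_stack([temporal, subcortical])
model = LinearRegression().fit(X, acc_auditory)
print("effect of temporal lesion:   ", model.coef_[0].round(3))
print("effect of subcortical spread:", model.coef_[1].round(3))
```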