1. Salagovic CA, Stevenson RA, Butler BE. Behavioral Response Modeling to Resolve Listener- and Stimulus-Related Influences on Audiovisual Speech Integration in Cochlear Implant Users. Ear Hear 2024:00003446-990000000-00372. [PMID: 39660814] [DOI: 10.1097/aud.0000000000001607]
Abstract
OBJECTIVES: Speech intelligibility is supported by the sound of a talker's voice and visual cues related to articulatory movements. The relative contribution of auditory and visual cues to an integrated audiovisual percept varies depending on a listener's environment and sensory acuity. Cochlear implant users rely more on visual cues than those with acoustic hearing to help compensate for the fact that the auditory signal produced by their implant is poorly resolved relative to that of the typically developed cochlea. The relative weight placed on auditory and visual speech cues can be measured by presenting discordant cues across the two modalities and assessing the resulting percept (the McGurk effect). The current literature is mixed with regard to how cochlear implant users respond to McGurk stimuli; some studies suggest they report hearing syllables that represent a fusion of the auditory and visual cues more frequently than typical hearing controls, while others report less frequent fusion. However, several of these studies compared implant users to younger control samples despite evidence that the likelihood and strength of audiovisual integration increase with age. Thus, the present study sought to clarify the impacts of hearing status and age on multisensory speech integration using a combination of behavioral analyses and response modeling.
DESIGN: Cochlear implant users (mean age = 58.9 years), age-matched controls (mean age = 61.5 years), and younger controls (mean age = 25.9 years) completed an online audiovisual speech task. Participants were shown and/or heard four different talkers producing syllables in auditory-alone, visual-alone, and incongruent audiovisual conditions. After each trial, participants reported the syllable they heard or saw from a list of four possible options.
RESULTS: The younger and older control groups performed similarly in both unisensory conditions. The cochlear implant users performed significantly better than either control group in the visual-alone condition. When responding to the incongruent audiovisual trials, cochlear implant users and age-matched controls experienced significantly more fusion than younger controls. When fusion was not experienced, younger controls were more likely to report the auditorily presented syllable than either implant users or age-matched controls. Conversely, implant users were more likely to report the visually presented syllable than either age-matched controls or younger controls. Modeling of the relationship between stimuli and behavioral responses revealed that younger controls had lower disparity thresholds (i.e., were less likely to experience a fused audiovisual percept) than either the implant users or older controls, while implant users had higher levels of sensory noise (i.e., more variability in the way a given stimulus pair is perceived across multiple presentations) than age-matched controls.
CONCLUSIONS: Our findings suggest that age and cochlear implantation may have independent effects on McGurk effect perception. Noisy encoding of disparity modeling confirms that age is a strong predictor of an individual's prior likelihood of experiencing audiovisual integration but suggests that hearing status modulates this relationship due to differences in sensory noise during speech encoding. Together, these findings demonstrate that different groups of listeners can arrive at similar levels of performance in different ways, and highlight the need for careful consideration of stimulus- and group-related effects on multisensory speech perception.
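
For readers unfamiliar with this model class, the sketch below illustrates the core idea behind a noisy-encoding-of-disparity style analysis: the probability of a fused percept depends on the stimulus disparity, the listener's disparity threshold, and the listener's sensory noise. This is a minimal, hypothetical Python illustration under standard Gaussian assumptions, not the modeling code used in the study; the parameter names and example values are invented.

```python
# Illustrative sketch (not the authors' code) of a noisy-encoding-of-disparity
# style computation: the probability that a listener reports a fused (McGurk)
# percept for a given stimulus. All parameter names and values are hypothetical.
from math import erf, sqrt

def p_fusion(disparity: float, threshold: float, sensory_noise: float) -> float:
    """P(fusion) = P(encoded disparity < threshold), assuming the encoded
    disparity is Gaussian: N(disparity, sensory_noise)."""
    z = (threshold - disparity) / sensory_noise
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF

# A higher disparity threshold, or a change in sensory noise, shifts the
# predicted fusion rate for the same physical stimulus.
print(p_fusion(disparity=1.0, threshold=1.5, sensory_noise=0.5))  # ~0.84
print(p_fusion(disparity=1.0, threshold=0.8, sensory_noise=0.5))  # ~0.34
```
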
Affiliation(s)
- Cailey A Salagovic: Graduate Program in Psychology, University of Western Ontario, London, Ontario, Canada
- Ryan A Stevenson: Department of Psychology, University of Western Ontario, London, Ontario, Canada; Western Institute for Neuroscience, University of Western Ontario, London, Ontario, Canada
- Blake E Butler: Department of Psychology, University of Western Ontario, London, Ontario, Canada; Western Institute for Neuroscience, University of Western Ontario, London, Ontario, Canada; National Centre for Audiology, University of Western Ontario, London, Ontario, Canada

2. Wen X, Li G, Wang X, Hu X, Yang H. Modulation of audiovisual integration in the left and right sides: effects of side and spatial coherency. BMC Neurosci 2024; 25:40. [PMID: 39192193] [DOI: 10.1186/s12868-024-00889-6]
Abstract
BACKGROUND: Using event-related potentials (ERPs), we aimed to investigate the neural mechanisms of audiovisual integration during a letter identification task on the left and right sides. Unimodal (A, V) and bimodal (AV) stimuli were presented on either side, and ERPs elicited by unimodal (A, V) stimuli on a given side were compared to those elicited by simultaneous bimodal (AV) stimuli on the same side. Non-zero values in the AV-(A + V) difference waveforms indicated audiovisual integration on the left/right side.
RESULTS: When spatially coherent AV stimuli were presented on the right side, two significant ERP components were noted in the integration difference wave: the N134 and N262, present in the first 300 ms of the AV-(A + V) difference wave, indicated significant audiovisual integration effects. However, when these stimuli were presented on the left side, there were no significant integration components. This audiovisual integration difference may stem from the left/right asymmetry of language processing across the cerebral hemispheres.
CONCLUSIONS: Audiovisual letter information presented on the right side was easier to integrate, process, and represent. Additionally, for spatially non-coherent AV stimuli, only one significant integrative component, peaking at 140 ms in the parietal cortex, indicated audiovisual multisensory integration; this could be attributed to integrative neural processes that depend on the spatial congruity of the auditory and visual stimuli.
Affiliation(s)
- XiaoHui Wen: Hunan University of Humanities, Science and Technology, Loudi, 417000, China
- GuoQiang Li: Hunan University of Humanities, Science and Technology, Loudi, 417000, China
- XuHong Wang: Hunan University of Humanities, Science and Technology, Loudi, 417000, China
- XiaoLan Hu: Hunan University of Humanities, Science and Technology, Loudi, 417000, China
- HongJun Yang: Hunan University of Humanities, Science and Technology, Loudi, 417000, China

3. Jertberg RM, Begeer S, Geurts HM, Chakrabarti B, Van der Burg E. Age, not autism, influences multisensory integration of speech stimuli among adults in a McGurk/MacDonald paradigm. Eur J Neurosci 2024; 59:2979-2994. [PMID: 38570828] [DOI: 10.1111/ejn.16319]
Abstract
Differences between autistic and non-autistic individuals in perception of the temporal relationships between sights and sounds are theorized to underlie difficulties in integrating relevant sensory information. These, in turn, are thought to contribute to problems with speech perception and higher level social behaviour. However, the literature establishing this connection often involves limited sample sizes and focuses almost entirely on children. To determine whether these differences persist into adulthood, we compared 496 autistic and 373 non-autistic adults (aged 17 to 75 years). Participants completed an online version of the McGurk/MacDonald paradigm, a multisensory illusion indicative of the ability to integrate audiovisual speech stimuli. Audiovisual asynchrony was manipulated, and participants responded both to the syllable they perceived (revealing their susceptibility to the illusion) and to whether or not the audio and video were synchronized (allowing insight into temporal processing). In contrast with prior research with smaller, younger samples, we detected no evidence of impaired temporal or multisensory processing in autistic adults. Instead, we found that in both groups, multisensory integration correlated strongly with age. This contradicts prior presumptions that differences in multisensory perception persist and even increase in magnitude over the lifespan of autistic individuals. It also suggests that the compensatory role multisensory integration may play as the individual senses decline with age is intact. These findings challenge existing theories and provide an optimistic perspective on autistic development. They also underline the importance of expanding autism research to better reflect the age range of the autistic population.
Affiliation(s)
- Robert M Jertberg: Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Sander Begeer: Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Hilde M Geurts: Dutch Autism and ADHD Research Center (d'Arc), Brain & Cognition, Department of Psychology, Universiteit van Amsterdam, Amsterdam, The Netherlands; Leo Kannerhuis (Youz/Parnassiagroup), Den Haag, The Netherlands
- Bhismadev Chakrabarti: Centre for Autism, School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK; India Autism Center, Kolkata, India; Department of Psychology, Ashoka University, Sonipat, India
- Erik Van der Burg: Dutch Autism and ADHD Research Center (d'Arc), Brain & Cognition, Department of Psychology, Universiteit van Amsterdam, Amsterdam, The Netherlands

4. Magnotti JF, Lado A, Zhang Y, Maasø A, Nath A, Beauchamp MS. Repeatedly experiencing the McGurk effect induces long-lasting changes in auditory speech perception. Commun Psychol 2024; 2:25. [PMID: 39242734] [PMCID: PMC11332120] [DOI: 10.1038/s44271-024-00073-w]
Abstract
In the McGurk effect, presentation of incongruent auditory and visual speech evokes a fusion percept different than either component modality. We show that repeatedly experiencing the McGurk effect for 14 days induces a change in auditory-only speech perception: the auditory component of the McGurk stimulus begins to evoke the fusion percept, even when presented on its own without accompanying visual speech. This perceptual change, termed fusion-induced recalibration (FIR), was talker-specific and syllable-specific and persisted for a year or more in some participants without any additional McGurk exposure. Participants who did not experience the McGurk effect did not experience FIR, showing that recalibration was driven by multisensory prediction error. A causal inference model of speech perception incorporating multisensory cue conflict accurately predicted individual differences in FIR. Just as the McGurk effect demonstrates that visual speech can alter the perception of auditory speech, FIR shows that these alterations can persist for months or years. The ability to induce seemingly permanent changes in auditory speech perception will be useful for studying plasticity in brain networks for language and may provide new strategies for improving language learning.
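
The causal inference model referred to above is not spelled out in this abstract; the snippet below is only a generic sketch of the model class it belongs to (e.g., Kording et al., 2007): the posterior probability that noisy auditory and visual measurements share a common cause, under Gaussian assumptions. It is not the authors' implementation, and all parameter names and values are hypothetical.

```python
# Generic causal-inference sketch: P(common cause | auditory and visual
# measurements) with Gaussian likelihoods and a Gaussian prior over the
# latent stimulus value. Illustrative only; parameter values are made up.
import math

def posterior_common_cause(x_a, x_v, sigma_a, sigma_v,
                           mu_p=0.0, sigma_p=10.0, p_common=0.5):
    """Return P(C = 1 | x_a, x_v), i.e., the probability of a shared cause."""
    va, vv, vp = sigma_a ** 2, sigma_v ** 2, sigma_p ** 2
    # Likelihood of both measurements arising from one shared cause (C = 1).
    d1 = va * vv + va * vp + vv * vp
    e1 = ((x_a - x_v) ** 2 * vp + (x_a - mu_p) ** 2 * vv
          + (x_v - mu_p) ** 2 * va) / d1
    like_c1 = math.exp(-0.5 * e1) / (2 * math.pi * math.sqrt(d1))
    # Likelihood of the measurements arising from two independent causes (C = 2).
    e2 = (x_a - mu_p) ** 2 / (va + vp) + (x_v - mu_p) ** 2 / (vv + vp)
    like_c2 = math.exp(-0.5 * e2) / (2 * math.pi * math.sqrt((va + vp) * (vv + vp)))
    return like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

# Small cue conflict favors integration; large conflict favors segregation.
print(posterior_common_cause(x_a=0.5, x_v=-0.5, sigma_a=1.0, sigma_v=1.0))
print(posterior_common_cause(x_a=4.0, x_v=-4.0, sigma_a=1.0, sigma_v=1.0))
```
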
Affiliation(s)
- John F Magnotti: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anastasia Lado: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yue Zhang: Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Arnt Maasø: Institute for Media and Communications, University of Oslo, Oslo, Norway
- Audrey Nath: Department of Neurosurgery, University of Texas Medical Branch, Galveston, TX, USA
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

5. Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024; 45:e26653. [PMID: 38488460] [DOI: 10.1002/hbm.26653]
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme presented simultaneously with a facial articulation of a /ga/ (i.e., viseme) is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk compared to congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, typically involved in cognitive control processes. Crucially, in line with Bayesian theories, these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
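
Response entropy, used above as the index of perceptual uncertainty, is simply the Shannon entropy of a participant's response distribution across repeated presentations of a stimulus. A minimal sketch follows, with made-up response counts rather than data from the study:

```python
# Illustrative sketch of the response-entropy measure of perceptual
# uncertainty: Shannon entropy over the distribution of categorical responses
# to one stimulus. The response lists below are hypothetical examples.
from collections import Counter
from math import log2

def response_entropy(responses):
    """Shannon entropy (bits) of the empirical response distribution."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Congruent stimulus: responses are consistent, so entropy is low.
print(response_entropy(["ba"] * 19 + ["da"]))                    # ~0.29 bits
# McGurk stimulus: responses are split across percepts, so entropy is higher.
print(response_entropy(["da"] * 10 + ["ba"] * 6 + ["ga"] * 4))   # ~1.49 bits
```
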
Affiliation(s)
- Chenjie Dong: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China; Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney: Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China

6. Lee HH, Groves K, Ripollés P, Carrasco M. Audiovisual integration in the McGurk effect is impervious to music training. Sci Rep 2024; 14:3262. [PMID: 38332159] [PMCID: PMC10853564] [DOI: 10.1038/s41598-024-53593-0]
Abstract
The McGurk effect refers to an audiovisual speech illusion in which discrepant auditory and visual syllables produce a fused percept that differs from both the visual and the auditory component. However, little is known about how individual differences contribute to the McGurk effect. Here, we examined whether music training experience, which involves audiovisual integration, can modulate the McGurk effect. Seventy-three participants completed the Goldsmiths Musical Sophistication Index (Gold-MSI) questionnaire to evaluate their music expertise on a continuous scale. The Gold-MSI considers participants' daily-life exposure to music learning experiences (formal and informal), instead of merely classifying people into different groups according to how many years they have been trained in music. Participants were instructed to report, via a 3-alternative forced choice task, "what a person said": /Ba/, /Ga/ or /Da/. The experiment consisted of 96 audiovisual congruent trials and 96 audiovisual incongruent (McGurk) trials. We observed no significant correlations between susceptibility to the McGurk effect and the different subscales of the Gold-MSI (active engagement, perceptual abilities, music training, singing abilities, emotion) or the general musical sophistication composite score. Together, these findings suggest that music training experience does not modulate audiovisual integration in speech as reflected by the McGurk effect.
Affiliation(s)
- Hsing-Hao Lee: Department of Psychology, New York University, New York, USA
- Karleigh Groves: Department of Psychology, New York University, New York, USA; Center for Language, Music, and Emotion (CLaME), New York University, New York, USA; Music and Audio Research Lab (MARL), New York University, New York, USA
- Pablo Ripollés: Department of Psychology, New York University, New York, USA; Center for Language, Music, and Emotion (CLaME), New York University, New York, USA; Music and Audio Research Lab (MARL), New York University, New York, USA
- Marisa Carrasco: Department of Psychology, New York University, New York, USA; Center for Neural Science, New York University, New York, USA

7. Gijbels L, Lee AKC, Yeatman JD. Children with developmental dyslexia have equivalent audiovisual speech perception performance but their perceptual weights differ. Dev Sci 2024; 27:e13431. [PMID: 37403418] [DOI: 10.1111/desc.13431]
Abstract
As reading is inherently a multisensory, audiovisual (AV) process where visual symbols (i.e., letters) are connected to speech sounds, the question has been raised whether individuals with reading difficulties, like children with developmental dyslexia (DD), have broader impairments in multisensory processing. This question has been posed before, yet it remains unanswered due to (a) the complexity and contentious etiology of DD along with (b) lack of consensus on developmentally appropriate AV processing tasks. We created an ecologically valid task for measuring multisensory AV processing by leveraging the natural phenomenon that speech perception improves when listeners are provided visual information from mouth movements (particularly when the auditory signal is degraded). We designed this AV processing task with low cognitive and linguistic demands such that children with and without DD would have equal unimodal (auditory and visual) performance. We then collected data in a group of 135 children (age 6.5-15) with an AV speech perception task to answer the following questions: (1) How do AV speech perception benefits manifest in children with and without DD? (2) Do children all use the same perceptual weights to create AV speech perception benefits? (3) What is the role of phonological processing in AV speech perception? We show that children with and without DD have equal AV speech perception benefits on this task, but that children with DD rely less on auditory processing in more difficult listening situations to create these benefits and weigh both incoming information streams differently. Lastly, any reported differences in speech perception in children with DD might be better explained by differences in phonological processing than differences in reading skills.
RESEARCH HIGHLIGHTS:
- Children with versus without developmental dyslexia have equal audiovisual speech perception benefits, regardless of their phonological awareness or reading skills.
- Children with developmental dyslexia rely less on auditory performance to create audiovisual speech perception benefits.
- Individual differences in speech perception in children might be better explained by differences in phonological processing than differences in reading skills.
Affiliation(s)
- Liesbeth Gijbels: Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA; University of Washington, Institute for Learning & Brain Sciences, Seattle, Washington, USA
- Adrian K C Lee: Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA; University of Washington, Institute for Learning & Brain Sciences, Seattle, Washington, USA
- Jason D Yeatman: Division of Developmental-Behavioral Pediatrics, Stanford University School of Medicine, Stanford, California, USA; Stanford University Graduate School of Education, Stanford, California, USA; Stanford University Department of Psychology, Stanford, California, USA

8. Ahn E, Majumdar A, Lee T, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS. bioRxiv [Preprint] 2023:2023.11.27.568892. [PMID: 38077093] [PMCID: PMC10705272] [DOI: 10.1101/2023.11.27.568892]
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept known as the McGurk effect. This illusion has been widely used to study audiovisual speech integration, illustrating that auditory and visual cues are combined in the brain to generate a single coherent percept. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect reflect largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily impair processing while subjects were presented with either incongruent (McGurk) or congruent audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS significantly reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation did not affect the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
Affiliation(s)
- EunSeon Ahn: Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Areti Majumdar: Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Taraz Lee: Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- David Brang: Department of Psychology, University of Michigan, Ann Arbor, MI 48109

9. Moro SS, Qureshi FA, Steeves JKE. Perception of the McGurk effect in people with one eye depends on whether the eye is removed during infancy or adulthood. Front Neurosci 2023; 17:1217831. [PMID: 37901426] [PMCID: PMC10603249] [DOI: 10.3389/fnins.2023.1217831]
Abstract
Background: The visual system is not fully mature at birth and continues to develop throughout infancy until it reaches adult levels through late childhood and adolescence. Disruption of vision during this postnatal period, prior to visual maturation, results in deficits of visual processing that may in turn affect the development of complementary senses. Studying people who have had one eye surgically removed during early postnatal development is a useful model for understanding timelines of sensory development and the role of binocularity in visual system maturation. Adaptive auditory and audiovisual plasticity following the loss of one eye early in life has been observed for both low- and high-level visual stimuli. Notably, people who have had one eye removed early in life perceive the McGurk effect much less than binocular controls.
Methods: The current study investigates whether multisensory compensatory mechanisms are also present in people who had one eye removed late in life, after postnatal visual system maturation, by measuring whether they perceive the McGurk effect compared to binocular controls and people who have had one eye removed early in life.
Results: People who had one eye removed late in life perceived the McGurk effect similarly to binocular viewing controls, unlike those who had one eye removed early in life.
Conclusion: This suggests differences in multisensory compensatory mechanisms based on age at surgical eye removal. These results indicate that cross-modal adaptations for the loss of binocularity may be dependent on plasticity levels during cortical development.
Affiliation(s)
- Stefania S. Moro: Department of Psychology and Centre for Vision Research, York University, Toronto, ON, Canada; The Hospital for Sick Children, Toronto, ON, Canada
- Faizaan A. Qureshi: Department of Psychology and Centre for Vision Research, York University, Toronto, ON, Canada
- Jennifer K. E. Steeves: Department of Psychology and Centre for Vision Research, York University, Toronto, ON, Canada; The Hospital for Sick Children, Toronto, ON, Canada

10. Tiippana K. Advances in Understanding the Phenomena and Processing in Audiovisual Speech Perception. Brain Sci 2023; 13:1345. [PMID: 37759946] [PMCID: PMC10527222] [DOI: 10.3390/brainsci13091345]
Abstract
The Special Issue entitled "Advances in Understanding the Phenomena and Processing in Audiovisual Speech Perception" attracted a variety of articles written by prominent authors in the field [...].
Affiliation(s)
- Kaisa Tiippana: Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland

11. Tiippana K, Ujiie Y, Peromaa T, Takahashi K. Investigation of Cross-Language and Stimulus-Dependent Effects on the McGurk Effect with Finnish and Japanese Speakers and Listeners. Brain Sci 2023; 13:1198. [PMID: 37626554] [PMCID: PMC10452414] [DOI: 10.3390/brainsci13081198]
Abstract
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A/pa/V/ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure for visual influence on speech perception, becoming stronger the lower the proportion of auditory correct responses. Cross-language effects are studied to understand processing differences between one's own and foreign languages. Regarding the McGurk effect, it has sometimes been found to be stronger with foreign speakers. However, other studies have shown the opposite, or no difference between languages. Most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners. Each group of listeners comprised 49 participants. The stimuli (/ka/, /pa/, /ta/) were uttered by two female and male Finnish and Japanese speakers and presented in the A, V, and AV modalities, including a McGurk stimulus A/pa/V/ka/. The McGurk effect was stronger with Japanese stimuli in both groups. Differences in speech perception were prominent between individual speakers but less so between native languages. Unisensory perception correlated with McGurk perception. These findings suggest that stimulus-dependent features contribute to the McGurk effect. This may have a stronger influence on syllable perception than cross-language factors.
Affiliation(s)
- Kaisa Tiippana: Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Yuta Ujiie: Department of Psychology, College of Contemporary Psychology, Rikkyo University, Saitama 352-8558, Japan; Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Osaka 567-8570, Japan
- Tarja Peromaa: Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Kohske Takahashi: College of Comprehensive Psychology, Ritsumeikan University, Osaka 567-8570, Japan

12. Krason A, Vigliocco G, Mailend ML, Stoll H, Varley R, Buxbaum LJ. Benefit of visual speech information for word comprehension in post-stroke aphasia. Cortex 2023; 165:86-100. [PMID: 37271014] [PMCID: PMC10850036] [DOI: 10.1016/j.cortex.2023.04.011]
Abstract
Aphasia is a language disorder that often involves speech comprehension impairments affecting communication. In face-to-face settings, speech is accompanied by mouth and facial movements, but little is known about the extent to which they benefit aphasic comprehension. This study investigated the benefit of visual information accompanying speech for word comprehension in people with aphasia (PWA) and the neuroanatomic substrates of any benefit. Thirty-six PWA and 13 neurotypical matched control participants performed a picture-word verification task in which they indicated whether a picture of an animate/inanimate object matched a subsequent word produced by an actress in a video. Stimuli were either audiovisual (with visible mouth and facial movements) or auditory-only (still picture of a silhouette) with audio being clear (unedited) or degraded (6-band noise-vocoding). We found that visual speech information was more beneficial for neurotypical participants than PWA, and more beneficial for both groups when speech was degraded. A multivariate lesion-symptom mapping analysis for the degraded speech condition showed that lesions to superior temporal gyrus, underlying insula, primary and secondary somatosensory cortices, and inferior frontal gyrus were associated with reduced benefit of audiovisual compared to auditory-only speech, suggesting that the integrity of these fronto-temporo-parietal regions may facilitate cross-modal mapping. These findings provide initial insights into our understanding of the impact of audiovisual information on comprehension in aphasia and the brain regions mediating any benefit.
Affiliation(s)
- Anna Krason: Experimental Psychology, University College London, UK; Moss Rehabilitation Research Institute, Elkins Park, PA, USA
- Gabriella Vigliocco: Experimental Psychology, University College London, UK; Moss Rehabilitation Research Institute, Elkins Park, PA, USA
- Marja-Liisa Mailend: Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Department of Special Education, University of Tartu, Tartu Linn, Estonia
- Harrison Stoll: Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Applied Cognitive and Brain Science, Drexel University, Philadelphia, PA, USA
- Laurel J Buxbaum: Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Department of Rehabilitation Medicine, Thomas Jefferson University, Philadelphia, PA, USA

13. Pepper JL, Nuttall HE. Age-Related Changes to Multisensory Integration and Audiovisual Speech Perception. Brain Sci 2023; 13:1126. [PMID: 37626483] [PMCID: PMC10452685] [DOI: 10.3390/brainsci13081126]
Abstract
Multisensory integration is essential for the quick and accurate perception of our environment, particularly in everyday tasks like speech perception. Research has highlighted the importance of investigating bottom-up and top-down contributions to multisensory integration and how these change as a function of ageing. Specifically, perceptual factors like the temporal binding window and cognitive factors like attention and inhibition appear to be fundamental in the integration of visual and auditory information, an integration that may become less efficient as we age. These factors have been linked to brain areas like the superior temporal sulcus, with neural oscillations in the alpha-band frequency also being implicated in multisensory processing. Age-related changes in multisensory integration may have significant consequences for the well-being of our increasingly ageing population, affecting their ability to communicate with others and safely move through their environment; it is crucial that the evidence surrounding this subject continues to be carefully investigated. This review will discuss research into age-related changes in the perceptual and cognitive mechanisms of multisensory integration and the impact that these changes have on speech perception and fall risk. The role of oscillatory alpha activity is of particular interest, as it may be key in the modulation of multisensory integration.
Affiliation(s)
- Helen E. Nuttall: Department of Psychology, Lancaster University, Bailrigg LA1 4YF, UK

14. Dorsi J, Ostrand R, Rosenblum LD. Semantic priming from McGurk words: Priming depends on perception. Atten Percept Psychophys 2023; 85:1219-1237. [PMID: 37155085] [DOI: 10.3758/s13414-023-02689-2]
Abstract
The McGurk effect is an illusion in which visible articulations alter the perception of auditory speech (e.g., video 'da' dubbed with audio 'ba' may be heard as 'da'). To test the timing of the multisensory processes that underlie the McGurk effect, Ostrand et al. (2016, Cognition, 151, 96-107) used incongruent stimuli, such as auditory 'bait' + visual 'date', as primes in a lexical decision task. These authors reported that the auditory word, but not the perceived (visual) word, induced semantic priming, suggesting that the auditory signal alone can provide the input for lexical access, before multisensory integration is complete. Here, we conceptually replicate the design of Ostrand et al. (2016), using different stimuli chosen to optimize the success of the McGurk illusion. In contrast to the results of Ostrand et al. (2016), we find that the perceived (i.e., visual) word of the incongruent stimulus usually induced semantic priming. We further find that the strength of this priming corresponded to the magnitude of the McGurk effect for each word combination. These findings suggest, in contrast to the findings of Ostrand et al. (2016), that lexical access makes use of integrated multisensory information which is perceived by the listener. These findings further suggest that which unimodal signal of a multisensory stimulus is used in lexical access is dependent on the perception of that stimulus.
Affiliation(s)
- Josh Dorsi: Department of Psychology, University of California, Riverside, 900 University Ave, Riverside, CA, 92521, USA; Penn State University, College of Medicine, State College, PA, USA
- Lawrence D Rosenblum: Department of Psychology, University of California, Riverside, 900 University Ave, Riverside, CA, 92521, USA

15. Iqbal ZJ, Shahin AJ, Bortfeld H, Backer KC. The McGurk Illusion: A Default Mechanism of the Auditory System. Brain Sci 2023; 13:510. [PMID: 36979322] [PMCID: PMC10046462] [DOI: 10.3390/brainsci13030510]
Abstract
Recent studies have questioned past conclusions regarding the mechanisms of the McGurk illusion, especially how McGurk susceptibility might inform our understanding of audiovisual (AV) integration. We previously proposed that the McGurk illusion is likely attributable to a default mechanism, whereby either the visual system, auditory system, or both default to specific phonemes—those implicated in the McGurk illusion. We hypothesized that the default mechanism occurs because visual stimuli with an indiscernible place of articulation (like those traditionally used in the McGurk illusion) lead to an ambiguous perceptual environment and thus a failure in AV integration. In the current study, we tested the default hypothesis as it pertains to the auditory system. Participants performed two tasks. One task was a typical McGurk illusion task, in which individuals listened to auditory-/ba/ paired with visual-/ga/ and judged what they heard. The second task was an auditory-only task, in which individuals transcribed trisyllabic words with a phoneme replaced by silence. We found that individuals’ transcription of missing phonemes often defaulted to ‘/d/t/th/’, the same phonemes often experienced during the McGurk illusion. Importantly, individuals’ default rate was positively correlated with their McGurk rate. We conclude that the McGurk illusion arises when people fail to integrate visual percepts with auditory percepts, due to visual ambiguity, thus leading the auditory system to default to phonemes often implicated in the McGurk illusion.
Affiliation(s)
- Zunaira J. Iqbal: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Antoine J. Shahin: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Health Sciences Research Institute, University of California, Merced, CA 95343, USA
- Heather Bortfeld: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Health Sciences Research Institute, University of California, Merced, CA 95343, USA; Department of Psychological Sciences, University of California, Merced, CA 95353, USA
- Kristina C. Backer: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Health Sciences Research Institute, University of California, Merced, CA 95343, USA

16. Zamuner TS, Rabideau T, McDonald M, Yeung HH. Developmental change in children's speech processing of auditory and visual cues: An eyetracking study. J Child Lang 2023; 50:27-51. [PMID: 36503546] [DOI: 10.1017/s0305000921000684]
Abstract
This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing by young children in visual-looking tasks and the apparent difficulty of speech processing shown by older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
Affiliation(s)
- Margarethe McDonald: Department of Linguistics, University of Ottawa, Canada; School of Psychology, University of Ottawa, Canada
- H Henny Yeung: Department of Linguistics, Simon Fraser University, Canada; Integrative Neuroscience and Cognition Centre, UMR 8002, CNRS and University of Paris, France

17. Mathias B, von Kriegstein K. Enriched learning: behavior, brain, and computation. Trends Cogn Sci 2023; 27:81-97. [PMID: 36456401] [DOI: 10.1016/j.tics.2022.10.007]
Abstract
The presence of complementary information across multiple sensory or motor modalities during learning, referred to as multimodal enrichment, can markedly benefit learning outcomes. Why is this? Here, we integrate cognitive, neuroscientific, and computational approaches to understanding the effectiveness of enrichment and discuss recent neuroscience findings indicating that crossmodal responses in sensory and motor brain regions causally contribute to the behavioral benefits of enrichment. The findings provide novel evidence for multimodal theories of enriched learning, challenge assumptions of longstanding cognitive theories, and provide counterevidence to unimodal neurobiologically inspired theories. Enriched educational methods are likely effective not only because they may engage greater levels of attention or deeper levels of processing, but also because multimodal interactions in the brain can enhance learning and memory.
Affiliation(s)
- Brian Mathias: School of Psychology, University of Aberdeen, Aberdeen, UK; Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Katharina von Kriegstein: Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany

18. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. [PMID: 36586857] [PMCID: PMC9894660] [DOI: 10.1121/10.0015262]
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey: PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle: Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA

19. Kelley DB. Convergent and divergent neural circuit architectures that support acoustic communication. Front Neural Circuits 2022; 16:976789. [PMID: 36466364] [PMCID: PMC9712726] [DOI: 10.3389/fncir.2022.976789]
Abstract
Vocal communication is used across extant vertebrates, is evolutionarily ancient, and has been maintained in many lineages. Here I review the neural circuit architectures that support intraspecific acoustic signaling in representative anuran, mammalian, and avian species as well as two invertebrates, fruit flies and Hawaiian crickets. I focus on hindbrain motor control motifs and their ties to respiratory circuits, expression of receptors for gonadal steroids in motor, sensory, and limbic neurons, as well as divergent modalities that evoke vocal responses. Hindbrain and limbic participants in acoustic communication are highly conserved, while forebrain participants have diverged between anurans and mammals, as well as between songbirds and rodents. I discuss the roles of natural and sexual selection in driving speciation, as well as the exaptation of circuit elements with ancestral roles in respiration for producing sounds and driving rhythmic vocal features. Recent technical advances in whole-brain fMRI across species will enable real-time imaging of acoustic signaling partners, tying auditory perception to vocal production.

20. Begau A, Klatt LI, Schneider D, Wascher E, Getzmann S. The role of informational content of visual speech in an audiovisual cocktail party: Evidence from cortical oscillations in young and old participants. Eur J Neurosci 2022; 56:5215-5234. [PMID: 36017762] [DOI: 10.1111/ejn.15811]
Abstract
Age-related differences in the processing of audiovisual speech in a multi-talker environment were investigated by analysing event-related spectral perturbations (ERSPs), focusing on theta, alpha and beta oscillations that are assumed to reflect conflict processing, multisensory integration and attentional mechanisms, respectively. Eighteen older and 21 younger healthy adults completed a two-alternative forced-choice word discrimination task, responding to audiovisual speech stimuli. In a cocktail-party scenario with two competing talkers (located at -15° and 15° azimuth), target words (/yes/ or /no/) appeared at a pre-defined (attended) position, and distractor words at the other position. In two audiovisual conditions, acoustic speech was combined either with informative or uninformative visual speech. While a behavioural benefit for informative visual speech occurred for both age groups, differences between audiovisual conditions in the theta and beta band were only present for older adults. A stronger increase in theta perturbations for stimuli containing uninformative visual speech could be associated with early conflict processing, while a stronger suppression in beta perturbations for informative visual speech could be associated with audiovisual integration. Compared to the younger group, the older group showed generally stronger beta perturbations. No condition differences in the alpha band were found. Overall, the findings suggest age-related differences in audiovisual speech integration in a multi-talker environment. While the behavioural benefit of informative visual speech was unaffected by age, older adults had a stronger need for cognitive control when processing conflicting audiovisual speech input. Furthermore, mechanisms of audiovisual integration are differently activated depending on the informational content of the visual information.
Affiliation(s)
- Alexandra Begau: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Laura-Isabelle Klatt: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Daniel Schneider: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Edmund Wascher: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Stephan Getzmann: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany

21. Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022; 263:119598. [PMID: 36049699] [DOI: 10.1016/j.neuroimage.2022.119598]
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus. Its goal was to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as parts of the broader language network, including the semantic system. To this end, we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliation(s)
- Lars A Ross: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- Sophie Molholm: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler: The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene: The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe: The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA

22. Goldenberg D, Tiede MK, Bennett RT, Whalen DH. Congruent aero-tactile stimuli bias perception of voicing continua. Front Hum Neurosci 2022; 16:879981. [PMID: 35911601] [PMCID: PMC9334670] [DOI: 10.3389/fnhum.2022.879981]
Abstract
Multimodal integration is the formation of a coherent percept from different sensory inputs such as vision, audition, and somatosensation. Most research on multimodal integration in speech perception has focused on audio-visual integration. In recent years, audio-tactile integration has also been investigated, and it has been established that puffs of air applied to the skin and timed with listening tasks shift the perception of voicing by naive listeners. The current study has replicated and extended these findings by testing the effect of air puffs on gradations of voice onset time along a continuum rather than the voiced and voiceless endpoints of the original work. Three continua were tested: bilabial (“pa/ba”), velar (“ka/ga”), and a vowel continuum (“head/hid”) used as a control. The presence of air puffs was found to significantly increase the likelihood of choosing voiceless responses for the two VOT continua but had no effect on choices for the vowel continuum. Analysis of response times revealed that the presence of air puffs lengthened responses for intermediate (ambiguous) stimuli and shortened them for endpoint (non-ambiguous) stimuli. The slowest response times were observed for the intermediate steps for all three continua, but for the bilabial continuum this effect interacted with the presence of air puffs: responses were slower in the presence of air puffs, and faster in their absence. This suggests that during integration auditory and aero-tactile inputs are weighted differently by the perceptual system, with the latter exerting greater influence in those cases where the auditory cues for voicing are ambiguous.
Affiliation(s)
- Mark K. Tiede
- Haskins Laboratories, New Haven, CT, United States
- Ryan T. Bennett
- Department of Linguistics, University of California, Santa Cruz, Santa Cruz, CA, United States
- D. H. Whalen
- Haskins Laboratories, New Haven, CT, United States
- The Graduate Center, City University of New York (CUNY), New York, NY, United States
- Department of Linguistics, Yale University, New Haven, CT, United States
23
Abraham A. How We Tell Apart Fiction from Reality. AMERICAN JOURNAL OF PSYCHOLOGY 2022. [DOI: 10.5406/19398298.135.1.01] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
The human ability to tell apart reality from fiction is intriguing. Through a range of media, such as novels and movies, we are able to readily engage in fictional worlds and experience alternative realities. Yet even when we are completely immersed and emotionally engaged within these worlds, we have little difficulty in leaving the fictional landscapes and getting back to the day-to-day of our own world. How are we able to do this? How do we acquire our understanding of our real world? How is this similar to and different from the development of our knowledge of fictional worlds? In exploring these questions, this article makes the case for a novel multilevel explanation (called BLINCS) of our implicit understanding of the reality–fiction distinction, namely that it is derived from the fact that the worlds of fiction, relative to reality, are bounded, inference-light, curated, and sparse.
24
Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. PSYCHOLOGICAL RESEARCH 2021; 86:1930-1943. [PMID: 34854983 PMCID: PMC9363401 DOI: 10.1007/s00426-021-01618-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Accepted: 11/10/2021] [Indexed: 11/26/2022]
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Affiliation(s)
- Basil Wahn
- Department of Psychology, Leibniz Universität Hannover, Hannover, Germany
- Laura Schmitz
- Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
- Alan Kingstone
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
25
Kaganovich N, Christ S. Event-related potentials evidence for long-term audiovisual representations of phonemes in adults. Eur J Neurosci 2021; 54:7860-7875. [PMID: 34750895 PMCID: PMC8815308 DOI: 10.1111/ejn.15519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Revised: 11/03/2021] [Accepted: 11/05/2021] [Indexed: 10/19/2022]
Abstract
The presence of long-term auditory representations for phonemes has been well-established. However, since speech perception is typically audiovisual, we hypothesized that long-term phoneme representations may also contain information on speakers' mouth shape during articulation. We used an audiovisual oddball paradigm in which, on each trial, participants saw a face and heard one of two vowels. One vowel occurred frequently (standard), while another occurred rarely (deviant). In one condition (neutral), the face had a closed, non-articulating mouth. In the other condition (audiovisual violation), the mouth shape matched the frequent vowel. Although in both conditions stimuli were audiovisual, we hypothesized that identical auditory changes would be perceived differently by participants. Namely, in the neutral condition, deviants violated only the audiovisual pattern specific to each block. By contrast, in the audiovisual violation condition, deviants additionally violated long-term representations for how a speaker's mouth looks during articulation. We compared the amplitude of mismatch negativity (MMN) and P3 components elicited by deviants in the two conditions. The MMN extended posteriorly over temporal and occipital sites even though deviants contained no visual changes, suggesting that deviants were perceived as interruptions in audiovisual, rather than auditory only, sequences. As predicted, deviants elicited larger MMN and P3 in the audiovisual violation compared to the neutral condition. The results suggest that long-term representations of phonemes are indeed audiovisual.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, USA
- Sharon Christ
- Department of Human Development and Family Studies, Purdue University, West Lafayette, Indiana, USA
- Department of Statistics, Purdue University, West Lafayette, Indiana, USA
26
Schulze M, Aslan B, Stöcker T, Stirnberg R, Lux S, Philipsen A. Disentangling early versus late audiovisual integration in adult ADHD: a combined behavioural and resting-state connectivity study. J Psychiatry Neurosci 2021; 46:E528-E537. [PMID: 34548387 PMCID: PMC8526154 DOI: 10.1503/jpn.210017] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Revised: 05/27/2021] [Accepted: 06/21/2021] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Studies investigating sensory processing in attention-deficit/hyperactivity disorder (ADHD) have shown altered visual and auditory processing. However, evidence is lacking for audiovisual interplay - namely, multisensory integration. As well, neuronal dysregulation at rest (e.g., aberrant within- or between-network functional connectivity) may account for difficulties with integration across the senses in ADHD. We investigated whether sensory processing was altered at the multimodal level in adult ADHD and included resting-state functional connectivity to illustrate a possible overlap between deficient network connectivity and the ability to integrate stimuli. METHODS We tested 25 patients with ADHD and 24 healthy controls using 2 illusionary paradigms: the sound-induced flash illusion and the McGurk illusion. We applied the Mann-Whitney U test to assess statistical differences between groups. We acquired resting-state functional MRIs on a 3.0 T Siemens magnetic resonance scanner, using a highly accelerated 3-dimensional echo planar imaging sequence. RESULTS For the sound-induced flash illusion, susceptibility and reaction time were not different between the 2 groups. For the McGurk illusion, susceptibility was significantly lower for patients with ADHD, and reaction times were significantly longer. At a neuronal level, resting-state functional connectivity in the ADHD group was more highly regulated in polymodal regions that play a role in binding unimodal sensory inputs from different modalities and enabling sensory-to-cognition integration. LIMITATIONS We did not explicitly screen for autism spectrum disorder, which has high rates of comorbidity with ADHD and also involves impairments in multisensory integration. Although the patients were carefully screened by our outpatient department, we could not rule out the possibility of autism spectrum disorder in some participants. CONCLUSION Unimodal hypersensitivity seems to have no influence on the integration of basal stimuli, but it might have negative consequences for the multisensory integration of complex stimuli. This finding was supported by observations of higher resting-state functional connectivity between unimodal sensory areas and polymodal multisensory integration convergence zones for complex stimuli.
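For readers unfamiliar with the group comparison described above, the sketch below shows how illusion-susceptibility scores for two groups can be compared with a Mann-Whitney U test in SciPy. It is a minimal illustration with simulated placeholder data and assumed variable names, not the authors' analysis code.
```python
# Minimal sketch: comparing McGurk susceptibility between two groups with a
# Mann-Whitney U test. Data are simulated placeholders, not the study's data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical proportion of trials on which each participant reported the
# illusion (25 patients, 24 controls).
adhd_susceptibility = rng.beta(2, 4, size=25)
control_susceptibility = rng.beta(4, 4, size=24)

# Two-sided Mann-Whitney U test (non-parametric comparison of the two groups).
u_stat, p_value = mannwhitneyu(adhd_susceptibility, control_susceptibility,
                               alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```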
Affiliation(s)
- Marcel Schulze
- From the Department of Psychiatry and Psychotherapy, University of Bonn, Bonn, Germany (Schulze, Aslan, Lux, Philipsen); Biopsychology and Cognitive Neuroscience, Faculty of Psychology and Sports Science, Bielefeld University, Bielefeld, Germany (Schulze); the German Centre for Neurodegenerative Diseases (DZNE), Bonn, Germany (Stöcker, Stirnberg); and the Department of Physics and Astronomy, University of Bonn, Bonn, Germany (Stöcker)
27
Banks B, Gowen E, Munro KJ, Adank P. Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:3432-3445. [PMID: 34463528 DOI: 10.1044/2021_jslhr-21-00106] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose Visual cues from a speaker's face may benefit perceptual adaptation to degraded speech, but current evidence is limited. We aimed to replicate results from previous studies to establish the extent to which visual speech cues can lead to greater adaptation over time, extending existing results to a real-time adaptation paradigm (i.e., without a separate training period). A second aim was to investigate whether eye gaze patterns toward the speaker's mouth were related to better perception, hypothesizing that listeners who looked more at the speaker's mouth would show greater adaptation. Method A group of listeners (n = 30) was presented with 90 noise-vocoded sentences in audiovisual format, whereas a control group (n = 29) was presented with the audio signal only. Recognition accuracy was measured throughout and eye tracking was used to measure fixations toward the speaker's eyes and mouth in the audiovisual group. Results Previous studies were partially replicated: The audiovisual group had better recognition throughout and adapted slightly more rapidly, but both groups showed an equal amount of improvement overall. Longer fixations on the speaker's mouth in the audiovisual group were related to better overall accuracy. An exploratory analysis further demonstrated that the duration of fixations to the speaker's mouth decreased over time. Conclusions The results suggest that visual cues may not benefit adaptation to degraded speech as much as previously thought. Longer fixations on a speaker's mouth may play a role in successfully decoding visual speech cues; however, this will need to be confirmed in future research to fully understand how patterns of eye gaze are related to audiovisual speech recognition. All materials, data, and code are available at https://osf.io/2wqkf/.
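Noise vocoding, used to degrade the sentences in the study above, replaces the fine structure in each frequency band with noise while preserving the band envelopes. The sketch below shows one common way to implement a channel vocoder with SciPy; the channel count, filter settings, and function names are illustrative assumptions, not the authors' materials (which are available at the OSF link above).
```python
# Minimal sketch of a noise vocoder: split the signal into frequency bands,
# extract each band's amplitude envelope, and use it to modulate band-limited
# noise. Assumes fs is well above 2 * f_hi (e.g., 44100 Hz audio).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels=8, f_lo=100.0, f_hi=8000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(signal))
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)       # band-limited speech
        envelope = np.abs(hilbert(band))      # amplitude envelope of the band
        carrier = sosfiltfilt(sos, noise)     # band-limited noise carrier
        out += envelope * carrier
    # Rescale to match the RMS level of the input.
    return out * (np.sqrt(np.mean(signal**2)) / np.sqrt(np.mean(out**2)))
```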
Affiliation(s)
- Briony Banks
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Emma Gowen
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Kevin J Munro
- Manchester Centre for Audiology and Deafness, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
28
Motor Circuit and Superior Temporal Sulcus Activities Linked to Individual Differences in Multisensory Speech Perception. Brain Topogr 2021; 34:779-792. [PMID: 34480635 DOI: 10.1007/s10548-021-00869-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Accepted: 08/24/2021] [Indexed: 10/20/2022]
Abstract
Integrating multimodal information into a unified perception is a fundamental human capacity. The McGurk effect is a remarkable multisensory illusion that demonstrates a percept different from the incongruent auditory and visual syllables. However, not all listeners perceive the McGurk illusion to the same degree. The neural basis for individual differences in the modulation of multisensory integration and syllabic perception remains largely unclear. To probe the possible involvement of specific neural circuits in individual differences in multisensory speech perception, we first implemented a behavioral experiment to examine McGurk susceptibility. Then, functional magnetic resonance imaging was performed in 63 participants to measure brain activity in response to non-McGurk audiovisual syllables. We revealed significant individual variability in McGurk illusion perception. Moreover, we found significant differential activations of the auditory and visual regions and the left superior temporal sulcus (STS), as well as multiple motor areas, between strong and weak McGurk perceivers. Importantly, the individual engagement of the STS and motor areas could specifically predict behavioral McGurk susceptibility, unlike the sensory regions. These findings suggest that distinct multimodal integration in the STS, as well as coordinated phonemic modulatory processes in motor circuits, may serve as a neural substrate for interindividual differences in multisensory speech perception.
29
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & Macdonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
30
Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021; 15:616049. [PMID: 33867954 PMCID: PMC8046930 DOI: 10.3389/fnhum.2021.616049] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (i.e., "da"), resulting from mismatched pairings of audiovisual (AV) speech stimuli (i.e., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da," represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant vowels (CVs) with (AV) and without (V-only) audios of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Affiliation(s)
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Brenna Mandujano
- Department of Psychology, California State University, Fresno, Fresno, CA, United States
- Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
31
Lindborg A, Andersen TS. Bayesian binding and fusion models explain illusion and enhancement effects in audiovisual speech perception. PLoS One 2021; 16:e0246986. [PMID: 33606815 PMCID: PMC7895372 DOI: 10.1371/journal.pone.0246986] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Accepted: 01/31/2021] [Indexed: 11/24/2022] Open
Abstract
Speech is perceived with both the ears and the eyes. Adding congruent visual speech improves the perception of a faint auditory speech stimulus, whereas adding incongruent visual speech can alter the perception of the utterance. The latter phenomenon is the case of the McGurk illusion, where an auditory stimulus such as e.g. "ba" dubbed onto a visual stimulus such as "ga" produces the illusion of hearing "da". Bayesian models of multisensory perception suggest that both the enhancement and the illusion case can be described as a two-step process of binding (informed by prior knowledge) and fusion (informed by the information reliability of each sensory cue). However, there is to date no study which has accounted for how they each contribute to audiovisual speech perception. In this study, we expose subjects to both congruent and incongruent audiovisual speech, manipulating the binding and the fusion stages simultaneously. This is done by varying both temporal offset (binding) and auditory and visual signal-to-noise ratio (fusion). We fit two Bayesian models to the behavioural data and show that they can both account for the enhancement effect in congruent audiovisual speech, as well as the McGurk illusion. This modelling approach allows us to disentangle the effects of binding and fusion on behavioural responses. Moreover, we find that these models have greater predictive power than a forced fusion model. This study provides a systematic and quantitative approach to measuring audiovisual integration in the perception of the McGurk illusion as well as congruent audiovisual speech, which we hope will inform future work on audiovisual speech perception.
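The core of reliability-weighted fusion in Bayesian accounts like the one above can be written in a few lines. The sketch below is a generic forced-fusion step for two Gaussian cues, not the authors' full two-stage binding-and-fusion model; variable names and example values are assumptions for illustration.
```python
# Minimal sketch of reliability-weighted (maximum-likelihood) fusion of an
# auditory and a visual cue, each modelled as a Gaussian estimate along an
# internal phonetic dimension. The binding stage described in the abstract
# (a prior on whether the cues share a common cause) is omitted here.
def fuse(mu_a, sigma_a, mu_v, sigma_v):
    w_a = sigma_v**2 / (sigma_a**2 + sigma_v**2)    # weight grows as the other cue gets noisier
    w_v = 1.0 - w_a
    mu_av = w_a * mu_a + w_v * mu_v                 # fused estimate
    sigma_av = (sigma_a**-2 + sigma_v**-2) ** -0.5  # fused estimate is more reliable than either cue
    return mu_av, sigma_av

# Example: a reliable visual /ga/ pulls an unreliable auditory /ba/ toward an
# intermediate (/da/-like) value on the assumed phonetic axis.
print(fuse(mu_a=0.0, sigma_a=2.0, mu_v=1.0, sigma_v=0.5))
```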
Affiliation(s)
- Alma Lindborg
- Department of Psychology, University of Potsdam, Potsdam, Germany
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Tobias S. Andersen
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
32
Abstract
Beat gestures, spontaneously produced biphasic movements of the hand, are among the most frequently encountered co-speech gestures in human communication. They are closely temporally aligned to the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. Despite their prevalence across speakers of the world's languages, how beat gestures impact spoken word recognition is unclear. Can these simple 'flicks of the hand' influence speech perception? Across a range of experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g. distinguishing OBject from obJECT), and in turn can influence what vowels listeners hear. Thus, we provide converging evidence for a manual McGurk effect: relatively simple and widely occurring hand movements influence which speech sounds we hear.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- David Peeters
- Department of Communication and Cognition, TiCC Tilburg University, Tilburg, The Netherlands
33
The McGurk effect in the time of pandemic: Age-dependent adaptation to an environmental loss of visual speech cues. Psychon Bull Rev 2021; 28:992-1002. [PMID: 33443708 DOI: 10.3758/s13423-020-01852-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/19/2020] [Indexed: 11/08/2022]
Abstract
Seeing a person's mouth move for [ga] while hearing [ba] often results in the perception of "da." Such audiovisual integration of speech cues, known as the McGurk effect, is stable within but variable across individuals. When the visual or auditory cues are degraded, due to signal distortion or the perceiver's sensory impairment, reliance on cues via the impoverished modality decreases. This study tested whether cue-reliance adjustments due to exposure to reduced cue availability are persistent and transfer to subsequent perception of speech with all cues fully available. A McGurk experiment was administered at the beginning and after a month of mandatory face-mask wearing (enforced in Czechia during the 2020 pandemic). Responses to audio-visually incongruent stimuli were analyzed from 292 persons (ages 16-55), representing a cross-sectional sample, and 41 students (ages 19-27), representing a longitudinal sample. The extent to which the participants relied exclusively on visual cues was affected by testing time in interaction with age. After a month of reduced access to lipreading, reliance on visual cues (present at test) somewhat lowered for younger and increased for older persons. This implies that adults adapt their speech perception faculty to an altered environmental availability of multimodal cues, and that younger adults do so more efficiently. This finding demonstrates that besides sensory impairment or signal noise, which reduce cue availability and thus affect audio-visual cue reliance, having experienced a change in environmental conditions can modulate the perceiver's (otherwise relatively stable) general bias towards different modalities during speech communication.
34
Audio-visual integration in noise: Influence of auditory and visual stimulus degradation on eye movements and perception of the McGurk effect. Atten Percept Psychophys 2020; 82:3544-3557. [PMID: 32533526 PMCID: PMC7788022 DOI: 10.3758/s13414-020-02042-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Seeing a talker’s face can aid audiovisual (AV) integration when speech is presented in noise. However, few studies have simultaneously manipulated auditory and visual degradation. We aimed to establish how degrading the auditory and visual signal affected AV integration. Where people look on the face in this context is also of interest; Buchan, Paré and Munhall (Brain Research, 1242, 162–171, 2008) found fixations on the mouth increased in the presence of auditory noise whilst Wilson, Alsius, Paré and Munhall (Journal of Speech, Language, and Hearing Research, 59(4), 601–615, 2016) found mouth fixations decreased with decreasing visual resolution. In Condition 1, participants listened to clear speech, and in Condition 2, participants listened to vocoded speech designed to simulate the information provided by a cochlear implant. Speech was presented in three levels of auditory noise and three levels of visual blurring. Adding noise to the auditory signal increased McGurk responses, while blurring the visual signal decreased McGurk responses. Participants fixated the mouth more on trials when the McGurk effect was perceived. Adding auditory noise led to people fixating the mouth more, while visual degradation led to people fixating the mouth less. Combined, the results suggest that modality preference and where people look during AV integration of incongruent syllables varies according to the quality of information available.
35
Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure. Sci Rep 2020; 10:18009. [PMID: 33093570 PMCID: PMC7583249 DOI: 10.1038/s41598-020-75201-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2020] [Accepted: 10/05/2020] [Indexed: 11/08/2022] Open
Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time demanding than fusion. Using time-/source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from the IFG activity and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.
36
A value-driven McGurk effect: Value-associated faces enhance the influence of visual information on audiovisual speech perception and its eye movement pattern. Atten Percept Psychophys 2020; 82:1928-1941. [PMID: 31898072 DOI: 10.3758/s13414-019-01918-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
This study investigates whether and how value-associated faces affect audiovisual speech perception and its eye movement pattern. Participants were asked to learn to associate particular faces with or without monetary reward in the training phase, and, in the subsequent test phase, to identify syllables that the talkers had said in video clips in which the talkers' faces had or had not been associated with reward. The syllables were either congruent or incongruent with the talkers' mouth movements. Crucially, in some cases, the incongruent syllables could elicit the McGurk effect. Results showed that the McGurk effect occurred more often for reward-associated faces than for non-reward-associated faces. Moreover, the signal detection analysis revealed that participants had lower criterion and higher discriminability for reward-associated faces than for non-reward-associated faces. Surprisingly, eye movement data showed that participants spent more time looking at and fixated more often on the extraoral (nose/cheek) area for reward-associated faces than for non-reward-associated faces, while the opposite pattern was observed on the oral (mouth) area. The correlation analysis demonstrated that, over participants, the more they looked at the extraoral area in the training phase because of reward, the larger the increase of McGurk proportion (and the less they looked at the oral area) in the test phase. These findings not only demonstrate that value-associated faces enhance the influence of visual information on audiovisual speech perception but also highlight the importance of the extraoral facial area in the value-driven McGurk effect.
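The criterion and discriminability measures mentioned above come from standard signal detection theory. The following sketch shows how d' and the criterion c are typically computed from hit and false-alarm rates; it is a generic illustration with made-up rates and assumed names, not the authors' analysis.
```python
# Minimal sketch: signal detection measures (d-prime and criterion) from hit
# and false-alarm rates. Rates below are hypothetical.
from scipy.stats import norm

def sdt_measures(hit_rate, fa_rate):
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa               # discriminability
    criterion = -0.5 * (z_hit + z_fa)    # response bias (lower = more liberal)
    return d_prime, criterion

# Hypothetical example: reward-associated faces yield a lower criterion and
# higher discriminability than non-reward-associated faces.
print(sdt_measures(hit_rate=0.85, fa_rate=0.20))   # reward-associated
print(sdt_measures(hit_rate=0.75, fa_rate=0.30))   # non-reward-associated
```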
37
Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. [PMID: 33221701 DOI: 10.1016/j.cortex.2020.10.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2020] [Revised: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to-date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
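Causal inference models of the kind referenced above decide how much to integrate by weighing the probability that the auditory and visual cues arose from a common cause. The sketch below is a generic one-dimensional causal-inference combination with model averaging over common versus independent causes; it illustrates the general principle with assumed parameter values and is not the CIMS implementation itself.
```python
# Minimal sketch of causal inference for two Gaussian cues: compute the
# posterior probability of a common cause, then model-average the fused and
# segregated estimates. Parameter values are illustrative assumptions.
import numpy as np

def causal_inference_estimate(x_a, x_v, sig_a, sig_v, sig_p=10.0, p_common=0.5):
    va, vv, vp = sig_a**2, sig_v**2, sig_p**2
    # Likelihoods of the cue pair under a common cause (C=1) vs. independent
    # causes (C=2), assuming a zero-mean Gaussian prior over the stimulus.
    denom_c1 = va * vv + va * vp + vv * vp
    like_c1 = np.exp(-0.5 * ((x_a - x_v)**2 * vp + x_a**2 * vv + x_v**2 * va) / denom_c1) \
              / (2 * np.pi * np.sqrt(denom_c1))
    like_c2 = np.exp(-0.5 * (x_a**2 / (va + vp) + x_v**2 / (vv + vp))) \
              / (2 * np.pi * np.sqrt((va + vp) * (vv + vp)))
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Optimal estimates under each causal structure, then model averaging.
    s_fused = (x_a / va + x_v / vv) / (1 / va + 1 / vv + 1 / vp)
    s_seg_a = (x_a / va) / (1 / va + 1 / vp)
    return post_c1, post_c1 * s_fused + (1 - post_c1) * s_seg_a

# Example: a large audiovisual conflict lowers the probability of a common
# cause and pulls the estimate back toward the auditory cue alone.
print(causal_inference_estimate(x_a=0.0, x_v=1.0, sig_a=1.0, sig_v=0.5))
print(causal_inference_estimate(x_a=0.0, x_v=6.0, sig_a=1.0, sig_v=0.5))
```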
38
Thézé R, Gadiri MA, Albert L, Provost A, Giraud AL, Mégevand P. Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments. Sci Rep 2020; 10:15540. [PMID: 32968127 PMCID: PMC7511320 DOI: 10.1038/s41598-020-72375-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2020] [Accepted: 08/31/2020] [Indexed: 11/09/2022] Open
Abstract
Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has widely been applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevents comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized on computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e. /v/) with a bilabial occlusive phoneme (i.e. /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are judicious, and that they can supplement natural speech with higher control over stimulus timing and content.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Mehdi Ali Gadiri
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Louis Albert
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Antoine Provost
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland; Division of Neurology, Geneva University Hospitals, Geneva, Switzerland
39
Pasqualotto A, Yin CCJ, Ohka M, Kitada R. The Effect of Object Compliance on the Velvet Hand Illusion. IEEE TRANSACTIONS ON HAPTICS 2020; 13:571-577. [PMID: 31725388 DOI: 10.1109/toh.2019.2948603] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Movement of a grid of bars between the two hands creates the tactile illusion of a velvet-like material, namely, the velvet hand illusion (VHI). It was recently proposed that the VHI is caused by a masking effect; bar movement suppresses conscious perception of tactile inputs from the opposing hand. If this hypothesis sufficiently explains the VHI, the physical properties of the opposing hand should not affect the illusion. Another hypothesis suggests that the integration of inputs from the grid of bars and the hands plays a critical role in the VHI. To compare these two hypotheses, the VHI was elicited under two conditions; the grid of bars was between one hand and a soft texture or the grid of bars was between one hand and a hard texture. A hand was stimulated by moving bars while contacting the stationary texture held by the opposing hand. The grid of bars with the soft texture induced a stronger illusion and softer feeling than that with the hard texture. This result supports the integration hypothesis in which tactile inputs from both bars and textures attached to the opposing hand are integrated.
40
Arantes ME, Cendes F. In Search of a New Paradigm for Functional Magnetic Resonance Experimentation With Language. Front Neurol 2020; 11:588. [PMID: 32670188 PMCID: PMC7326770 DOI: 10.3389/fneur.2020.00588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2019] [Accepted: 05/22/2020] [Indexed: 11/23/2022] Open
Abstract
Human language can convey a broad range of entities and relationships through processes that are highly complex and structured. All of these processes are happening somewhere inside our brains, and one way of pinpointing these locations is through the use of functional magnetic resonance imaging. The great obstacle when experimenting with complex processes, however, is the need to control them while still having data that are representative of reality. When it comes to language, an interactional phenomenon by nature that integrates a wide range of processes, a question emerges concerning how compatible it is with the current experimental methodology, and how much of it is lost in order to fit the controlled experimental environment. Because of its particularities, the fMRI technique imposes several limitations on the expression of language during experimentation. This paper discusses the different conceptions of language as a research object, the difficulties of combining this object with the requirements of fMRI, and the current perspectives for this field of research.
Affiliation(s)
- Fernando Cendes
- Laboratory of Neuroimaging, Department of Neurology, University of Campinas—UNICAMP, Campinas, Brazil
41
Moro SS, Gorbet DJ, Steeves JKE. Brain Activation for Audiovisual Information in People With One Eye Compared to Binocular and Eye-Patched Viewing Controls. Front Neurosci 2020; 14:529. [PMID: 32508588 PMCID: PMC7253581 DOI: 10.3389/fnins.2020.00529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2020] [Accepted: 04/29/2020] [Indexed: 11/24/2022] Open
Abstract
Blindness caused by early vision loss results in complete visual deprivation and subsequent changes in the use of the remaining intact senses. We have also observed adaptive plasticity in the case of partial visual deprivation. The removal of one eye, through unilateral eye enucleation, results in partial visual deprivation and is a unique model for examining the consequences of the loss of binocularity. Partial deprivation of the visual system from the loss of one eye early in life results in behavioral and structural changes in the remaining senses, namely auditory and audiovisual systems. In the current study we use functional neuroimaging data to relate function and behavior of the audiovisual system in this rare patient group compared to controls viewing binocularly or with one eye patched. In Experiment 1, a whole brain analysis compared common regions of cortical activation between groups, for auditory, visual and audiovisual stimuli. People with one eye demonstrated a trend for increased activation for low-level audiovisual stimuli compared to patched viewing controls but did not differ from binocular viewing controls. In Experiment 2, a region of interest (ROI) analysis for auditory, visual, audiovisual and illusory McGurk stimuli revealed that people with one eye had an increased trend for left hemisphere audiovisual activation for McGurk stimuli compared to binocular viewing controls. This aligns with current behavioral analysis and previous research showing reduced McGurk Effect in people with one eye. Furthermore, there is no evidence of a correlation between behavioral performance on the McGurk Effect task and functional activation. Together with previous behavioral work, these functional data contribute to the broader understanding of cross-sensory effects of early sensory deprivation from eye enucleation. Overall, these results contribute to a better understanding of the sensory deficits experienced by people with one eye, as well as, the relationship between behavior, structure and function in order to better predict the outcome of early partial visual deafferentation.
Affiliation(s)
- Stefania S Moro
- Department of Psychology, York University, Toronto, ON, Canada; Centre for Vision Research, York University, Toronto, ON, Canada; The Hospital for Sick Children, Toronto, ON, Canada
- Diana J Gorbet
- Centre for Vision Research, York University, Toronto, ON, Canada; School of Kinesiology and Health Science, York University, Toronto, ON, Canada
- Jennifer K E Steeves
- Department of Psychology, York University, Toronto, ON, Canada; Centre for Vision Research, York University, Toronto, ON, Canada; The Hospital for Sick Children, Toronto, ON, Canada
42
Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:brainsci10060328. [PMID: 32481538 PMCID: PMC7348766 DOI: 10.3390/brainsci10060328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Revised: 05/15/2020] [Accepted: 05/25/2020] [Indexed: 11/16/2022] Open
Abstract
The McGurk effect, an incongruent pairing of visual /ga/–acoustic /ba/, creates a fusion illusion /da/ and is the cornerstone of research in audiovisual speech perception. Combination illusions occur given reversal of the input modalities—auditory /ga/-visual /ba/, and percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex, in absence of changes to acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions: fusion with visual aspect of the stimulus changing, and combination with auditory aspect of the stimulus changing, as well as two unimodal auditory- and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions with significant differences between standards and deviants in the N100 time window. Presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created, but when the visual is held constant and the auditory changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
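Computationally, the MMN reported above is a difference wave: the average response to deviants minus the average response to standards, examined in a post-stimulus time window. The sketch below shows that computation on a generic epochs array; the array shapes, sampling rate, and window are assumptions for illustration, not the authors' pipeline.
```python
# Minimal sketch: compute a mismatch negativity (MMN) difference wave as the
# mean deviant ERP minus the mean standard ERP, plus its mean amplitude in a
# chosen time window. Epoch arrays are simulated placeholders.
import numpy as np

fs = 500                                   # sampling rate in Hz (assumed)
times = np.arange(-0.1, 0.5, 1 / fs)       # -100 ms to 500 ms around stimulus onset

rng = np.random.default_rng(0)
standard_epochs = rng.standard_normal((200, times.size))  # trials x samples
deviant_epochs = rng.standard_normal((40, times.size))

erp_standard = standard_epochs.mean(axis=0)
erp_deviant = deviant_epochs.mean(axis=0)
mmn_wave = erp_deviant - erp_standard      # deviant-minus-standard difference wave

# Mean amplitude in a typical MMN window (e.g., 150-250 ms post-onset).
window = (times >= 0.15) & (times <= 0.25)
print("MMN mean amplitude:", mmn_wave[window].mean())
```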
Affiliation(s)
- Melissa Randazzo
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Ryan Priefer
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
43
Age-related hearing loss influences functional connectivity of auditory cortex for the McGurk illusion. Cortex 2020; 129:266-280. [PMID: 32535378 DOI: 10.1016/j.cortex.2020.04.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 03/30/2020] [Accepted: 04/09/2020] [Indexed: 01/23/2023]
Abstract
Age-related hearing loss affects hearing at high frequencies and is associated with difficulties in understanding speech. Increased audio-visual integration has recently been found in age-related hearing impairment; however, the brain mechanisms that contribute to this effect are unclear. We used functional magnetic resonance imaging in elderly subjects with normal hearing and mild to moderate uncompensated hearing loss. Audio-visual integration was studied using the McGurk task. In this task, an illusory fused percept can occur if incongruent auditory and visual syllables are presented. The paradigm included unisensory stimuli (auditory only, visual only), congruent audio-visual and incongruent (McGurk) audio-visual stimuli. An illusory percept was reported in over 60% of incongruent trials. These McGurk illusion rates were equal in both groups of elderly subjects and correlated positively with speech-in-noise perception and daily listening effort. Normal-hearing participants showed an increased neural response in left pre- and postcentral gyri and right middle frontal gyrus for incongruent (McGurk) stimuli compared with congruent audio-visual stimuli. Activation patterns were, however, not different between groups. Task-modulated functional connectivity differed between groups, showing increased connectivity from auditory cortex to visual, parietal and frontal areas in hard-of-hearing participants compared with normal-hearing participants when comparing incongruent (McGurk) stimuli with congruent audio-visual stimuli. These results suggest that changes in functional connectivity of auditory cortex, rather than activation strength, during processing of audio-visual McGurk stimuli accompany age-related hearing loss.
44
Stawicki M, Majdak P, Başkent D. Ventriloquist Illusion Produced With Virtual Acoustic Spatial Cues and Asynchronous Audiovisual Stimuli in Both Young and Older Individuals. Multisens Res 2019; 32:745-770. [DOI: 10.1163/22134808-20191430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2019] [Accepted: 09/03/2019] [Indexed: 11/19/2022]
Abstract
The ventriloquist illusion, the change in perceived location of an auditory stimulus when a synchronously presented but spatially discordant visual stimulus is added, has previously been shown in young healthy populations to be a robust paradigm that relies mainly on automatic processes. Here, we propose the ventriloquist illusion as a potential simple test to assess audiovisual (AV) integration in young and older individuals. We used a modified version of the illusion paradigm that was adaptive, nearly bias-free, relied on binaural stimulus representation using generic head-related transfer functions (HRTFs) instead of multiple loudspeakers, and was tested with synchronous and asynchronous presentation of AV stimuli (both tone and speech). The minimum audible angle (MAA), the smallest perceptible difference in angle between two sound sources, was compared with or without the visual stimuli in young and older adults with no or minimal sensory deficits. The illusion effect, measured by means of MAAs implemented with HRTFs, was observed with both synchronous and asynchronous visual stimuli, but only with the tone and not the speech stimulus. The patterns were similar between young and older individuals, indicating the versatility of the modified ventriloquist illusion paradigm.
Affiliation(s)
- Marnix Stawicki
- Department of Otorhinolaryngology / Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Graduate School of Medical Sciences, Research School of Behavioral and Cognitive Neurosciences (BCN), University of Groningen, Groningen, The Netherlands
- Piotr Majdak
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
- Deniz Başkent
- Department of Otorhinolaryngology / Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Graduate School of Medical Sciences, Research School of Behavioral and Cognitive Neurosciences (BCN), University of Groningen, Groningen, The Netherlands
45
Feng G, Zhou B, Zhou W, Beauchamp MS, Magnotti JF. A Laboratory Study of the McGurk Effect in 324 Monozygotic and Dizygotic Twins. Front Neurosci 2019; 13:1029. [PMID: 31636529 PMCID: PMC6787151 DOI: 10.3389/fnins.2019.01029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Accepted: 09/10/2019] [Indexed: 11/13/2022] Open
Abstract
Multisensory integration of information from the talker's voice and the talker's mouth facilitates human speech perception. A popular assay of audiovisual integration is the McGurk effect, an illusion in which incongruent visual speech information categorically changes the percept of auditory speech. There is substantial interindividual variability in susceptibility to the McGurk effect. To better understand possible sources of this variability, we examined the McGurk effect in 324 native Mandarin speakers, consisting of 73 monozygotic (MZ) and 89 dizygotic (DZ) twin pairs. When tested with 9 different McGurk stimuli, some participants never perceived the illusion and others always perceived it. Within participants, perception was similar across time (r = 0.55 at a 2-year retest in 150 participants) suggesting that McGurk susceptibility reflects a stable trait rather than short-term perceptual fluctuations. To examine the effects of shared genetics and prenatal environment, we compared McGurk susceptibility between MZ and DZ twins. Both twin types had significantly greater correlation than unrelated pairs (r = 0.28 for MZ twins and r = 0.21 for DZ twins) suggesting that the genes and environmental factors shared by twins contribute to individual differences in multisensory speech perception. Conversely, the existence of substantial differences within twin pairs (even MZ co-twins) and the overall low percentage of explained variance (5.5%) argues against a deterministic view of individual differences in multisensory integration.
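One classical way to read twin correlations like those above is Falconer's comparison of monozygotic and dizygotic similarity. The sketch below computes within-pair correlations and the rough Falconer heritability estimate 2(rMZ - rDZ) from simulated pair data; it is a textbook illustration under assumed values, not the analysis used in the study, which reports explained variance from its own model.
```python
# Minimal sketch: correlate McGurk susceptibility within MZ and DZ twin pairs
# and form Falconer's rough heritability estimate H2 = 2 * (r_MZ - r_DZ).
# Pair data are simulated placeholders.
import numpy as np

rng = np.random.default_rng(0)
mz_pairs = rng.multivariate_normal([0.5, 0.5], [[0.04, 0.012], [0.012, 0.04]], size=73)
dz_pairs = rng.multivariate_normal([0.5, 0.5], [[0.04, 0.008], [0.008, 0.04]], size=89)

r_mz = np.corrcoef(mz_pairs[:, 0], mz_pairs[:, 1])[0, 1]
r_dz = np.corrcoef(dz_pairs[:, 0], dz_pairs[:, 1])[0, 1]
falconer_h2 = 2 * (r_mz - r_dz)
print(f"r_MZ = {r_mz:.2f}, r_DZ = {r_dz:.2f}, Falconer H2 = {falconer_h2:.2f}")
```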
Affiliation(s)
- Guo Feng
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Psychological Research and Counseling Center, Southwest Jiaotong University, Chengdu, China
- Bin Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Wen Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
46
Kaganovich N, Ancel E. Different neural processes underlie visual speech perception in school-age children and adults: An event-related potentials study. J Exp Child Psychol 2019; 184:98-122. [PMID: 31015101 PMCID: PMC6857813 DOI: 10.1016/j.jecp.2019.03.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2018] [Revised: 03/15/2019] [Accepted: 03/26/2019] [Indexed: 11/18/2022]
Abstract
The ability to use visual speech cues does not fully develop until late adolescence. The cognitive and neural processes underlying this slow maturation are not yet understood. We examined electrophysiological responses of younger (8-9 years) and older (11-12 years) children as well as adults elicited by visually perceived articulations in an audiovisual word matching task and related them to the amount of benefit gained during a speech-in-noise (SIN) perception task when seeing the talker's face. On each trial, participants first heard a word and, after a short pause, saw a speaker silently articulate a word. In half of the trials the articulated word matched the auditory word (congruent trials), whereas in the other half it did not (incongruent trials). In all three age groups, incongruent articulations elicited the N400 component and congruent articulations elicited the late positive complex (LPC). Groups did not differ in the mean amplitude of N400. The mean amplitude of LPC was larger in younger children compared with older children and adults. Importantly, the relationship between event-related potential measures and SIN performance varied by group. In 8- and 9-year-olds, neither component was predictive of SIN gain. The LPC amplitude predicted the SIN gain in older children but not in adults. Conversely, the N400 amplitude predicted the SIN gain in adults. We argue that although all groups were able to detect correspondences between auditory and visual word onsets at the phonemic/syllabic level, only adults could use this information for lexical access.
Affiliation(s)
- Natalya Kaganovich
  - Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Elizabeth Ancel
  - Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
|
47
|
Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. [PMID: 31310616 PMCID: PMC6634411 DOI: 10.1371/journal.pone.0219744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 07/02/2019] [Indexed: 11/18/2022] Open
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion, where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech-specific mismatch processing, and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal that sounds nonsensical to naïve perceivers but is perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve, subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity (MMN) paradigm, in which infrequent McGurk stimuli were embedded in a sequence of frequent audiovisually congruent stimuli, we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm violate both the general prediction of stimulus content and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
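To make the dependent measure concrete, the sketch below computes induced theta-band (4–8 Hz) power from synthetic single-channel epochs: each trial is band-pass filtered, converted to instantaneous power via the Hilbert transform, and power is averaged across trials so that non-phase-locked ("induced") activity is retained. This is an assumption-laden toy example, not the authors' EEG pipeline.

```python
# Minimal sketch of induced theta-band power on synthetic epoched data.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250                           # sampling rate in Hz (assumed)
n_trials, n_samples = 100, fs * 2  # 100 two-second synthetic trials
rng = np.random.default_rng(2)
trials = rng.normal(0.0, 1.0, (n_trials, n_samples))  # stand-in for epoched EEG

# Zero-phase 4-8 Hz Butterworth band-pass, applied along the time axis.
b, a = butter(4, [4, 8], btype="bandpass", fs=fs)
theta = filtfilt(b, a, trials, axis=1)

# Instantaneous power from the analytic signal; averaging power across trials
# (rather than averaging the signal first) keeps non-phase-locked activity.
induced_power = (np.abs(hilbert(theta, axis=1)) ** 2).mean(axis=0)
print(f"Mean induced theta power across the epoch: {induced_power.mean():.3f}")
```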
Affiliation(s)
- Alma Lindborg
  - Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Martijn Baart
  - Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
  - BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Jeroen J Stekelenburg
  - Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen
  - Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Tobias S Andersen
  - Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
|
48
|
"Paying" attention to audiovisual speech: Do incongruent stimuli incur greater costs? Atten Percept Psychophys 2019; 81:1743-1756. [PMID: 31197661 DOI: 10.3758/s13414-019-01772-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing: susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent and incongruent syllables, but that McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that, despite documented differences in how congruent and incongruent stimuli are processed, the two do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.
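The core comparison reported above can be sketched as a within-participant contrast of mean response times on fusion versus non-fusion McGurk trials. The numbers below are hypothetical, and the paired t-test is only one reasonable way to run such a contrast; it is not necessarily the analysis the authors used.

```python
# Minimal sketch: fusion vs. non-fusion RT contrast with hypothetical data.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(3)
n_participants = 40  # hypothetical sample size

# Hypothetical per-participant mean RTs (ms) on McGurk trials that yielded a
# fused percept versus trials that did not.
rt_fusion = rng.normal(650.0, 80.0, n_participants)
rt_nonfusion = rng.normal(700.0, 80.0, n_participants)

t_stat, p_val = ttest_rel(rt_fusion, rt_nonfusion)
print(f"Fusion vs. non-fusion RT: t({n_participants - 1}) = {t_stat:.2f}, p = {p_val:.3f}")
```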
|
49
|
Modelska M, Pourquié M, Baart M. No "Self" Advantage for Audiovisual Speech Aftereffects. Front Psychol 2019; 10:658. [PMID: 30967827 PMCID: PMC6440388 DOI: 10.3389/fpsyg.2019.00658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2018] [Accepted: 03/08/2019] [Indexed: 11/13/2022] Open
Abstract
Although the default state of the world is that we see and hear other people talking, there is evidence that seeing and hearing ourselves rather than someone else may lead to visual (i.e., lip-read) or auditory "self" advantages. We assessed whether there is a "self" advantage for phonetic recalibration (a lip-read driven cross-modal learning effect) and selective adaptation (a contrastive effect in the opposite direction of recalibration). We observed both aftereffects as well as an on-line effect of lip-read information on auditory perception (i.e., immediate capture), but there was no evidence for a "self" advantage in any of the tasks (as additionally supported by Bayesian statistics). These findings strengthen the emerging notion that recalibration reflects a general learning mechanism, and bolster the argument that adaptation depends on rather low-level auditory/acoustic features of the speech signal.
Affiliation(s)
- Maria Modelska
  - BCBL – Basque Center on Cognition, Brain and Language, Donostia, Spain
- Marie Pourquié
  - BCBL – Basque Center on Cognition, Brain and Language, Donostia, Spain
  - UPPA, IKER (UMR5478), Bayonne, France
- Martijn Baart
  - BCBL – Basque Center on Cognition, Brain and Language, Donostia, Spain
  - Department of Cognitive Neuropsychology, Tilburg University, Tilburg, Netherlands
|
50
|
Peiffer-Smadja N, Cohen L. The cerebral bases of the bouba-kiki effect. Neuroimage 2019; 186:679-689. [PMID: 30503933 DOI: 10.1016/j.neuroimage.2018.11.033] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2018] [Revised: 11/18/2018] [Accepted: 11/21/2018] [Indexed: 11/30/2022] Open
Abstract
The crossmodal correspondence between some speech sounds and some geometrical shapes, known as the bouba-kiki (BK) effect, constitutes a remarkable exception to the general arbitrariness of the links between word meaning and word sounds. We have analyzed the association of shapes and sounds in order to determine whether it occurs at a perceptual or at a decisional level, and whether it takes place in sensory cortices or in supramodal regions. First, using an Implicit Association Test (IAT), we have shown that the BK effect may occur without participants making any explicit decision relative to sound-shape associations. Second, looking for the brain correlates of implicit BK matching, we have found that intermodal matching influences activations in both auditory and visual sensory cortices. Moreover, we found stronger prefrontal activation to mismatching than to matching stimuli, presumably reflecting a modulation of executive processes by crossmodal correspondence. Thus, through its roots in the physiology of object categorization and crossmodal matching, the BK effect provides a unique insight into some non-linguistic components of word formation.
Affiliation(s)
- Nathan Peiffer-Smadja
  - Institut du Cerveau et de la Moelle épinière, ICM, Inserm U 1127, CNRS UMR 7225, Sorbonne Université, F-75013, Paris, France
- Laurent Cohen
  - Institut du Cerveau et de la Moelle épinière, ICM, Inserm U 1127, CNRS UMR 7225, Sorbonne Université, F-75013, Paris, France
  - Département de Neurologie 1, Hôpital de la Pitié Salpêtrière, AP-HP, F-75013, Paris, France
|