1. Magnotti JF, Lado A, Beauchamp MS. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers. Front Neurosci 2024; 18:1421713. PMID: 38988770; PMCID: PMC11233445; DOI: 10.3389/fnins.2024.1421713.
Abstract
In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including studies of native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the capacity of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.
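As a concrete illustration of the model's core prediction, the sketch below implements a NED-style computation under simplifying assumptions: each stimulus is summarized by a single audiovisual disparity, each participant by a disparity threshold and a sensory-noise level, and a fusion response is produced whenever the noisily encoded disparity falls below the threshold. The parameter names and numerical values are illustrative, not estimates from the study.

```python
from scipy.stats import norm

def p_fusion(stimulus_disparity, threshold, sensory_noise):
    """NED-style fusion probability (sketch): the encoded disparity is the
    physical disparity plus Gaussian sensory noise, and fusion is reported
    when the encoded value falls below the participant's threshold."""
    return norm.cdf(threshold - stimulus_disparity, scale=sensory_noise)

# Illustrative values: a low-disparity stimulus is often fused ...
print(p_fusion(stimulus_disparity=0.4, threshold=1.0, sensory_noise=0.5))
# ... while a high-disparity stimulus is rarely fused by the same participant.
print(p_fusion(stimulus_disparity=2.0, threshold=1.0, sensory_noise=0.5))
```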
Affiliation(s)
- John F Magnotti: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Anastasia Lado: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
2. Bujok R, Meyer AS, Bosker HR. Audiovisual Perception of Lexical Stress: Beat Gestures and Articulatory Cues. Lang Speech 2024:238309241258162. PMID: 38877720; DOI: 10.1177/00238309241258162.
Abstract
Human communication is inherently multimodal: not only auditory speech but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds), and less is known about the influence of visual information on the perception of suprasegmental aspects of speech such as lexical stress. In two experiments, we investigated the influence of different visual cues (facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or the second syllable. Results showed that people successfully used visual articulatory cues to lexical stress in muted videos; however, in audiovisual conditions we found no effect of visual articulatory cues. In contrast, the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.
Affiliation(s)
- Ronny Bujok: Max Planck Institute for Psycholinguistics, The Netherlands; International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, The Netherlands
- Hans Rutger Bosker: Max Planck Institute for Psycholinguistics, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, The Netherlands
3. Scheller M, Fang H, Sui J. Self as a prior: The malleability of Bayesian multisensory integration to social salience. Br J Psychol 2024; 115:185-205. PMID: 37747452; DOI: 10.1111/bjop.12683.
Abstract
Our everyday perceptual experiences are grounded in the integration of information within and across our senses. Because of this direct behavioural relevance, cross-modal integration retains a degree of contextual flexibility, extending even to social relevance. However, how social relevance modulates cross-modal integration remains unclear. To investigate possible mechanisms, Experiment 1 tested the principles of audio-visual integration for numerosity estimation by deriving a Bayesian optimal observer model, with a perceptual prior estimated from empirical data, to explain perceptual biases. Such perceptual priors may shift towards locations of high salience in the stimulus space. Our results showed that the tendency to over- or underestimate numerosity, expressed in the frequency and strength of fission and fusion illusions, depended on the actual event numerosity. Experiment 2 replicated the effects of social relevance on multisensory integration from Scheller & Sui (2022, JEP:HPP) using a smaller number of events, thereby favouring the opposite illusion through enhanced influence of the prior. In line with the idea that the self acts like a prior, the more frequently observed illusion (the one more malleable to prior influences) was modulated by self-relevance. Our findings suggest that the self can influence perception by acting like a prior in cue integration, biasing perceptual estimates towards areas of high self-relevance.
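To make the observer model concrete, here is a minimal sketch assuming Gaussian likelihoods over a small discrete range of event numerosities and a hypothetical prior peaked at mid-range counts. It shows how the posterior estimate of the number of visual events is pulled toward the more reliable auditory count and toward the prior, producing fission-like overestimation or fusion-like underestimation depending on the actual event numerosity; all values are invented for illustration.

```python
import numpy as np

n = np.arange(1, 5)                          # candidate event numerosities
sigma_a, sigma_v = 0.6, 1.0                  # assumed: audition counts events more reliably
prior = np.array([0.15, 0.35, 0.35, 0.15])   # hypothetical prior peaked at mid-range counts

def posterior(obs_a, obs_v):
    """Posterior over event numerosity given noisy auditory and visual counts."""
    like_a = np.exp(-0.5 * ((obs_a - n) / sigma_a) ** 2)
    like_v = np.exp(-0.5 * ((obs_v - n) / sigma_v) ** 2)
    post = like_a * like_v * prior
    return post / post.sum()

# One flash with two beeps: posterior mass shifts above 1 -> fission-like overestimation.
print(posterior(obs_a=2, obs_v=1))
# Two flashes with one beep: posterior mass shifts below 2 -> fusion-like underestimation.
print(posterior(obs_a=1, obs_v=2))
```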
Affiliation(s)
- Meike Scheller: Department of Psychology, University of Aberdeen, Aberdeen, UK; Department of Psychology, Durham University, Durham, UK
- Huilin Fang: Department of Psychology, University of Aberdeen, Aberdeen, UK
- Jie Sui: Department of Psychology, University of Aberdeen, Aberdeen, UK
4. de Assis MF, Berti LC. Speech perception: Auditory and visual cue integration in children with and without phonological disorder in voiceless fricatives. Clin Linguist Phon 2024:1-17. PMID: 38560916; DOI: 10.1080/02699206.2024.2328792.
Abstract
The literature reports contradictory results regarding the influence of visual cues on speech perception tasks in children with phonological disorder (PD). This study compared the performance of children with (n = 15) and without PD (n = 15) in an audiovisual perception task involving voiceless fricatives. Assuming that PD could be associated with an inability to integrate phonological information from two sensory sources, we expected children with PD to have difficulty integrating auditory and visual cues compared with typically developing children. A syllable identification task was conducted, with stimuli presented in four conditions: auditory-only (AO), visual-only (VO), audiovisual congruent (AV+), and audiovisual incongruent (AV-). The percentages of correct answers and the corresponding reaction times in the AO, VO, and AV+ conditions were analysed. For the AV- condition, the percentage of responses matching the auditory stimulus was considered, along with the percentage of perceptual preference (auditory, visual, and/or illusory, i.e., the McGurk effect) and the corresponding reaction times. Across the four conditions, children with PD gave fewer correct answers and responded more slowly than typically developing children, mainly in the VO condition. Both groups showed a preference for the auditory stimulus in the AV- condition; however, children with PD showed higher percentages of visual perceptual preference and of the McGurk effect than typical children. The superiority of typically developing children over children with PD in auditory-visual speech perception thus depends on the type of stimulus and the condition of presentation.
Affiliation(s)
- Mayara Ferreira de Assis: Department of Speech, Language and Hearing Sciences, São Paulo State University, Marília, São Paulo, Brazil
- Larissa Cristina Berti: Department of Speech, Language and Hearing Sciences, São Paulo State University, Marília, São Paulo, Brazil
5. Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024; 45:e26653. PMID: 38488460; DOI: 10.1002/hbm.26653.
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme, presented simultaneously with the facial articulation of /ga/ (i.e., a viseme), is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk compared to congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, typically involved in cognitive control processes. Crucially, in line with Bayesian theories, these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
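Response entropy in this context is simply the Shannon entropy of the distribution of syllable reports a stimulus elicits. The sketch below shows the computation with invented report counts: the more consistent responding typical of congruent stimuli yields lower entropy than the split responding typical of McGurk stimuli.

```python
import numpy as np

def response_entropy(counts):
    """Shannon entropy (in bits) of a response distribution, e.g. how often
    /ba/, /da/ and /ga/ were reported for one stimulus type."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                           # ignore empty response categories
    return -(p * np.log2(p)).sum()

# Invented counts for illustration only:
print(response_entropy([46, 2, 2]))        # congruent stimulus: consistent reports, low entropy
print(response_entropy([18, 22, 10]))      # McGurk stimulus: split reports, higher entropy
```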
Affiliation(s)
- Chenjie Dong: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China; Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney: Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
6. Lee HH, Groves K, Ripollés P, Carrasco M. Audiovisual integration in the McGurk effect is impervious to music training. Sci Rep 2024; 14:3262. PMID: 38332159; PMCID: PMC10853564; DOI: 10.1038/s41598-024-53593-0.
Abstract
The McGurk effect refers to an audiovisual speech illusion in which discrepant auditory and visual syllables produce a fused percept that combines the visual and auditory components. However, little is known about how individual differences contribute to the McGurk effect. Here, we examined whether music training experience, which involves audiovisual integration, can modulate the McGurk effect. Seventy-three participants completed the Goldsmiths Musical Sophistication Index (Gold-MSI) questionnaire to evaluate their music expertise on a continuous scale. The Gold-MSI considers participants' daily-life exposure to music learning experiences (formal and informal), rather than merely classifying people into groups according to how many years of music training they have had. Participants were instructed to report, via a 3-alternative forced choice task, "what a person said": /Ba/, /Ga/ or /Da/. The experiment consisted of 96 audiovisual congruent trials and 96 audiovisual incongruent (McGurk) trials. We observed no significant correlations between susceptibility to the McGurk effect and the different subscales of the Gold-MSI (active engagement, perceptual abilities, music training, singing abilities, emotion) or the general musical sophistication composite score. Together, these findings suggest that music training experience does not modulate audiovisual integration in speech as reflected by the McGurk effect.
Affiliation(s)
- Hsing-Hao Lee: Department of Psychology, New York University, New York, USA
- Karleigh Groves: Department of Psychology, New York University, New York, USA; Center for Language, Music, and Emotion (CLaME), New York University, New York, USA; Music and Audio Research Lab (MARL), New York University, New York, USA
- Pablo Ripollés: Department of Psychology, New York University, New York, USA; Center for Language, Music, and Emotion (CLaME), New York University, New York, USA; Music and Audio Research Lab (MARL), New York University, New York, USA
- Marisa Carrasco: Department of Psychology, New York University, New York, USA; Center for Neural Science, New York University, New York, USA
7. Jones SA, Noppeney U. Multisensory Integration and Causal Inference in Typical and Atypical Populations. Adv Exp Med Biol 2024; 1437:59-76. PMID: 38270853; DOI: 10.1007/978-981-99-7611-9_4.
Abstract
Multisensory perception is critical for effective interaction with the environment, but human responses to multisensory stimuli vary across the lifespan and appear changed in some atypical populations. In this review chapter, we consider multisensory integration within a normative Bayesian framework. We begin by outlining the complex computational challenges of multisensory causal inference and reliability-weighted cue integration, and discuss whether healthy young adults behave in accordance with normative Bayesian models. We then compare their behaviour with that of various other human populations (children, older adults, and those with neurological or neuropsychiatric disorders). In particular, we consider whether the differences seen in these groups are due only to changes in their computational parameters (such as sensory noise or perceptual priors), or whether the fundamental computational principles (such as reliability weighting) underlying multisensory perception may also be altered. We conclude by arguing that future research should aim explicitly to differentiate between these possibilities.
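As a compact reference for the reliability-weighted cue-integration building block mentioned here, the sketch below implements standard inverse-variance (maximum-likelihood) fusion of two cues under the assumption of independent Gaussian noise and a single common source; the example values are arbitrary.

```python
def fuse(mu_a, sigma_a, mu_v, sigma_v):
    """Reliability-weighted fusion of an auditory and a visual estimate,
    assuming independent Gaussian noise and a common source."""
    w_a = sigma_a ** -2 / (sigma_a ** -2 + sigma_v ** -2)    # weight tracks relative reliability
    mu_fused = w_a * mu_a + (1 - w_a) * mu_v
    sigma_fused = (sigma_a ** -2 + sigma_v ** -2) ** -0.5    # fused estimate is more precise
    return mu_fused, sigma_fused

# A noisy auditory location estimate combined with a precise visual one
# is drawn strongly toward the visual estimate (illustrative numbers).
print(fuse(mu_a=10.0, sigma_a=8.0, mu_v=2.0, sigma_v=2.0))
```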
Affiliation(s)
- Samuel A Jones: Department of Psychology, Nottingham Trent University, Nottingham, UK
- Uta Noppeney: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
8. Zou Z, Zhao B, Ting KH, Wong C, Hou X, Chan CCH. Multisensory integration augmenting motor processes among older adults. Front Aging Neurosci 2023; 15:1293479. PMID: 38192281; PMCID: PMC10773807; DOI: 10.3389/fnagi.2023.1293479.
Abstract
Objective: Multisensory integration enhances sensory processing in older adults. This study investigated how such sensory enhancement modulates motor-related processes in healthy older adults. Method: Thirty-one older adults (12 males, mean age 67.7 years) and 29 younger adults serving as controls (16 males, mean age 24.9 years) participated. Participants discriminated spatial information embedded in unisensory (visual or auditory) and multisensory (audiovisual) conditions; responses corresponding to the spatial information were made with movements of the left and right wrists, registered on specially designed pads. The electroencephalographic (EEG) markers were the event-related super-additive P2 in the frontal-central region and the stimulus-locked (s-LRP) and response-locked (r-LRP) lateralized readiness potentials. Results: Older participants responded faster and more accurately in the multisensory condition than in the unisensory conditions, and this benefit was larger than that of the controls. Both groups showed significantly less negative-going s-LRP amplitudes at central sites in the between-condition contrasts, but only the older group showed significantly less negative-going, centrally distributed r-LRP amplitudes. More importantly, only the r-LRP amplitude in the audiovisual condition significantly predicted behavioral performance. Conclusion: Audiovisual integration shortens reaction times, and this gain is associated with modulation of motor-related processes among older participants. The super-additive effects modulate both motor preparation and motor generation, but only the modulated motor generation process contributes to faster reaction times. Because these effects were observed in older but not younger participants, multisensory integration likely augments motor function in those with age-related neurodegeneration.
Affiliation(s)
- Zhi Zou: Department of Sport and Health, Guangzhou Sport University, Guangzhou, China
- Benxuan Zhao: Department of Sport and Health, Guangzhou Sport University, Guangzhou, China
- Kin-hung Ting: University Research Facility in Behavioral and Systems Neuroscience, The Hong Kong Polytechnic University, Hong Kong, Hong Kong SAR, China
- Clive Wong: Department of Psychology, The Education University of Hong Kong, New Territories, Hong Kong SAR, China
- Xiaohui Hou: Department of Sport and Health, Guangzhou Sport University, Guangzhou, China
- Chetwyn C. H. Chan: Department of Psychology, The Education University of Hong Kong, New Territories, Hong Kong SAR, China
9. Choi I, Demir I, Oh S, Lee SH. Multisensory integration in the mammalian brain: diversity and flexibility in health and disease. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220338. PMID: 37545309; PMCID: PMC10404930; DOI: 10.1098/rstb.2022.0338.
Abstract
Multisensory integration (MSI) occurs in a variety of brain areas, spanning cortical and subcortical regions. In traditional studies of sensory processing, the sensory cortices were considered to process sensory information in a modality-specific manner. The sensory cortices, however, send the information to other cortical and subcortical areas, including the higher association cortices and the other sensory cortices, where inputs from multiple modalities converge and are integrated to generate a meaningful percept. This integration process is neither simple nor fixed because these brain areas interact with each other via complicated circuits, which can be modulated by numerous internal and external conditions. As a result, dynamic MSI makes multisensory decisions flexible and adaptive in behaving animals. Impairments in MSI occur in many psychiatric disorders, which may result in an altered perception of multisensory stimuli and an abnormal reaction to them. This review discusses the diversity and flexibility of MSI in mammals, including humans, primates and rodents, as well as the brain areas involved. It further explains how such flexibility influences perceptual experiences in behaving animals in both health and disease. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- Ilsong Choi: Center for Synaptic Brain Dysfunctions, Institute for Basic Science (IBS), Daejeon 34141, Republic of Korea
- Ilayda Demir: Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
- Seungmi Oh: Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
- Seung-Hee Lee: Center for Synaptic Brain Dysfunctions, Institute for Basic Science (IBS), Daejeon 34141, Republic of Korea; Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
10. Noel JP, Bill J, Ding H, Vastola J, DeAngelis GC, Angelaki DE, Drugowitsch J. Causal inference during closed-loop navigation: parsing of self- and object-motion. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220344. PMID: 37545300; PMCID: PMC10404925; DOI: 10.1098/rstb.2022.0344.
Abstract
A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of causal inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action-perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief about (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modelling results, we show that humans report targets as stationary and steer towards their initial rather than final position more often when they are themselves moving, suggesting a putative misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results support both of these predictions. Lastly, analysis of eye movements shows that, while initial saccades toward targets were largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action-perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- Jean-Paul Noel: Center for Neural Science, New York University, New York, NY 10003, USA
- Johannes Bill: Department of Neurobiology, Harvard University, Boston, MA 02115, USA; Department of Psychology, Harvard University, Boston, MA 02115, USA
- Haoran Ding: Center for Neural Science, New York University, New York, NY 10003, USA
- John Vastola: Department of Neurobiology, Harvard University, Boston, MA 02115, USA
- Gregory C. DeAngelis: Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14611, USA
- Dora E. Angelaki: Center for Neural Science, New York University, New York, NY 10003, USA; Tandon School of Engineering, New York University, New York, NY 10003, USA
- Jan Drugowitsch: Department of Neurobiology, Harvard University, Boston, MA 02115, USA; Center for Brain Science, Harvard University, Boston, MA 02115, USA
11. Meijer D, Noppeney U. Metacognition in the audiovisual McGurk illusion: perceptual and causal confidence. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220348. PMID: 37545307; PMCID: PMC10404922; DOI: 10.1098/rstb.2022.0348.
Abstract
Almost all decisions in everyday life rely on multiple sensory inputs that can come from common or independent causes. These situations invoke perceptual uncertainty about environmental properties and the signals' causal structure. Using the audiovisual McGurk illusion, this study investigated how observers formed perceptual and causal confidence judgements in information integration tasks under causal uncertainty. Observers were presented with spoken syllables, their corresponding articulatory lip movements or their congruent and McGurk combinations (e.g. auditory B/P with visual G/K). Observers reported their perceived auditory syllable, the causal structure and confidence for each judgement. Observers were more accurate and confident on congruent than unisensory trials. Their perceptual and causal confidence were tightly related over trials as predicted by the interactive nature of perceptual and causal inference. Further, observers assigned comparable perceptual and causal confidence to veridical 'G/K' percepts on audiovisual congruent trials and their causal and perceptual metamers on McGurk trials (i.e. illusory 'G/K' percepts). Thus, observers metacognitively evaluate the integrated audiovisual percept with limited access to the conflicting unisensory stimulus components on McGurk trials. Collectively, our results suggest that observers form meaningful perceptual and causal confidence judgements about multisensory scenes that are qualitatively consistent with principles of Bayesian causal inference. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- David Meijer: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK; Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, 1040 Wien, Austria
- Uta Noppeney: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
12. Ma KST, Schnupp JWH. The unity hypothesis revisited: can the male/female incongruent McGurk effect be disrupted by familiarization and priming? Front Psychol 2023; 14:1106562. PMID: 37705948; PMCID: PMC10495566; DOI: 10.3389/fpsyg.2023.1106562.
Abstract
The unity assumption hypothesis contends that higher-level factors, such as a perceiver's belief and prior experience, modulate multisensory integration. The McGurk illusion exemplifies such integration. When a visual velar consonant /ga/ is dubbed with an auditory bilabial /ba/, listeners unify the discrepant signals with the knowledge that open lips cannot produce /ba/, and a fusion percept /da/ is perceived. Previous research claimed to have falsified the unity assumption hypothesis by demonstrating that the McGurk effect occurs even when a face is dubbed with a voice of the opposite sex, and thus violates expectations from prior experience. Perhaps, however, preventing perceptual unity requires stronger counter-evidence than a merely apparent incongruence between unfamiliar faces and voices. Here we investigated whether the McGurk illusion with male/female incongruent stimuli can be disrupted by familiarization and priming with an appropriate pairing of face and voice. In an online experiment, the susceptibility of participants to the McGurk illusion was tested with stimuli containing either a male or female face paired with a voice of incongruent gender. The number of times participants experienced a McGurk illusion was measured before and after a familiarization block, which familiarized them with the true pairings of face and voice. After familiarization and priming, susceptibility to the McGurk effect decreased significantly on average. The findings support the notion that unity assumptions modulate intersensory bias, and confirm and extend previous studies using male/female incongruent McGurk stimuli.
Affiliation(s)
- Kennis S. T. Ma: The School of Psychology & Counselling, The Open University (UK), Milton Keynes, United Kingdom
- Jan W. H. Schnupp: Department of Neuroscience, City University of Hong Kong, Kowloon, Hong Kong SAR, China
13. Noel JP, Bill J, Ding H, Vastola J, DeAngelis GC, Angelaki DE, Drugowitsch J. Causal inference during closed-loop navigation: parsing of self- and object-motion. bioRxiv [Preprint] 2023:2023.01.27.525974. PMID: 36778376; PMCID: PMC9915492; DOI: 10.1101/2023.01.27.525974.
Abstract
A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of Bayesian Causal Inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action-perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief over (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modeling results, we show that humans report targets as stationary and steer toward their initial rather than final position more often when they are themselves moving, suggesting a misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results confirm both of these predictions. Lastly, analysis of eye-movements shows that, while initial saccades toward targets are largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action-perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI.
Affiliation(s)
- Jean-Paul Noel: Center for Neural Science, New York University, New York City, NY, United States
- Johannes Bill: Department of Neurobiology, Harvard Medical School, Boston, MA, United States; Department of Psychology, Harvard University, Cambridge, MA, United States
- Haoran Ding: Center for Neural Science, New York University, New York City, NY, United States
- John Vastola: Department of Neurobiology, Harvard Medical School, Boston, MA, United States
- Gregory C. DeAngelis: Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY, United States
- Dora E. Angelaki: Center for Neural Science, New York University, New York City, NY, United States; Tandon School of Engineering, New York University, New York City, NY, United States
- Jan Drugowitsch: Department of Neurobiology, Harvard Medical School, Boston, MA, United States; Center for Brain Science, Harvard University, Boston, MA, United States
14. Association between different sensory modalities based on concurrent time series data obtained by a collaborative reservoir computing model. Sci Rep 2023; 13:173. PMID: 36600034; DOI: 10.1038/s41598-023-27385-x.
Abstract
Humans perceive the external world by integrating information from different modalities, obtained through the sensory organs. However, the underlying mechanism is still unclear and has been a subject of widespread interest in the fields of psychology and brain science. A model using two reservoir computing systems (a type of recurrent neural network), each trained to mimic the other's output, can detect stimulus patterns that repeatedly appear in a time-series signal. We applied this model to the identification of specific patterns that co-occur between information from different modalities. The model self-organized around specific fluctuation patterns that co-occurred across modalities and could detect each such pattern. Additionally, much as perception is influenced by the synchronous or asynchronous presentation of multimodal stimuli, the model failed to work correctly for signals that did not co-occur with corresponding fluctuation patterns. Recent experimental studies have suggested that direct interaction between different sensory systems is important for multisensory integration, in addition to top-down control from higher brain regions such as the association cortex. Because several patterns of interaction between sensory modules can be incorporated into the employed model, we were able to compare performance across them; the original version of the model incorporated such an interaction as the teaching signal for learning. The performance of the original and alternative models was evaluated, and the original model was found to perform best. Thus, we demonstrated that feedback of the outputs of appropriately trained sensory modules performed best among the examined patterns of interaction. The proposed model incorporates information encoded by the dynamic state of the neural population and the interactions between different sensory modules, both of which are based on recent experimental observations; this allowed us to study the influence of the temporal relationship and frequency of occurrence of multisensory signals on sensory integration, as well as the nature of interaction between different sensory signals.
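The sketch below illustrates the general architecture with two minimal echo-state reservoirs, one per modality, whose linear readouts are trained by ridge regression to reproduce the other modality's input stream. This is a simplified stand-in for the paper's mutual-teaching scheme, and the network sizes, signals, and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    r = np.random.default_rng(seed)
    W_in = r.uniform(-0.5, 0.5, (n_res, n_in))
    W = r.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # scale recurrent weights
    return W_in, W

def run_reservoir(W_in, W, inputs, leak=0.3):
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:                           # leaky-integrator tanh units
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

def ridge(X, y, lam=1e-2):                     # ridge-regression readout weights
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Two noisy modality streams sharing a co-occurring fluctuation pattern.
T = 2000
shared = np.sin(0.1 * np.arange(T))
audio = shared + 0.3 * rng.standard_normal(T)
visual = shared + 0.3 * rng.standard_normal(T)

Wa_in, Wa = make_reservoir(1, 100, seed=1)     # "auditory" module
Wv_in, Wv = make_reservoir(1, 100, seed=2)     # "visual" module
Xa = run_reservoir(Wa_in, Wa, audio[:, None])
Xv = run_reservoir(Wv_in, Wv, visual[:, None])

# Each module learns to mimic the other's input stream (mutual teaching, simplified).
w_av = ridge(Xa, visual)
w_va = ridge(Xv, audio)

print("audio module predicting visual stream, r =", np.corrcoef(Xa @ w_av, visual)[0, 1])
print("visual module predicting auditory stream, r =", np.corrcoef(Xv @ w_va, audio)[0, 1])
```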
15. Fisher VL, Dean CL, Nave CS, Parkins EV, Kerkhoff WG, Kwakye LD. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Front Hum Neurosci 2023; 16:1027335. PMID: 36684833; PMCID: PMC9846366; DOI: 10.3389/fnhum.2022.1027335.
Abstract
We receive information about the world around us from multiple senses, and this information is combined in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied by a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders such as schizophrenia, autism, and ADHD.
Affiliation(s)
- Victoria L. Fisher: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Yale University School of Medicine and the Connecticut Mental Health Center, New Haven, CT, United States
- Cassandra L. Dean: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Roche/Genentech Neurodevelopment & Psychiatry Teams Product Development, Neuroscience, South San Francisco, CA, United States
- Claire S. Nave: Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Emma V. Parkins: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Neuroscience Graduate Program, University of Cincinnati, Cincinnati, OH, United States
- Willa G. Kerkhoff: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, United States
- Leslie D. Kwakye: Department of Neuroscience, Oberlin College, Oberlin, OH, United States
16. Begau A, Arnau S, Klatt LI, Wascher E, Getzmann S. Using visual speech at the cocktail-party: CNV evidence for early speech extraction in younger and older adults. Hear Res 2022; 426:108636. DOI: 10.1016/j.heares.2022.108636.
17. Wilbiks JMP, Brown VA, Strand JF. Speech and non-speech measures of audiovisual integration are not correlated. Atten Percept Psychophys 2022; 84:1809-1819. PMID: 35610409; PMCID: PMC10699539; DOI: 10.3758/s13414-022-02517-z.
Abstract
Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as "measures of audiovisual integration," the tasks themselves differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the tasks themselves (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the relationships among four commonly-used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit), and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks that are commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process or different constructs entirely.
Affiliation(s)
- Violet A Brown: Department of Psychological & Brain Sciences, Washington University in St. Louis, Saint Louis, MO, USA
- Julia F Strand: Department of Psychology, Carleton College, Northfield, MN, USA
18. Shams L, Beierholm U. Bayesian causal inference: A unifying neuroscience theory. Neurosci Biobehav Rev 2022; 137:104619. PMID: 35331819; DOI: 10.1016/j.neubiorev.2022.104619.
Abstract
Understanding of the brain and the principles governing neural processing requires theories that are parsimonious, can account for a diverse set of phenomena, and can make testable predictions. Here, we review the theory of Bayesian causal inference, which has been tested, refined, and extended in a variety of tasks in humans and other primates by several research groups. Bayesian causal inference is normative and has explained human behavior in a vast number of tasks, including unisensory and multisensory perceptual tasks, sensorimotor tasks, and motor tasks, and has accounted for counter-intuitive findings. The theory has made novel predictions that have been tested and confirmed empirically, and recent studies have started to map its algorithms and neural implementation in the human brain. The parsimony, the diversity of the phenomena that the theory has explained, and its illumination of brain function at all three of Marr's levels of analysis make Bayesian causal inference a strong neuroscience theory. This also highlights the importance of collaborative and multi-disciplinary research for the development of new theories in neuroscience.
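For readers who want the core computation in compact form, the following sketch implements the widely used Gaussian formulation of Bayesian causal inference for a single audiovisual spatial trial: likelihoods under common versus independent causes, the posterior probability of a common cause, and a model-averaged auditory location estimate. All parameter values are illustrative rather than fitted.

```python
import numpy as np

def bci_auditory_estimate(x_a, x_v, sig_a, sig_v, sig_p=20.0, mu_p=0.0, p_common=0.5):
    """Bayesian causal inference with model averaging (sketch) for one trial,
    given noisy auditory and visual measurements x_a and x_v."""
    va, vv, vp = sig_a**2, sig_v**2, sig_p**2
    # Likelihood of the measurements under a common cause (C = 1).
    denom1 = va * vv + va * vp + vv * vp
    like_c1 = np.exp(-0.5 * ((x_a - x_v)**2 * vp + (x_a - mu_p)**2 * vv
                             + (x_v - mu_p)**2 * va) / denom1) / (2 * np.pi * np.sqrt(denom1))
    # Likelihood under independent causes (C = 2).
    denom2 = (va + vp) * (vv + vp)
    like_c2 = np.exp(-0.5 * ((x_a - mu_p)**2 / (va + vp)
                             + (x_v - mu_p)**2 / (vv + vp))) / (2 * np.pi * np.sqrt(denom2))
    # Posterior probability of a common cause.
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Optimal auditory location estimates under each causal structure.
    s_c1 = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)
    s_c2 = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)
    # Model averaging: weight the two estimates by the posterior over causes.
    return post_c1, post_c1 * s_c1 + (1 - post_c1) * s_c2

print(bci_auditory_estimate(x_a=8.0, x_v=3.0, sig_a=6.0, sig_v=2.0))
```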
Affiliation(s)
- Ladan Shams: Departments of Psychology, BioEngineering, and Neuroscience Interdepartmental Program, University of California, Los Angeles, USA
19. Rennig J, Beauchamp MS. Intelligibility of audiovisual sentences drives multivoxel response patterns in human superior temporal cortex. Neuroimage 2022; 247:118796. PMID: 34906712; PMCID: PMC8819942; DOI: 10.1016/j.neuroimage.2021.118796.
Abstract
Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech, and neural responses in pSTG/S may underlie the perceptual benefit of visual speech for the comprehension of noisy auditory speech. We examined this possibility through the lens of multivoxel pattern responses in pSTG/S. BOLD fMRI data was collected from 22 participants presented with speech consisting of English sentences presented in five different formats: visual-only; auditory with and without added auditory noise; and audiovisual with and without auditory noise. Participants reported the intelligibility of each sentence with a button press and trials were sorted post-hoc into those that were more or less intelligible. Response patterns were measured in regions of the pSTG/S identified with an independent localizer. Noisy audiovisual sentences with very similar physical properties evoked very different response patterns depending on their intelligibility. When a noisy audiovisual sentence was reported as intelligible, the pattern was nearly identical to that elicited by clear audiovisual sentences. In contrast, an unintelligible noisy audiovisual sentence evoked a pattern like that of visual-only sentences. This effect was less pronounced for noisy auditory-only sentences, which evoked similar response patterns regardless of intelligibility. The successful integration of visual and auditory speech produces a characteristic neural signature in pSTG/S, highlighting the importance of this region in generating the perceptual benefit of visual speech.
Affiliation(s)
- Johannes Rennig: Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Building, A607, 3700 Hamilton Walk, Philadelphia, PA 19104-6016, United States
20. Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. Psychol Res 2021; 86:1930-1943. PMID: 34854983; PMCID: PMC9363401; DOI: 10.1007/s00426-021-01618-y.
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Affiliation(s)
- Basil Wahn: Department of Psychology, Leibniz Universität Hannover, Hannover, Germany
- Laura Schmitz: Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
- Alan Kingstone: Department of Psychology, University of British Columbia, Vancouver, BC, Canada
21. Laeng B, Kuyateh S, Kelkar T. Substituting facial movements in singers changes the sounds of musical intervals. Sci Rep 2021; 11:22442. PMID: 34789775; PMCID: PMC8599708; DOI: 10.1038/s41598-021-01797-z.
Abstract
Cross-modal integration is ubiquitous within perception and, in humans, the McGurk effect demonstrates that seeing a person articulating speech can change what we hear into a new auditory percept. It remains unclear whether cross-modal integration of sight and sound generalizes to other visible vocal articulations, like those made by singers. We surmise that perceptual integrative effects should involve music deeply, since there is ample indeterminacy and variability in its auditory signals. We show that switching videos of sung musical intervals systematically changes the estimated distance between the two notes of a musical interval: pairing the video of a smaller sung interval with a relatively larger auditory interval led to compression effects on rated intervals, whereas the reverse pairing led to a stretching effect. In addition, after seeing a visually switched video of an equally-tempered sung interval and then hearing the same interval played on the piano, the two intervals were often judged to be different even though they differed only in instrument. These findings reveal spontaneous, cross-modal integration of vocal sounds and clearly indicate that strong integration of sound and sight can occur beyond the articulations of natural speech.
Affiliation(s)
- Bruno Laeng: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317 Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Sarjo Kuyateh: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317 Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Tejaswinee Kelkar: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317 Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
22. Ferrari A, Noppeney U. Attention controls multisensory perception via two distinct mechanisms at different levels of the cortical hierarchy. PLoS Biol 2021; 19:e3001465. PMID: 34793436; PMCID: PMC8639080; DOI: 10.1371/journal.pbio.3001465.
Abstract
To form a percept of the multisensory world, the brain needs to integrate signals from common sources weighted by their reliabilities and segregate those from independent sources. Previously, we have shown that anterior parietal cortices combine sensory signals into representations that take into account the signals' causal structure (i.e., common versus independent sources) and their sensory reliabilities as predicted by Bayesian causal inference. The current study asks to what extent and how attentional mechanisms can actively control how sensory signals are combined for perceptual inference. In a pre- and postcueing paradigm, we presented observers with audiovisual signals at variable spatial disparities. Observers were precued to attend to auditory or visual modalities prior to stimulus presentation and postcued to report their perceived auditory or visual location. Combining psychophysics, functional magnetic resonance imaging (fMRI), and Bayesian modelling, we demonstrate that the brain moulds multisensory inference via two distinct mechanisms. Prestimulus attention to vision enhances the reliability and influence of visual inputs on spatial representations in visual and posterior parietal cortices. Poststimulus report determines how parietal cortices flexibly combine sensory estimates into spatial representations consistent with Bayesian causal inference. Our results show that distinct neural mechanisms control how signals are combined for perceptual inference at different levels of the cortical hierarchy.
Affiliation(s)
- Ambra Ferrari: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, United Kingdom; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Uta Noppeney: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, United Kingdom; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
23. Galazka MA, Hadjikhani N, Sundqvist M, Åsberg Johnels J. Facial speech processing in children with and without dyslexia. Ann Dyslexia 2021; 71:501-524. PMID: 34115279; PMCID: PMC8458188; DOI: 10.1007/s11881-021-00231-3.
Abstract
What role does the presence of facial speech play for children with dyslexia? Current literature proposes two distinctive claims. One claim states that children with dyslexia make less use of visual information from the mouth during speech processing due to a deficit in recruitment of audiovisual areas. An opposing claim suggests that children with dyslexia are in fact reliant on such information in order to compensate for auditory/phonological impairments. The current paper aims at directly testing these contrasting hypotheses (here referred to as "mouth insensitivity" versus "mouth reliance") in school-age children with and without dyslexia, matched on age and listening comprehension. Using eye tracking, in Study 1, we examined how children look at the mouth across conditions varying in speech processing demands. The results did not indicate significant group differences in looking at the mouth. However, correlation analyses suggest potentially important distinctions within the dyslexia group: those children with dyslexia who are better readers attended more to the mouth while presented with a person's face in a phonologically demanding condition. In Study 2, we examined whether the presence of facial speech cues is functionally beneficial when a child is encoding written words. The results indicated lack of overall group differences on the task, although those with less severe reading problems in the dyslexia group were more accurate when reading words that were presented with articulatory facial speech cues. Collectively, our results suggest that children with dyslexia differ in their "mouth reliance" versus "mouth insensitivity," a profile that seems to be related to the severity of their reading problems.
Collapse
Affiliation(s)
- Martyna A Galazka
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden.
| | - Nouchine Hadjikhani
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Harvard Medical School/MGH/MIT, Athinoula A. Martinos Center for Biomedical Imaging, Boston, MA, USA
| | - Maria Sundqvist
- Department of Education and Special Education, University of Gothenburg, Gothenburg, Sweden
| | - Jakob Åsberg Johnels
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden.
- Section of Speech and Language Pathology, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden.
| |
Collapse
|
24
|
Visual Influences on Auditory Behavioral, Neural, and Perceptual Processes: A Review. J Assoc Res Otolaryngol 2021; 22:365-386. [PMID: 34014416 PMCID: PMC8329114 DOI: 10.1007/s10162-021-00789-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Accepted: 02/07/2021] [Indexed: 01/03/2023] Open
Abstract
In a naturalistic environment, auditory cues are often accompanied by information from other senses, which can be redundant with or complementary to the auditory information. Although the multisensory interactions derived from this combination of information and that shape auditory function are seen across all sensory modalities, our greatest body of knowledge to date centers on how vision influences audition. In this review, we attempt to capture the state of our understanding at this point in time regarding this topic. Following a general introduction, the review is divided into 5 sections. In the first section, we review the psychophysical evidence in humans regarding vision's influence in audition, making the distinction between vision's ability to enhance versus alter auditory performance and perception. Three examples are then described that serve to highlight vision's ability to modulate auditory processes: spatial ventriloquism, cross-modal dynamic capture, and the McGurk effect. The final part of this section discusses models that have been built based on available psychophysical data and that seek to provide greater mechanistic insights into how vision can impact audition. The second section reviews the extant neuroimaging and far-field imaging work on this topic, with a strong emphasis on the roles of feedforward and feedback processes, on imaging insights into the causal nature of audiovisual interactions, and on the limitations of current imaging-based approaches. These limitations point to a greater need for machine-learning-based decoding approaches toward understanding how auditory representations are shaped by vision. The third section reviews the wealth of neuroanatomical and neurophysiological data from animal models that highlights audiovisual interactions at the neuronal and circuit level in both subcortical and cortical structures. It also speaks to the functional significance of audiovisual interactions for two critically important facets of auditory perception-scene analysis and communication. The fourth section presents current evidence for alterations in audiovisual processes in three clinical conditions: autism, schizophrenia, and sensorineural hearing loss. These changes in audiovisual interactions are postulated to have cascading effects on higher-order domains of dysfunction in these conditions. The final section highlights ongoing work seeking to leverage our knowledge of audiovisual interactions to develop better remediation approaches to these sensory-based disorders, founded in concepts of perceptual plasticity in which vision has been shown to have the capacity to facilitate auditory learning.
Collapse
|
25
|
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & Macdonald (Nature 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
Collapse
|
26
|
Abstract
Adaptive behavior in a complex, dynamic, and multisensory world poses some of the most fundamental computational challenges for the brain, notably inference, decision-making, learning, binding, and attention. We first discuss how the brain integrates sensory signals from the same source to support perceptual inference and decision-making by weighting them according to their momentary sensory uncertainties. We then show how observers solve the binding or causal inference problem-deciding whether signals come from common causes and should hence be integrated or else be treated independently. Next, we describe the multifarious interplay between multisensory processing and attention. We argue that attentional mechanisms are crucial to compute approximate solutions to the binding problem in naturalistic environments when complex time-varying signals arise from myriad causes. Finally, we review how the brain dynamically adapts multisensory processing to a changing world across multiple timescales.
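As a concrete anchor for the phrase "weighting them according to their momentary sensory uncertainties", the textbook maximum-likelihood rule for two signals from a common source is

\[
\hat{S} = w_A x_A + w_V x_V, \qquad w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad w_V = 1 - w_A,
\]

where \(\sigma_A^2\) and \(\sigma_V^2\) are the auditory and visual noise variances; this standard formulation is given here for orientation and is not specific to this review.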
Collapse
Affiliation(s)
- Uta Noppeney
- Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525 AJ Nijmegen, The Netherlands;
| |
Collapse
|
27
|
Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021; 15:616049. [PMID: 33867954 PMCID: PMC8046930 DOI: 10.3389/fnhum.2021.616049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (i.e., "da"), resulting from mismatched pairings of audiovisual (AV) speech stimuli (i.e., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da," represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant vowels (CVs) with (AV) and without (V-only) audio of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Collapse
Affiliation(s)
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
| | - Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
| | - Brenna Mandujano
- Department of Psychology, California State University, Fresno, Fresno, CA, United States
| | - Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
| |
Collapse
|
28
|
Lindborg A, Andersen TS. Bayesian binding and fusion models explain illusion and enhancement effects in audiovisual speech perception. PLoS One 2021; 16:e0246986. [PMID: 33606815 PMCID: PMC7895372 DOI: 10.1371/journal.pone.0246986] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Accepted: 01/31/2021] [Indexed: 11/24/2022] Open
Abstract
Speech is perceived with both the ears and the eyes. Adding congruent visual speech improves the perception of a faint auditory speech stimulus, whereas adding incongruent visual speech can alter the perception of the utterance. The latter phenomenon is the case of the McGurk illusion, where an auditory stimulus such as e.g. "ba" dubbed onto a visual stimulus such as "ga" produces the illusion of hearing "da". Bayesian models of multisensory perception suggest that both the enhancement and the illusion case can be described as a two-step process of binding (informed by prior knowledge) and fusion (informed by the information reliability of each sensory cue). However, there is to date no study which has accounted for how they each contribute to audiovisual speech perception. In this study, we expose subjects to both congruent and incongruent audiovisual speech, manipulating the binding and the fusion stages simultaneously. This is done by varying both temporal offset (binding) and auditory and visual signal-to-noise ratio (fusion). We fit two Bayesian models to the behavioural data and show that they can both account for the enhancement effect in congruent audiovisual speech, as well as the McGurk illusion. This modelling approach allows us to disentangle the effects of binding and fusion on behavioural responses. Moreover, we find that these models have greater predictive power than a forced fusion model. This study provides a systematic and quantitative approach to measuring audiovisual integration in the perception of the McGurk illusion as well as congruent audiovisual speech, which we hope will inform future work on audiovisual speech perception.
Collapse
Affiliation(s)
- Alma Lindborg
- Department of Psychology, University of Potsdam, Potsdam, Germany
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Tobias S. Andersen
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| |
Collapse
|
29
|
Jones SA, Noppeney U. Ageing and multisensory integration: A review of the evidence, and a computational perspective. Cortex 2021; 138:1-23. [PMID: 33676086 DOI: 10.1016/j.cortex.2021.02.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 01/23/2021] [Accepted: 02/02/2021] [Indexed: 11/29/2022]
Abstract
The processing of multisensory signals is crucial for effective interaction with the environment, but our ability to perform this vital function changes as we age. In the first part of this review, we summarise existing research into the effects of healthy ageing on multisensory integration. We note that age differences vary substantially with the paradigms and stimuli used: older adults often receive at least as much benefit (to both accuracy and response times) as younger controls from congruent multisensory stimuli, but are also consistently more negatively impacted by the presence of intersensory conflict. In the second part, we outline a normative Bayesian framework that provides a principled and computationally informed perspective on the key ingredients involved in multisensory perception, and how these are affected by ageing. Applying this framework to the existing literature, we conclude that changes to sensory reliability, prior expectations (together with attentional control), and decisional strategies all contribute to the age differences observed. However, we find no compelling evidence of any age-related changes to the basic inference mechanisms involved in multisensory perception.
Collapse
Affiliation(s)
- Samuel A Jones
- The Staffordshire Centre for Psychological Research, Staffordshire University, Stoke-on-Trent, UK.
| | - Uta Noppeney
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands.
| |
Collapse
|
30
|
Li L, Rehr R, Bruns P, Gerkmann T, Röder B. A Survey on Probabilistic Models in Human Perception and Machines. Front Robot AI 2021; 7:85. [PMID: 33501252 PMCID: PMC7805657 DOI: 10.3389/frobt.2020.00085] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2019] [Accepted: 05/29/2020] [Indexed: 11/29/2022] Open
Abstract
Extracting information from noisy signals is of fundamental importance for both biological and artificial perceptual systems. To provide tractable solutions to this challenge, the fields of human perception and machine signal processing (SP) have developed powerful computational models, including Bayesian probabilistic models. However, little true integration between these fields exists in their applications of the probabilistic models for solving analogous problems, such as noise reduction, signal enhancement, and source separation. In this mini review, we briefly introduce and compare selective applications of probabilistic models in machine SP and human psychophysics. We focus on audio and audio-visual processing, using examples of speech enhancement, automatic speech recognition, audio-visual cue integration, source separation, and causal inference to illustrate the basic principles of the probabilistic approach. Our goal is to identify commonalities between probabilistic models addressing brain processes and those aiming at building intelligent machines. These commonalities could constitute the closest points for interdisciplinary convergence.
Collapse
Affiliation(s)
- Lux Li
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
| | - Robert Rehr
- Signal Processing (SP), Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Patrick Bruns
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
| | - Timo Gerkmann
- Signal Processing (SP), Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Brigitte Röder
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
| |
Collapse
|
31
|
|
32
|
Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. [PMID: 33221701 DOI: 10.1016/j.cortex.2020.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2020] [Revised: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to-date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
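The power point in the final sentences can be made concrete with the standard Fisher-z approximation for the sample size needed to detect a correlation r (the value r = .10 below is purely illustrative, not a figure taken from the paper): with two-tailed \(\alpha = .05\) and power \(= .80\),

\[
n \approx \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{\tfrac{1}{2}\ln\!\frac{1+r}{1-r}} \right)^{2} + 3
 = \left( \frac{1.96 + 0.84}{0.100} \right)^{2} + 3 \approx 780,
\]

i.e., detecting an observer-level correlation of that size would require on the order of 800 participants, far more than typical McGurk studies include.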
Collapse
|
33
|
Plass J, Brang D, Suzuki S, Grabowecky M. Vision perceptually restores auditory spectral dynamics in speech. Proc Natl Acad Sci U S A 2020; 117:16920-16927. [PMID: 32632010 PMCID: PMC7382243 DOI: 10.1073/pnas.2002887117] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
Collapse
Affiliation(s)
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109;
- Department of Psychology, Northwestern University, Evanston, IL 60208
| | - David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
| | - Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
| | - Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
| |
Collapse
|
34
|
Boyce WP, Lindsay A, Zgonnikov A, Rañó I, Wong-Lin K. Optimality and Limitations of Audio-Visual Integration for Cognitive Systems. Front Robot AI 2020; 7:94. [PMID: 33501261 PMCID: PMC7805627 DOI: 10.3389/frobt.2020.00094] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2019] [Accepted: 06/09/2020] [Indexed: 11/13/2022] Open
Abstract
Multimodal integration is an important process in perceptual decision-making. In humans, this process has often been shown to be statistically optimal, or near optimal: sensory information is combined in a fashion that minimizes the average error in perceptual representation of stimuli. However, sometimes there are costs that come with the optimization, manifesting as illusory percepts. We review audio-visual facilitations and illusions that are products of multisensory integration, and the computational models that account for these phenomena. In particular, the same optimal computational model can lead to illusory percepts, and we suggest that more studies are needed to detect and mitigate these illusions, which can arise as artifacts in artificial cognitive systems. We provide cautionary considerations when designing artificial cognitive systems with a view to avoiding such artifacts. Finally, we suggest avenues of research toward solutions to potential pitfalls in system design. We conclude that detailed understanding of multisensory integration and the mechanisms behind audio-visual illusions can benefit the design of artificial cognitive systems.
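The "statistically optimal" benchmark mentioned here is usually cashed out as the fused estimate having lower variance than either unisensory estimate; in the standard formulation (illustrative numbers, not from this review),

\[
\sigma_{AV}^2 = \frac{\sigma_A^2\,\sigma_V^2}{\sigma_A^2 + \sigma_V^2} \le \min\left(\sigma_A^2, \sigma_V^2\right),
\]

so that, for example, \(\sigma_A = 2\) and \(\sigma_V = 1\) (arbitrary units) give \(\sigma_{AV} = \sqrt{4/5} \approx 0.89\); the illusions discussed in the review arise when the same weighting is applied to signals that do not in fact share a source.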
Collapse
Affiliation(s)
- William Paul Boyce
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| | - Anthony Lindsay
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| | - Arkady Zgonnikov
- AiTech, Delft University of Technology, Delft, Netherlands
- Department of Cognitive Robotics, Faculty of Mechanical, Maritime, and Materials Engineering, Delft University of Technology, Delft, Netherlands
| | - Iñaki Rañó
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| | - KongFatt Wong-Lin
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| |
Collapse
|
35
|
French RL, DeAngelis GC. Multisensory neural processing: from cue integration to causal inference. CURRENT OPINION IN PHYSIOLOGY 2020; 16:8-13. [PMID: 32968701 DOI: 10.1016/j.cophys.2020.04.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Neurophysiological studies of multisensory processing have largely focused on how the brain integrates information from different sensory modalities to form a coherent percept. However, in the natural environment, an important extra step is needed: the brain faces the problem of causal inference, which involves determining whether different sources of sensory information arise from the same environmental cause, such that integrating them is advantageous. Behavioral and computational studies have provided a strong foundation for studying causal inference, but studies of its neural basis have only recently been undertaken. This review focuses on recent advances regarding how the brain infers the causes of sensory inputs and uses this information to make robust perceptual estimates.
Collapse
Affiliation(s)
- Ranran L French
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY
| | - Gregory C DeAngelis
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY
| |
Collapse
|
36
|
Olasagasti I, Giraud AL. Integrating prediction errors at two time scales permits rapid recalibration of speech sound categories. eLife 2020; 9:44516. [PMID: 32223894 PMCID: PMC7217692 DOI: 10.7554/elife.44516] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2019] [Accepted: 03/17/2020] [Indexed: 01/01/2023] Open
Abstract
Speech perception presumably arises from internal models of how specific sensory features are associated with speech sounds. These features change constantly (e.g. different speakers, articulation modes etc.), and listeners need to recalibrate their internal models by appropriately weighing new versus old evidence. Models of speech recalibration classically ignore this volatility. The effect of volatility in tasks where sensory cues were associated with arbitrary experimenter-defined categories was well described by models that continuously adapt the learning rate while keeping a single representation of the category. Using neurocomputational modelling, we show that recalibration of natural speech sound categories is better described by representing the latter at different time scales. We illustrate our proposal by modeling fast recalibration of speech sounds after experiencing the McGurk effect. We propose that working representations of speech categories are driven both by their current environment and their long-term memory representations.

People can distinguish words or syllables even though they may sound different with every speaker. This striking ability reflects the fact that our brain is continually modifying the way we recognise and interpret the spoken word based on what we have heard before, by comparing past experience with the most recent one to update expectations. This phenomenon also occurs in the McGurk effect: an auditory illusion in which someone hears one syllable but sees a person saying another syllable and ends up perceiving a third distinct sound. Abstract models, which provide a functional rather than a mechanistic description of what the brain does, can test how humans use expectations and prior knowledge to interpret the information delivered by the senses at any given moment. Olasagasti and Giraud have now built an abstract model of how brains recalibrate perception of natural speech sounds. By fitting the model with existing experimental data using the McGurk effect, the results suggest that, rather than using a single sound representation that is adjusted with each sensory experience, the brain recalibrates sounds at two different timescales. Over and above slow “procedural” learning, the findings show that there is also rapid recalibration of how different sounds are interpreted. This working representation of speech enables adaptation to changing or noisy environments and illustrates that the process is far more dynamic and flexible than previously thought.
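One generic way to write down the "different time scales" idea, offered here only as a schematic delta-rule caricature and not as the authors' generative model, is to let a fast and a slow trace of a category's sensory mean be updated at different rates and to read out their combination:

\[
\mu^{\mathrm{fast}}_{t+1} = \mu^{\mathrm{fast}}_{t} + \alpha_{\mathrm{fast}}\,(x_t - \mu_t), \qquad
\mu^{\mathrm{slow}}_{t+1} = \mu^{\mathrm{slow}}_{t} + \alpha_{\mathrm{slow}}\,(x_t - \mu_t), \qquad
\alpha_{\mathrm{fast}} \gg \alpha_{\mathrm{slow}},
\]

with the working representation \(\mu_t\) mixing the two traces (e.g., \(\mu_t = \lambda\,\mu^{\mathrm{fast}}_t + (1-\lambda)\,\mu^{\mathrm{slow}}_t\)); a McGurk exposure then shifts the fast trace quickly while the long-term trace anchors the category.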
Collapse
Affiliation(s)
- Itsaso Olasagasti
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
| | - Anne-Lise Giraud
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
| |
Collapse
|
37
|
Colonius H, Diederich A. Formal models and quantitative measures of multisensory integration: a selective overview. Eur J Neurosci 2020; 51:1161-1178. [DOI: 10.1111/ejn.13813] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2017] [Revised: 12/18/2017] [Accepted: 12/20/2017] [Indexed: 11/26/2022]
Affiliation(s)
- Hans Colonius
- Department of Psychology Carl von Ossietzky Universität Oldenburg Oldenburg 26111 Germany
- Department of Psychological Sciences Purdue University West Lafayette IN USA
| | - Adele Diederich
- Department of Psychological Sciences Purdue University West Lafayette IN USA
- Life Sciences and Chemistry Jacobs University Bremen Bremen Germany
| |
Collapse
|
38
|
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261 PMCID: PMC6687434 DOI: 10.7554/elife.48116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Accepted: 07/17/2019] [Indexed: 12/30/2022] Open
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Collapse
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | | |
Collapse
|
39
|
Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. [PMID: 31310616 PMCID: PMC6634411 DOI: 10.1371/journal.pone.0219744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 07/02/2019] [Indexed: 11/18/2022] Open
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech specific mismatch processing and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity paradigm (MMN) where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm are violating both the general prediction of stimulus content, and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
Collapse
Affiliation(s)
- Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| | - Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- BCBL. Basque Center on Cognition, Brain and Language, Donostia, Spain
| | - Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
| | - Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
| | - Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| |
Collapse
|
40
|
Cao Y, Summerfield C, Park H, Giordano BL, Kayser C. Causal Inference in the Multisensory Brain. Neuron 2019; 102:1076-1087.e8. [PMID: 31047778 DOI: 10.1016/j.neuron.2019.03.043] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2018] [Revised: 02/18/2019] [Accepted: 03/27/2019] [Indexed: 01/13/2023]
Abstract
When combining information across different senses, humans need to flexibly select cues of a common origin while avoiding distraction from irrelevant inputs. The brain could solve this challenge using a hierarchical principle by deriving rapidly a fused sensory estimate for computational expediency and, later and if required, filtering out irrelevant signals based on the inferred sensory cause(s). Analyzing time- and source-resolved human magnetoencephalographic data, we unveil a systematic spatiotemporal cascade of the relevant computations, starting with early segregated unisensory representations, continuing with sensory fusion in parietal-temporal regions, and culminating as causal inference in the frontal lobe. Our results reconcile previous computational accounts of multisensory perception by showing that prefrontal cortex guides flexible integrative behavior based on candidate representations established in sensory and association cortices, thereby framing multisensory integration in the generalized context of adaptive behavior.
Collapse
Affiliation(s)
- Yinan Cao
- Department of Experimental Psychology, University of Oxford, Walton Street, Oxford OX2 6AE, UK.
| | - Christopher Summerfield
- Department of Experimental Psychology, University of Oxford, Walton Street, Oxford OX2 6AE, UK
| | - Hame Park
- Department for Cognitive Neuroscience and Cognitive Interaction Technology-Center of Excellence, Bielefeld University, 33615 Bielefeld, Germany
| | - Bruno Lucio Giordano
- Institut de Neurosciences de la Timone UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
| | - Christoph Kayser
- Department for Cognitive Neuroscience and Cognitive Interaction Technology-Center of Excellence, Bielefeld University, 33615 Bielefeld, Germany.
| |
Collapse
|
41
|
Causal inference accounts for heading perception in the presence of object motion. Proc Natl Acad Sci U S A 2019; 116:9060-9065. [PMID: 30996126 DOI: 10.1073/pnas.1820373116] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The brain infers our spatial orientation and properties of the world from ambiguous and noisy sensory cues. Judging self-motion (heading) in the presence of independently moving objects poses a challenging inference problem because the image motion of an object could be attributed to movement of the object, self-motion, or some combination of the two. We test whether perception of heading and object motion follows predictions of a normative causal inference framework. In a dual-report task, subjects indicated whether an object appeared stationary or moving in the virtual world, while simultaneously judging their heading. Consistent with causal inference predictions, the proportion of object stationarity reports, as well as the accuracy and precision of heading judgments, depended on the speed of object motion. Critically, biases in perceived heading declined when the object was perceived to be moving in the world. Our findings suggest that the brain interprets object motion and self-motion using a causal inference framework.
Collapse
|
42
|
Lalonde K, Werner LA. Perception of incongruent audiovisual English consonants. PLoS One 2019; 14:e0213588. [PMID: 30897109 PMCID: PMC6428273 DOI: 10.1371/journal.pone.0213588] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Accepted: 02/25/2019] [Indexed: 11/21/2022] Open
Abstract
Causal inference—the process of deciding whether two incoming signals come from the same source—is an important step in audiovisual (AV) speech perception. This research explored causal inference and perception of incongruent AV English consonants. Nine adults were presented auditory, visual, congruent AV, and incongruent AV consonant-vowel syllables. Incongruent AV stimuli included auditory and visual syllables with matched vowels, but mismatched consonants. Open-set responses were collected. For most incongruent syllables, participants were aware of the mismatch between auditory and visual signals (59.04%) or reported the auditory syllable (33.73%). Otherwise, participants reported the visual syllable (1.13%) or some other syllable (6.11%). Statistical analyses were used to assess whether visual distinctiveness and place, voice, and manner features predicted responses. Mismatch responses occurred more when the auditory and visual consonants were visually distinct, when place and manner differed across auditory and visual consonants, and for consonants with high visual accuracy. Auditory responses occurred more when the auditory and visual consonants were visually similar, when place and manner were the same across auditory and visual stimuli, and with consonants produced further back in the mouth. Visual responses occurred more when voicing and manner were the same across auditory and visual stimuli, and for front and middle consonants. Other responses were variable, but typically matched the visual place, auditory voice, and auditory manner of the input. Overall, results indicate that causal inference and incongruent AV consonant perception depend on salience and reliability of auditory and visual inputs and degree of redundancy between auditory and visual inputs. A parameter-free computational model of incongruent AV speech perception based on unimodal confusions, with a causal inference rule, was applied. Data from the current study present an opportunity to test and improve the generalizability of current AV speech integration models.
Collapse
Affiliation(s)
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
| | - Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
| |
Collapse
|
43
|
Magnotti JF, Smith KB, Salinas M, Mays J, Zhu LL, Beauchamp MS. A causal inference explanation for enhancement of multisensory integration by co-articulation. Sci Rep 2018; 8:18032. [PMID: 30575791 PMCID: PMC6303389 DOI: 10.1038/s41598-018-36772-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Accepted: 11/22/2018] [Indexed: 11/09/2022] Open
Abstract
The McGurk effect is a popular assay of multisensory integration in which participants report the illusory percept of "da" when presented with incongruent auditory "ba" and visual "ga" (AbaVga). While the original publication describing the effect found that 98% of participants perceived it, later studies reported much lower prevalence, ranging from 17% to 81%. Understanding the source of this variability is important for interpreting the panoply of studies that examine McGurk prevalence between groups, including clinical populations such as individuals with autism or schizophrenia. The original publication used stimuli consisting of multiple repetitions of a co-articulated syllable (three repetitions, AgagaVbaba). Later studies used stimuli without repetition or co-articulation (AbaVga) and used congruent syllables from the same talker as a control. In three experiments, we tested how stimulus repetition, co-articulation, and talker repetition affect McGurk prevalence. Repetition with co-articulation increased prevalence by 20%, while repetition without co-articulation and talker repetition had no effect. A fourth experiment compared the effect of the on-line testing used in the first three experiments with the in-person testing used in the original publication; no differences were observed. We interpret our results in the framework of causal inference: co-articulation increases the evidence that auditory and visual speech tokens arise from the same talker, increasing tolerance for content disparity and likelihood of integration. The results provide a principled explanation for how co-articulation aids multisensory integration and can explain the high prevalence of the McGurk effect in the initial publication.
Collapse
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA.
| | - Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Marcelo Salinas
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Jacqunae Mays
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
| | - Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA.
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA.
| |
Collapse
|
44
|
Brown VA, Hedayati M, Zanger A, Mayn S, Ray L, Dillman-Hasso N, Strand JF. What accounts for individual differences in susceptibility to the McGurk effect? PLoS One 2018; 13:e0207160. [PMID: 30418995 PMCID: PMC6231656 DOI: 10.1371/journal.pone.0207160] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2018] [Accepted: 10/25/2018] [Indexed: 11/29/2022] Open
Abstract
The McGurk effect is a classic audiovisual speech illusion in which discrepant auditory and visual syllables can lead to a fused percept (e.g., an auditory /bɑ/ paired with a visual /gɑ/ often leads to the perception of /dɑ/). The McGurk effect is robust and easily replicated in pooled group data, but there is tremendous variability in the extent to which individual participants are susceptible to it. In some studies, the rate at which individuals report fusion responses ranges from 0% to 100%. Despite its widespread use in the audiovisual speech perception literature, the roots of the wide variability in McGurk susceptibility are largely unknown. This study evaluated whether several perceptual and cognitive traits are related to McGurk susceptibility through correlational analyses and mixed effects modeling. We found that an individual's susceptibility to the McGurk effect was related to their ability to extract place of articulation information from the visual signal (i.e., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency. These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill. In contrast, cognitive and perceptual abilities that are commonly used predictors in individual differences studies do not appear to underlie susceptibility to the McGurk effect.
Collapse
Affiliation(s)
- Violet A. Brown
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Maryam Hedayati
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Annie Zanger
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Sasha Mayn
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Lucia Ray
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Naseem Dillman-Hasso
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Julia F. Strand
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| |
Collapse
|
45
|
Language Experience Changes Audiovisual Perception. Brain Sci 2018; 8:brainsci8050085. [PMID: 29751619 PMCID: PMC5977076 DOI: 10.3390/brainsci8050085] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2018] [Revised: 05/05/2018] [Accepted: 05/08/2018] [Indexed: 11/17/2022] Open
Abstract
Can experience change perception? Here, we examine whether language experience shapes the way individuals process auditory and visual information. We used the McGurk effect—the discovery that when people hear a speech sound (e.g., “ba”) and see a conflicting lip movement (e.g., “ga”), they recognize it as a completely new sound (e.g., “da”). This finding suggests that the brain fuses input across auditory and visual modalities and demonstrates that what we hear is profoundly influenced by what we see. We find that cross-modal integration is affected by language background, with bilinguals experiencing the McGurk effect more than monolinguals. This increased reliance on the visual channel is not due to decreased language proficiency, as the effect was observed even among highly proficient bilinguals. Instead, we propose that the challenges of learning and monitoring multiple languages have lasting consequences for how individuals process auditory and visual information.
Collapse
|
46
|
Ozker M, Yoshor D, Beauchamp MS. Converging Evidence From Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing. Front Hum Neurosci 2018; 12:141. [PMID: 29740294 PMCID: PMC5928751 DOI: 10.3389/fnhum.2018.00141] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2018] [Accepted: 03/28/2018] [Indexed: 01/15/2023] Open
Abstract
Although humans can understand speech using the auditory modality alone, in noisy environments visual speech information from the talker’s mouth can rescue otherwise unintelligible auditory speech. To investigate the neural substrates of multisensory speech perception, we compared neural activity from the human superior temporal gyrus (STG) in two datasets. One dataset consisted of direct neural recordings (electrocorticography, ECoG) from surface electrodes implanted in epilepsy patients (this dataset has been previously published). The second dataset consisted of indirect measures of neural activity using blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI). Both ECoG and fMRI participants viewed the same clear and noisy audiovisual speech stimuli and performed the same speech recognition task. Both techniques demonstrated a sharp functional boundary in the STG, spatially coincident with an anatomical boundary defined by the posterior edge of Heschl’s gyrus. Cortex on the anterior side of the boundary responded more strongly to clear audiovisual speech than to noisy audiovisual speech while cortex on the posterior side of the boundary did not. For both ECoG and fMRI measurements, the transition between the functionally distinct regions happened within 10 mm of anterior-to-posterior distance along the STG. We relate this boundary to the multisensory neural code underlying speech perception and propose that it represents an important functional division within the human speech perception network.
Collapse
Affiliation(s)
- Muge Ozker
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX, United States
| | - Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
| |
Collapse
|
47
|
Noppeney U, Lee HL. Causal inference and temporal predictions in audiovisual perception of speech and music. Ann N Y Acad Sci 2018; 1423:102-116. [PMID: 29604082 DOI: 10.1111/nyas.13615] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2017] [Revised: 12/13/2017] [Accepted: 12/22/2017] [Indexed: 11/28/2022]
Abstract
To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses.
Collapse
Affiliation(s)
- Uta Noppeney
- Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, UK
| | - Hwee Ling Lee
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
| |
Collapse
|
48
|
Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. [DOI: 10.1111/ejn.13804] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Revised: 12/12/2017] [Accepted: 12/12/2017] [Indexed: 11/27/2022]
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group; Center for Brain and Cognition; Dept. de Tecnologies de la Informació i les Comunicacions; Universitat Pompeu Fabra; Office 55.128., Roc Boronat, 138 08018 Barcelona Spain
| | - Mireia Torralba
- Multisensory Research Group; Center for Brain and Cognition; Dept. de Tecnologies de la Informació i les Comunicacions; Universitat Pompeu Fabra; Office 55.128., Roc Boronat, 138 08018 Barcelona Spain
| | - Salvador Soto-Faraco
- Multisensory Research Group; Center for Brain and Cognition; Dept. de Tecnologies de la Informació i les Comunicacions; Universitat Pompeu Fabra; Office 55.128., Roc Boronat, 138 08018 Barcelona Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA); Barcelona Spain
| |
Collapse
|
49
|
Magnotti JF, Basu Mallick D, Beauchamp MS. Reducing Playback Rate of Audiovisual Speech Leads to a Surprising Decrease in the McGurk Effect. Multisens Res 2018; 31:19-38. [DOI: 10.1163/22134808-00002586] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Accepted: 06/03/2017] [Indexed: 11/19/2022]
Abstract
We report the unexpected finding that slowing video playback decreases perception of the McGurk effect. This reduction is counter-intuitive because the illusion depends on visual speech influencing the perception of auditory speech, and slowing speech should increase the amount of visual information available to observers. We recorded perceptual data from 110 subjects viewing audiovisual syllables (either McGurk or congruent control stimuli) played back at one of three rates: the rate used by the talker during recording (the natural rate), a slow rate (50% of natural), or a fast rate (200% of natural). We replicated previous studies showing dramatic variability in McGurk susceptibility at the natural rate, ranging from 0–100% across subjects and from 26–76% across the eight McGurk stimuli tested. Relative to the natural rate, slowed playback reduced the frequency of McGurk responses by 11% (79% of subjects showed a reduction) and reduced congruent accuracy by 3% (25% of subjects showed a reduction). Fast playback rate had little effect on McGurk responses or congruent accuracy. To determine whether our results are consistent with Bayesian integration, we constructed a Bayes-optimal model that incorporated two assumptions: individuals combine auditory and visual information according to their reliability, and changing playback rate affects sensory reliability. The model reproduced both our findings of large individual differences and the playback rate effect. This work illustrates that surprises remain in the McGurk effect and that Bayesian integration provides a useful framework for understanding audiovisual speech perception.
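To make the Bayesian-integration account sketched in the final sentences concrete, here is a minimal forced-fusion simulation in Python (the one-dimensional place-of-articulation axis, prototype positions, and noise values are illustrative assumptions; the paper's actual model also fits subject-level parameters and congruent trials):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D "place of articulation" axis; prototype positions are assumptions.
PROTOTYPES = {"ba": 0.0, "da": 1.0, "ga": 2.0}

def mcgurk_rate(sigma_a, sigma_v, n_trials=20_000):
    """Predicted proportion of 'da' (fusion) reports for auditory 'ba' + visual 'ga'
    under forced reliability-weighted fusion of two noisy cues."""
    x_a = rng.normal(PROTOTYPES["ba"], sigma_a, n_trials)  # noisy auditory samples
    x_v = rng.normal(PROTOTYPES["ga"], sigma_v, n_trials)  # noisy visual samples
    w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)  # reliability weight
    fused = w_a * x_a + (1 - w_a) * x_v
    names = list(PROTOTYPES)
    centers = np.array([PROTOTYPES[n] for n in names])
    nearest = np.abs(fused[:, None] - centers[None, :]).argmin(axis=1)  # nearest prototype
    return np.mean(np.array(names)[nearest] == "da")

# Degrading visual reliability pulls the fused estimate toward the auditory 'ba',
# so the predicted rate of 'da' (McGurk) responses drops across these settings.
for sigma_v in (1.0, 3.0, 6.0):
    print(f"sigma_v = {sigma_v}: P('da') = {mcgurk_rate(sigma_a=1.0, sigma_v=sigma_v):.2f}")

In this caricature, making the visual cue less reliable lowers the predicted rate of "da" reports, illustrating how changes in sensory reliability alone can shift McGurk susceptibility without any change in the stimuli's phonetic content.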
Collapse
Affiliation(s)
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| | | | - Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| |
Collapse
|
50
|
Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. [PMID: 28792094 DOI: 10.1002/hbm.23758] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2017] [Revised: 07/25/2017] [Accepted: 07/27/2017] [Indexed: 01/22/2023] Open
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or not integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise. Hum Brain Mapp 38:5691-5705, 2017. © 2017 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
| | - Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
- ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
| | - Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
Collapse
|