1. Magnotti JF, Lado A, Beauchamp MS. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers. Front Neurosci 2024; 18:1421713. PMID: 38988770; PMCID: PMC11233445; DOI: 10.3389/fnins.2024.1421713.
Abstract
In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including studies of native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the capacity of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.
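As a concrete illustration of the model's core prediction, the sketch below implements a NED-style computation under simplifying assumptions: each stimulus is summarized by a single audiovisual disparity, each participant by a disparity threshold and a sensory-noise level, and a fusion response is produced whenever the noisily encoded disparity falls below the threshold. The parameter names and numerical values are illustrative, not estimates from the study.

```python
from scipy.stats import norm

def p_fusion(stimulus_disparity, threshold, sensory_noise):
    """NED-style fusion probability (sketch): the encoded disparity is the
    physical disparity plus Gaussian sensory noise, and fusion is reported
    when the encoded value falls below the participant's threshold."""
    return norm.cdf(threshold - stimulus_disparity, scale=sensory_noise)

# Illustrative values: a low-disparity stimulus is often fused ...
print(p_fusion(stimulus_disparity=0.4, threshold=1.0, sensory_noise=0.5))
# ... while a high-disparity stimulus is rarely fused by the same participant.
print(p_fusion(stimulus_disparity=2.0, threshold=1.0, sensory_noise=0.5))
```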
Affiliation(s)
- John F Magnotti: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Anastasia Lado: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
2. Bujok R, Meyer AS, Bosker HR. Audiovisual Perception of Lexical Stress: Beat Gestures and Articulatory Cues. Lang Speech 2024:238309241258162. PMID: 38877720; DOI: 10.1177/00238309241258162.
Abstract
Human communication is inherently multimodal: not only auditory speech but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds), and less is known about the influence of visual information on the perception of suprasegmental aspects of speech such as lexical stress. In two experiments, we investigated the influence of different visual cues (facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or the second syllable. Results showed that people successfully used visual articulatory cues to lexical stress in muted videos; however, in audiovisual conditions we found no effect of visual articulatory cues. In contrast, the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.
Affiliation(s)
- Ronny Bujok: Max Planck Institute for Psycholinguistics, The Netherlands; International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, The Netherlands
- Hans Rutger Bosker: Max Planck Institute for Psycholinguistics, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, The Netherlands
3. Scheller M, Fang H, Sui J. Self as a prior: The malleability of Bayesian multisensory integration to social salience. Br J Psychol 2024; 115:185-205. PMID: 37747452; DOI: 10.1111/bjop.12683.
Abstract
Our everyday perceptual experiences are grounded in the integration of information within and across our senses. Because of this direct behavioural relevance, cross-modal integration retains a degree of contextual flexibility, extending even to social relevance. However, how social relevance modulates cross-modal integration remains unclear. To investigate possible mechanisms, Experiment 1 tested the principles of audio-visual integration for numerosity estimation by deriving a Bayesian optimal observer model, with a perceptual prior estimated from empirical data, to explain perceptual biases. Such perceptual priors may shift towards locations of high salience in the stimulus space. Our results showed that the tendency to over- or underestimate numerosity, expressed in the frequency and strength of fission and fusion illusions, depended on the actual event numerosity. Experiment 2 replicated the effects of social relevance on multisensory integration from Scheller & Sui (2022, JEP:HPP) using a smaller number of events, thereby favouring the opposite illusion through enhanced influence of the prior. In line with the idea that the self acts like a prior, the more frequently observed illusion (the one more malleable to prior influences) was modulated by self-relevance. Our findings suggest that the self can influence perception by acting like a prior in cue integration, biasing perceptual estimates towards areas of high self-relevance.
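To make the observer model concrete, here is a minimal sketch assuming Gaussian likelihoods over a small discrete range of event numerosities and a hypothetical prior peaked at mid-range counts. It shows how the posterior estimate of the number of visual events is pulled toward the more reliable auditory count and toward the prior, producing fission-like overestimation or fusion-like underestimation depending on the actual event numerosity; all values are invented for illustration.

```python
import numpy as np

n = np.arange(1, 5)                          # candidate event numerosities
sigma_a, sigma_v = 0.6, 1.0                  # assumed: audition counts events more reliably
prior = np.array([0.15, 0.35, 0.35, 0.15])   # hypothetical prior peaked at mid-range counts

def posterior(obs_a, obs_v):
    """Posterior over event numerosity given noisy auditory and visual counts."""
    like_a = np.exp(-0.5 * ((obs_a - n) / sigma_a) ** 2)
    like_v = np.exp(-0.5 * ((obs_v - n) / sigma_v) ** 2)
    post = like_a * like_v * prior
    return post / post.sum()

# One flash with two beeps: posterior mass shifts above 1 -> fission-like overestimation.
print(posterior(obs_a=2, obs_v=1))
# Two flashes with one beep: posterior mass shifts below 2 -> fusion-like underestimation.
print(posterior(obs_a=1, obs_v=2))
```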
Affiliation(s)
- Meike Scheller: Department of Psychology, University of Aberdeen, Aberdeen, UK; Department of Psychology, Durham University, Durham, UK
- Huilin Fang: Department of Psychology, University of Aberdeen, Aberdeen, UK
- Jie Sui: Department of Psychology, University of Aberdeen, Aberdeen, UK
4. de Assis MF, Berti LC. Speech perception: Auditory and visual cue integration in children with and without phonological disorder in voiceless fricatives. Clin Linguist Phon 2024:1-17. PMID: 38560916; DOI: 10.1080/02699206.2024.2328792.
Abstract
The literature reports contradictory results regarding the influence of visual cues on speech perception tasks in children with phonological disorder (PD). This study compared the performance of children with (n = 15) and without PD (n = 15) in an audiovisual perception task involving voiceless fricatives. Assuming that PD could be associated with an inability to integrate phonological information from two sensory sources, we expected children with PD to have difficulty integrating auditory and visual cues compared with typically developing children. A syllable identification task was conducted, with stimuli presented in four conditions: auditory-only (AO), visual-only (VO), audiovisual congruent (AV+), and audiovisual incongruent (AV-). The percentages of correct answers and the corresponding reaction times in the AO, VO, and AV+ conditions were analysed. For the AV- condition, the percentage of responses matching the auditory stimulus was considered, along with the percentage of perceptual preference (auditory, visual, and/or illusory, i.e., the McGurk effect) and the corresponding reaction times. Across the four conditions, children with PD gave fewer correct answers and responded more slowly than typically developing children, mainly in the VO condition. Both groups showed a preference for the auditory stimulus in the AV- condition; however, children with PD showed higher percentages of visual perceptual preference and of the McGurk effect than typical children. The superiority of typically developing children over children with PD in auditory-visual speech perception thus depends on the type of stimulus and the condition of presentation.
Affiliation(s)
- Mayara Ferreira de Assis: Department of Speech, Language and Hearing Sciences, São Paulo State University, Marília, São Paulo, Brazil
- Larissa Cristina Berti: Department of Speech, Language and Hearing Sciences, São Paulo State University, Marília, São Paulo, Brazil
5. Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024; 45:e26653. PMID: 38488460; DOI: 10.1002/hbm.26653.
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme, presented simultaneously with the facial articulation of /ga/ (i.e., a viseme), is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk compared to congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, typically involved in cognitive control processes. Crucially, in line with Bayesian theories, these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
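Response entropy in this context is simply the Shannon entropy of the distribution of syllable reports a stimulus elicits. The sketch below shows the computation with invented report counts: the more consistent responding typical of congruent stimuli yields lower entropy than the split responding typical of McGurk stimuli.

```python
import numpy as np

def response_entropy(counts):
    """Shannon entropy (in bits) of a response distribution, e.g. how often
    /ba/, /da/ and /ga/ were reported for one stimulus type."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                           # ignore empty response categories
    return -(p * np.log2(p)).sum()

# Invented counts for illustration only:
print(response_entropy([46, 2, 2]))        # congruent stimulus: consistent reports, low entropy
print(response_entropy([18, 22, 10]))      # McGurk stimulus: split reports, higher entropy
```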
Affiliation(s)
- Chenjie Dong: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China; Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney: Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang: Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
6. Lee HH, Groves K, Ripollés P, Carrasco M. Audiovisual integration in the McGurk effect is impervious to music training. Sci Rep 2024; 14:3262. PMID: 38332159; PMCID: PMC10853564; DOI: 10.1038/s41598-024-53593-0.
Abstract
The McGurk effect refers to an audiovisual speech illusion in which discrepant auditory and visual syllables produce a fused percept that combines the visual and auditory components. However, little is known about how individual differences contribute to the McGurk effect. Here, we examined whether music training experience, which involves audiovisual integration, can modulate the McGurk effect. Seventy-three participants completed the Goldsmiths Musical Sophistication Index (Gold-MSI) questionnaire to evaluate their music expertise on a continuous scale. The Gold-MSI considers participants' daily-life exposure to music learning experiences (formal and informal), rather than merely classifying people into groups according to how many years of music training they have had. Participants were instructed to report, via a 3-alternative forced choice task, "what a person said": /Ba/, /Ga/ or /Da/. The experiment consisted of 96 audiovisual congruent trials and 96 audiovisual incongruent (McGurk) trials. We observed no significant correlations between susceptibility to the McGurk effect and the different subscales of the Gold-MSI (active engagement, perceptual abilities, music training, singing abilities, emotion) or the general musical sophistication composite score. Together, these findings suggest that music training experience does not modulate audiovisual integration in speech as reflected by the McGurk effect.
Affiliation(s)
- Hsing-Hao Lee: Department of Psychology, New York University, New York, USA
- Karleigh Groves: Department of Psychology, New York University, New York, USA; Center for Language, Music, and Emotion (CLaME), New York University, New York, USA; Music and Audio Research Lab (MARL), New York University, New York, USA
- Pablo Ripollés: Department of Psychology, New York University, New York, USA; Center for Language, Music, and Emotion (CLaME), New York University, New York, USA; Music and Audio Research Lab (MARL), New York University, New York, USA
- Marisa Carrasco: Department of Psychology, New York University, New York, USA; Center for Neural Science, New York University, New York, USA
7. Jones SA, Noppeney U. Multisensory Integration and Causal Inference in Typical and Atypical Populations. Adv Exp Med Biol 2024; 1437:59-76. PMID: 38270853; DOI: 10.1007/978-981-99-7611-9_4.
Abstract
Multisensory perception is critical for effective interaction with the environment, but human responses to multisensory stimuli vary across the lifespan and appear changed in some atypical populations. In this review chapter, we consider multisensory integration within a normative Bayesian framework. We begin by outlining the complex computational challenges of multisensory causal inference and reliability-weighted cue integration, and discuss whether healthy young adults behave in accordance with normative Bayesian models. We then compare their behaviour with that of various other human populations (children, older adults, and those with neurological or neuropsychiatric disorders). In particular, we consider whether the differences seen in these groups are due only to changes in their computational parameters (such as sensory noise or perceptual priors), or whether the fundamental computational principles (such as reliability weighting) underlying multisensory perception may also be altered. We conclude by arguing that future research should aim explicitly to differentiate between these possibilities.
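As a compact reference for the reliability-weighted cue-integration building block mentioned here, the sketch below implements standard inverse-variance (maximum-likelihood) fusion of two cues under the assumption of independent Gaussian noise and a single common source; the example values are arbitrary.

```python
def fuse(mu_a, sigma_a, mu_v, sigma_v):
    """Reliability-weighted fusion of an auditory and a visual estimate,
    assuming independent Gaussian noise and a common source."""
    w_a = sigma_a ** -2 / (sigma_a ** -2 + sigma_v ** -2)    # weight tracks relative reliability
    mu_fused = w_a * mu_a + (1 - w_a) * mu_v
    sigma_fused = (sigma_a ** -2 + sigma_v ** -2) ** -0.5    # fused estimate is more precise
    return mu_fused, sigma_fused

# A noisy auditory location estimate combined with a precise visual one
# is drawn strongly toward the visual estimate (illustrative numbers).
print(fuse(mu_a=10.0, sigma_a=8.0, mu_v=2.0, sigma_v=2.0))
```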
Affiliation(s)
- Samuel A Jones: Department of Psychology, Nottingham Trent University, Nottingham, UK
- Uta Noppeney: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
8. Zou Z, Zhao B, Ting KH, Wong C, Hou X, Chan CCH. Multisensory integration augmenting motor processes among older adults. Front Aging Neurosci 2023; 15:1293479. PMID: 38192281; PMCID: PMC10773807; DOI: 10.3389/fnagi.2023.1293479.
Abstract
Objective: Multisensory integration enhances sensory processing in older adults. This study investigated how such sensory enhancement modulates motor-related processes in healthy older adults. Method: Thirty-one older adults (12 males, mean age 67.7 years) and 29 younger adults serving as controls (16 males, mean age 24.9 years) participated. Participants discriminated spatial information embedded in unisensory (visual or auditory) and multisensory (audiovisual) conditions; responses corresponding to the spatial information were made with movements of the left and right wrists, registered on specially designed pads. The electroencephalographic (EEG) markers were the event-related super-additive P2 in the frontal-central region and the stimulus-locked (s-LRP) and response-locked (r-LRP) lateralized readiness potentials. Results: Older participants responded faster and more accurately in the multisensory condition than in the unisensory conditions, and this benefit was larger than that of the controls. Both groups showed significantly less negative-going s-LRP amplitudes at central sites in the between-condition contrasts, but only the older group showed significantly less negative-going, centrally distributed r-LRP amplitudes. More importantly, only the r-LRP amplitude in the audiovisual condition significantly predicted behavioral performance. Conclusion: Audiovisual integration shortens reaction times, and this gain is associated with modulation of motor-related processes among older participants. The super-additive effects modulate both motor preparation and motor generation, but only the modulated motor generation process contributes to faster reaction times. Because these effects were observed in older but not younger participants, multisensory integration likely augments motor function in those with age-related neurodegeneration.
Affiliation(s)
- Zhi Zou: Department of Sport and Health, Guangzhou Sport University, Guangzhou, China
- Benxuan Zhao: Department of Sport and Health, Guangzhou Sport University, Guangzhou, China
- Kin-hung Ting: University Research Facility in Behavioral and Systems Neuroscience, The Hong Kong Polytechnic University, Hong Kong, Hong Kong SAR, China
- Clive Wong: Department of Psychology, The Education University of Hong Kong, New Territories, Hong Kong SAR, China
- Xiaohui Hou: Department of Sport and Health, Guangzhou Sport University, Guangzhou, China
- Chetwyn C. H. Chan: Department of Psychology, The Education University of Hong Kong, New Territories, Hong Kong SAR, China
9. Choi I, Demir I, Oh S, Lee SH. Multisensory integration in the mammalian brain: diversity and flexibility in health and disease. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220338. PMID: 37545309; PMCID: PMC10404930; DOI: 10.1098/rstb.2022.0338.
Abstract
Multisensory integration (MSI) occurs in a variety of brain areas, spanning cortical and subcortical regions. In traditional studies of sensory processing, the sensory cortices were considered to process sensory information in a modality-specific manner. The sensory cortices, however, send the information to other cortical and subcortical areas, including the higher association cortices and the other sensory cortices, where inputs from multiple modalities converge and are integrated to generate a meaningful percept. This integration process is neither simple nor fixed because these brain areas interact with each other via complicated circuits, which can be modulated by numerous internal and external conditions. As a result, dynamic MSI makes multisensory decisions flexible and adaptive in behaving animals. Impairments in MSI occur in many psychiatric disorders, which may result in an altered perception of multisensory stimuli and an abnormal reaction to them. This review discusses the diversity and flexibility of MSI in mammals, including humans, primates and rodents, as well as the brain areas involved. It further explains how such flexibility influences perceptual experiences in behaving animals in both health and disease. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- Ilsong Choi: Center for Synaptic Brain Dysfunctions, Institute for Basic Science (IBS), Daejeon 34141, Republic of Korea
- Ilayda Demir: Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
- Seungmi Oh: Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
- Seung-Hee Lee: Center for Synaptic Brain Dysfunctions, Institute for Basic Science (IBS), Daejeon 34141, Republic of Korea; Department of Biological Sciences, KAIST, Daejeon 34141, Republic of Korea
10. Noel JP, Bill J, Ding H, Vastola J, DeAngelis GC, Angelaki DE, Drugowitsch J. Causal inference during closed-loop navigation: parsing of self- and object-motion. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220344. PMID: 37545300; PMCID: PMC10404925; DOI: 10.1098/rstb.2022.0344.
Abstract
A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of causal inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action-perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief about (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modelling results, we show that humans report targets as stationary and steer towards their initial rather than final position more often when they are themselves moving, suggesting a putative misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results support both of these predictions. Lastly, analysis of eye movements shows that, while initial saccades toward targets were largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action-perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- Jean-Paul Noel: Center for Neural Science, New York University, New York, NY 10003, USA
- Johannes Bill: Department of Neurobiology, Harvard University, Boston, MA 02115, USA; Department of Psychology, Harvard University, Boston, MA 02115, USA
- Haoran Ding: Center for Neural Science, New York University, New York, NY 10003, USA
- John Vastola: Department of Neurobiology, Harvard University, Boston, MA 02115, USA
- Gregory C. DeAngelis: Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14611, USA
- Dora E. Angelaki: Center for Neural Science, New York University, New York, NY 10003, USA; Tandon School of Engineering, New York University, New York, NY 10003, USA
- Jan Drugowitsch: Department of Neurobiology, Harvard University, Boston, MA 02115, USA; Center for Brain Science, Harvard University, Boston, MA 02115, USA
11. Meijer D, Noppeney U. Metacognition in the audiovisual McGurk illusion: perceptual and causal confidence. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220348. PMID: 37545307; PMCID: PMC10404922; DOI: 10.1098/rstb.2022.0348.
Abstract
Almost all decisions in everyday life rely on multiple sensory inputs that can come from common or independent causes. These situations invoke perceptual uncertainty about environmental properties and the signals' causal structure. Using the audiovisual McGurk illusion, this study investigated how observers formed perceptual and causal confidence judgements in information integration tasks under causal uncertainty. Observers were presented with spoken syllables, their corresponding articulatory lip movements or their congruent and McGurk combinations (e.g. auditory B/P with visual G/K). Observers reported their perceived auditory syllable, the causal structure and confidence for each judgement. Observers were more accurate and confident on congruent than unisensory trials. Their perceptual and causal confidence were tightly related over trials as predicted by the interactive nature of perceptual and causal inference. Further, observers assigned comparable perceptual and causal confidence to veridical 'G/K' percepts on audiovisual congruent trials and their causal and perceptual metamers on McGurk trials (i.e. illusory 'G/K' percepts). Thus, observers metacognitively evaluate the integrated audiovisual percept with limited access to the conflicting unisensory stimulus components on McGurk trials. Collectively, our results suggest that observers form meaningful perceptual and causal confidence judgements about multisensory scenes that are qualitatively consistent with principles of Bayesian causal inference. This article is part of the theme issue 'Decision and control processes in multisensory perception'.
Affiliation(s)
- David Meijer: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK; Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, 1040 Wien, Austria
- Uta Noppeney: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands
12. Ma KST, Schnupp JWH. The unity hypothesis revisited: can the male/female incongruent McGurk effect be disrupted by familiarization and priming? Front Psychol 2023; 14:1106562. PMID: 37705948; PMCID: PMC10495566; DOI: 10.3389/fpsyg.2023.1106562.
Abstract
The unity assumption hypothesis contends that higher-level factors, such as a perceiver's belief and prior experience, modulate multisensory integration. The McGurk illusion exemplifies such integration. When a visual velar consonant /ga/ is dubbed with an auditory bilabial /ba/, listeners unify the discrepant signals with the knowledge that open lips cannot produce /ba/, and a fusion percept /da/ is perceived. Previous research claimed to have falsified the unity assumption hypothesis by demonstrating that the McGurk effect occurs even when a face is dubbed with a voice of the opposite sex, and thus violates expectations from prior experience. Perhaps, however, preventing perceptual unity requires stronger counter-evidence than a merely apparent incongruence between unfamiliar faces and voices. Here we investigated whether the McGurk illusion with male/female incongruent stimuli can be disrupted by familiarization and priming with an appropriate pairing of face and voice. In an online experiment, the susceptibility of participants to the McGurk illusion was tested with stimuli containing either a male or female face paired with a voice of incongruent gender. The number of times participants experienced a McGurk illusion was measured before and after a familiarization block, which familiarized them with the true pairings of face and voice. After familiarization and priming, susceptibility to the McGurk effect decreased significantly on average. The findings support the notion that unity assumptions modulate intersensory bias, and confirm and extend previous studies using male/female incongruent McGurk stimuli.
Affiliation(s)
- Kennis S. T. Ma: The School of Psychology & Counselling, The Open University (UK), Milton Keynes, United Kingdom
- Jan W. H. Schnupp: Department of Neuroscience, City University of Hong Kong, Kowloon, Hong Kong SAR, China
13. Noel JP, Bill J, Ding H, Vastola J, DeAngelis GC, Angelaki DE, Drugowitsch J. Causal inference during closed-loop navigation: parsing of self- and object-motion. bioRxiv [Preprint] 2023:2023.01.27.525974. PMID: 36778376; PMCID: PMC9915492; DOI: 10.1101/2023.01.27.525974.
Abstract
A key computation in building adaptive internal models of the external world is to ascribe sensory signals to their likely cause(s), a process of Bayesian Causal Inference (CI). CI is well studied within the framework of two-alternative forced-choice tasks, but less well understood within the cadre of naturalistic action-perception loops. Here, we examine the process of disambiguating retinal motion caused by self- and/or object-motion during closed-loop navigation. First, we derive a normative account specifying how observers ought to intercept hidden and moving targets given their belief over (i) whether retinal motion was caused by the target moving, and (ii) if so, with what velocity. Next, in line with the modeling results, we show that humans report targets as stationary and steer toward their initial rather than final position more often when they are themselves moving, suggesting a misattribution of object-motion to the self. Further, we predict that observers should misattribute retinal motion more often: (i) during passive rather than active self-motion (given the lack of an efference copy informing self-motion estimates in the former), and (ii) when targets are presented eccentrically rather than centrally (given that lateral self-motion flow vectors are larger at eccentric locations during forward self-motion). Results confirm both of these predictions. Lastly, analysis of eye-movements shows that, while initial saccades toward targets are largely accurate regardless of the self-motion condition, subsequent gaze pursuit was modulated by target velocity during object-only motion, but not during concurrent object- and self-motion. These results demonstrate CI within action-perception loops, and suggest a protracted temporal unfolding of the computations characterizing CI.
Affiliation(s)
- Jean-Paul Noel: Center for Neural Science, New York University, New York City, NY, United States
- Johannes Bill: Department of Neurobiology, Harvard Medical School, Boston, MA, United States; Department of Psychology, Harvard University, Cambridge, MA, United States
- Haoran Ding: Center for Neural Science, New York University, New York City, NY, United States
- John Vastola: Department of Neurobiology, Harvard Medical School, Boston, MA, United States
- Gregory C. DeAngelis: Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY, United States
- Dora E. Angelaki: Center for Neural Science, New York University, New York City, NY, United States; Tandon School of Engineering, New York University, New York City, NY, United States
- Jan Drugowitsch: Department of Neurobiology, Harvard Medical School, Boston, MA, United States; Center for Brain Science, Harvard University, Boston, MA, United States
14. Association between different sensory modalities based on concurrent time series data obtained by a collaborative reservoir computing model. Sci Rep 2023; 13:173. PMID: 36600034; DOI: 10.1038/s41598-023-27385-x.
Abstract
Humans perceive the external world by integrating information from different modalities, obtained through the sensory organs. However, the underlying mechanism is still unclear and has been a subject of widespread interest in the fields of psychology and brain science. A model using two reservoir computing systems (a type of recurrent neural network), each trained to mimic the other's output, can detect stimulus patterns that repeatedly appear in a time-series signal. We applied this model to the identification of specific patterns that co-occur between information from different modalities. The model self-organized around specific fluctuation patterns that co-occurred across modalities and could detect each such pattern. Additionally, much as perception is influenced by the synchronous or asynchronous presentation of multimodal stimuli, the model failed to work correctly for signals that did not co-occur with corresponding fluctuation patterns. Recent experimental studies have suggested that direct interaction between different sensory systems is important for multisensory integration, in addition to top-down control from higher brain regions such as the association cortex. Because several patterns of interaction between sensory modules can be incorporated into the employed model, we were able to compare performance across them; the original version of the model incorporated such an interaction as the teaching signal for learning. The performance of the original and alternative models was evaluated, and the original model was found to perform best. Thus, we demonstrated that feedback of the outputs of appropriately trained sensory modules performed best among the examined patterns of interaction. The proposed model incorporates information encoded by the dynamic state of the neural population and the interactions between different sensory modules, both of which are based on recent experimental observations; this allowed us to study the influence of the temporal relationship and frequency of occurrence of multisensory signals on sensory integration, as well as the nature of interaction between different sensory signals.
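The sketch below illustrates the general architecture with two minimal echo-state reservoirs, one per modality, whose linear readouts are trained by ridge regression to reproduce the other modality's input stream. This is a simplified stand-in for the paper's mutual-teaching scheme, and the network sizes, signals, and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    r = np.random.default_rng(seed)
    W_in = r.uniform(-0.5, 0.5, (n_res, n_in))
    W = r.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # scale recurrent weights
    return W_in, W

def run_reservoir(W_in, W, inputs, leak=0.3):
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:                           # leaky-integrator tanh units
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

def ridge(X, y, lam=1e-2):                     # ridge-regression readout weights
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Two noisy modality streams sharing a co-occurring fluctuation pattern.
T = 2000
shared = np.sin(0.1 * np.arange(T))
audio = shared + 0.3 * rng.standard_normal(T)
visual = shared + 0.3 * rng.standard_normal(T)

Wa_in, Wa = make_reservoir(1, 100, seed=1)     # "auditory" module
Wv_in, Wv = make_reservoir(1, 100, seed=2)     # "visual" module
Xa = run_reservoir(Wa_in, Wa, audio[:, None])
Xv = run_reservoir(Wv_in, Wv, visual[:, None])

# Each module learns to mimic the other's input stream (mutual teaching, simplified).
w_av = ridge(Xa, visual)
w_va = ridge(Xv, audio)

print("audio module predicting visual stream, r =", np.corrcoef(Xa @ w_av, visual)[0, 1])
print("visual module predicting auditory stream, r =", np.corrcoef(Xv @ w_va, audio)[0, 1])
```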
15. Fisher VL, Dean CL, Nave CS, Parkins EV, Kerkhoff WG, Kwakye LD. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Front Hum Neurosci 2023; 16:1027335. PMID: 36684833; PMCID: PMC9846366; DOI: 10.3389/fnhum.2022.1027335.
Abstract
We receive information about the world around us from multiple senses, and this information is combined in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied by a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders such as schizophrenia, autism, and ADHD.
Affiliation(s)
- Victoria L. Fisher: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Yale University School of Medicine and the Connecticut Mental Health Center, New Haven, CT, United States
- Cassandra L. Dean: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Roche/Genentech Neurodevelopment & Psychiatry Teams Product Development, Neuroscience, South San Francisco, CA, United States
- Claire S. Nave: Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Emma V. Parkins: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Neuroscience Graduate Program, University of Cincinnati, Cincinnati, OH, United States
- Willa G. Kerkhoff: Department of Neuroscience, Oberlin College, Oberlin, OH, United States; Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, United States
- Leslie D. Kwakye: Department of Neuroscience, Oberlin College, Oberlin, OH, United States
16. Begau A, Arnau S, Klatt LI, Wascher E, Getzmann S. Using visual speech at the cocktail-party: CNV evidence for early speech extraction in younger and older adults. Hear Res 2022; 426:108636. DOI: 10.1016/j.heares.2022.108636.
17. Wilbiks JMP, Brown VA, Strand JF. Speech and non-speech measures of audiovisual integration are not correlated. Atten Percept Psychophys 2022; 84:1809-1819. PMID: 35610409; PMCID: PMC10699539; DOI: 10.3758/s13414-022-02517-z.
Abstract
Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as "measures of audiovisual integration," the tasks themselves differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the tasks themselves (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the relationships among four commonly-used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit), and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks that are commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process or different constructs entirely.
Affiliation(s)
- Violet A Brown: Department of Psychological & Brain Sciences, Washington University in St. Louis, Saint Louis, MO, USA
- Julia F Strand: Department of Psychology, Carleton College, Northfield, MN, USA
18. Shams L, Beierholm U. Bayesian causal inference: A unifying neuroscience theory. Neurosci Biobehav Rev 2022; 137:104619. PMID: 35331819; DOI: 10.1016/j.neubiorev.2022.104619.
Abstract
Understanding of the brain and the principles governing neural processing requires theories that are parsimonious, can account for a diverse set of phenomena, and can make testable predictions. Here, we review the theory of Bayesian causal inference, which has been tested, refined, and extended in a variety of tasks in humans and other primates by several research groups. Bayesian causal inference is normative and has explained human behavior in a vast number of tasks, including unisensory and multisensory perceptual tasks, sensorimotor tasks, and motor tasks, and has accounted for counter-intuitive findings. The theory has made novel predictions that have been tested and confirmed empirically, and recent studies have started to map its algorithms and neural implementation in the human brain. The parsimony, the diversity of the phenomena that the theory has explained, and its illumination of brain function at all three of Marr's levels of analysis make Bayesian causal inference a strong neuroscience theory. This also highlights the importance of collaborative and multi-disciplinary research for the development of new theories in neuroscience.
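For readers who want the core computation in compact form, the following sketch implements the widely used Gaussian formulation of Bayesian causal inference for a single audiovisual spatial trial: likelihoods under common versus independent causes, the posterior probability of a common cause, and a model-averaged auditory location estimate. All parameter values are illustrative rather than fitted.

```python
import numpy as np

def bci_auditory_estimate(x_a, x_v, sig_a, sig_v, sig_p=20.0, mu_p=0.0, p_common=0.5):
    """Bayesian causal inference with model averaging (sketch) for one trial,
    given noisy auditory and visual measurements x_a and x_v."""
    va, vv, vp = sig_a**2, sig_v**2, sig_p**2
    # Likelihood of the measurements under a common cause (C = 1).
    denom1 = va * vv + va * vp + vv * vp
    like_c1 = np.exp(-0.5 * ((x_a - x_v)**2 * vp + (x_a - mu_p)**2 * vv
                             + (x_v - mu_p)**2 * va) / denom1) / (2 * np.pi * np.sqrt(denom1))
    # Likelihood under independent causes (C = 2).
    denom2 = (va + vp) * (vv + vp)
    like_c2 = np.exp(-0.5 * ((x_a - mu_p)**2 / (va + vp)
                             + (x_v - mu_p)**2 / (vv + vp))) / (2 * np.pi * np.sqrt(denom2))
    # Posterior probability of a common cause.
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))
    # Optimal auditory location estimates under each causal structure.
    s_c1 = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)
    s_c2 = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)
    # Model averaging: weight the two estimates by the posterior over causes.
    return post_c1, post_c1 * s_c1 + (1 - post_c1) * s_c2

print(bci_auditory_estimate(x_a=8.0, x_v=3.0, sig_a=6.0, sig_v=2.0))
```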
Affiliation(s)
- Ladan Shams: Departments of Psychology, BioEngineering, and Neuroscience Interdepartmental Program, University of California, Los Angeles, USA
19. Rennig J, Beauchamp MS. Intelligibility of audiovisual sentences drives multivoxel response patterns in human superior temporal cortex. Neuroimage 2022; 247:118796. PMID: 34906712; PMCID: PMC8819942; DOI: 10.1016/j.neuroimage.2021.118796.
Abstract
Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech, and neural responses in pSTG/S may underlie the perceptual benefit of visual speech for the comprehension of noisy auditory speech. We examined this possibility through the lens of multivoxel pattern responses in pSTG/S. BOLD fMRI data was collected from 22 participants presented with speech consisting of English sentences presented in five different formats: visual-only; auditory with and without added auditory noise; and audiovisual with and without auditory noise. Participants reported the intelligibility of each sentence with a button press and trials were sorted post-hoc into those that were more or less intelligible. Response patterns were measured in regions of the pSTG/S identified with an independent localizer. Noisy audiovisual sentences with very similar physical properties evoked very different response patterns depending on their intelligibility. When a noisy audiovisual sentence was reported as intelligible, the pattern was nearly identical to that elicited by clear audiovisual sentences. In contrast, an unintelligible noisy audiovisual sentence evoked a pattern like that of visual-only sentences. This effect was less pronounced for noisy auditory-only sentences, which evoked similar response patterns regardless of intelligibility. The successful integration of visual and auditory speech produces a characteristic neural signature in pSTG/S, highlighting the importance of this region in generating the perceptual benefit of visual speech.
Affiliation(s)
- Johannes Rennig: Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Building, A607, 3700 Hamilton Walk, Philadelphia, PA 19104-6016, United States
20. Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. Psychol Res 2021; 86:1930-1943. PMID: 34854983; PMCID: PMC9363401; DOI: 10.1007/s00426-021-01618-y.
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Affiliation(s)
- Basil Wahn: Department of Psychology, Leibniz Universität Hannover, Hannover, Germany
- Laura Schmitz: Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
- Alan Kingstone: Department of Psychology, University of British Columbia, Vancouver, BC, Canada
21. Laeng B, Kuyateh S, Kelkar T. Substituting facial movements in singers changes the sounds of musical intervals. Sci Rep 2021; 11:22442. PMID: 34789775; PMCID: PMC8599708; DOI: 10.1038/s41598-021-01797-z.
Abstract
Cross-modal integration is ubiquitous within perception and, in humans, the McGurk effect demonstrates that seeing a person articulating speech can change what we hear into a new auditory percept. It remains unclear whether cross-modal integration of sight and sound generalizes to other visible vocal articulations, like those made by singers. We surmise that perceptual integrative effects should involve music deeply, since there is ample indeterminacy and variability in its auditory signals. We show that switching videos of sung musical intervals systematically changes the estimated distance between the two notes of a musical interval: pairing the video of a smaller sung interval with a relatively larger auditory interval led to compression effects on rated intervals, whereas the reverse pairing led to a stretching effect. In addition, after seeing a visually switched video of an equally-tempered sung interval and then hearing the same interval played on the piano, the two intervals were often judged to be different even though they differed only in instrument. These findings reveal spontaneous, cross-modal integration of vocal sounds and clearly indicate that strong integration of sound and sight can occur beyond the articulations of natural speech.
Affiliation(s)
- Bruno Laeng: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317 Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Sarjo Kuyateh: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317 Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Tejaswinee Kelkar: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317 Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
22. Ferrari A, Noppeney U. Attention controls multisensory perception via two distinct mechanisms at different levels of the cortical hierarchy. PLoS Biol 2021; 19:e3001465. PMID: 34793436; PMCID: PMC8639080; DOI: 10.1371/journal.pbio.3001465.
Abstract
To form a percept of the multisensory world, the brain needs to integrate signals from common sources weighted by their reliabilities and segregate those from independent sources. Previously, we have shown that anterior parietal cortices combine sensory signals into representations that take into account the signals' causal structure (i.e., common versus independent sources) and their sensory reliabilities as predicted by Bayesian causal inference. The current study asks to what extent and how attentional mechanisms can actively control how sensory signals are combined for perceptual inference. In a pre- and postcueing paradigm, we presented observers with audiovisual signals at variable spatial disparities. Observers were precued to attend to auditory or visual modalities prior to stimulus presentation and postcued to report their perceived auditory or visual location. Combining psychophysics, functional magnetic resonance imaging (fMRI), and Bayesian modelling, we demonstrate that the brain moulds multisensory inference via two distinct mechanisms. Prestimulus attention to vision enhances the reliability and influence of visual inputs on spatial representations in visual and posterior parietal cortices. Poststimulus report determines how parietal cortices flexibly combine sensory estimates into spatial representations consistent with Bayesian causal inference. Our results show that distinct neural mechanisms control how signals are combined for perceptual inference at different levels of the cortical hierarchy.
Affiliation(s)
- Ambra Ferrari: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, United Kingdom; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Uta Noppeney: Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, United Kingdom; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
23. Galazka MA, Hadjikhani N, Sundqvist M, Åsberg Johnels J. Facial speech processing in children with and without dyslexia. Ann Dyslexia 2021; 71:501-524. PMID: 34115279; PMCID: PMC8458188; DOI: 10.1007/s11881-021-00231-3.
Abstract
What role does the presence of facial speech play for children with dyslexia? Current literature proposes two distinctive claims. One claim states that children with dyslexia make less use of visual information from the mouth during speech processing due to a deficit in recruitment of audiovisual areas. An opposing claim suggests that children with dyslexia are in fact reliant on such information in order to compensate for auditory/phonological impairments. The current paper aims at directly testing these contrasting hypotheses (here referred to as "mouth insensitivity" versus "mouth reliance") in school-age children with and without dyslexia, matched on age and listening comprehension. Using eye tracking, in Study 1, we examined how children look at the mouth across conditions varying in speech processing demands. The results did not indicate significant group differences in looking at the mouth. However, correlation analyses suggest potentially important distinctions within the dyslexia group: those children with dyslexia who are better readers attended more to the mouth while presented with a person's face in a phonologically demanding condition. In Study 2, we examined whether the presence of facial speech cues is functionally beneficial when a child is encoding written words. The results indicated lack of overall group differences on the task, although those with less severe reading problems in the dyslexia group were more accurate when reading words that were presented with articulatory facial speech cues. Collectively, our results suggest that children with dyslexia differ in their "mouth reliance" versus "mouth insensitivity," a profile that seems to be related to the severity of their reading problems.
Collapse
Affiliation(s)
- Martyna A Galazka
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden.
| | - Nouchine Hadjikhani
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Harvard Medical School/MGH/MIT, Athinoula A. Martinos Center for Biomedical Imaging, Boston, MA, USA
| | - Maria Sundqvist
- Department of Education and Special Education, University of Gothenburg, Gothenburg, Sweden
| | - Jakob Åsberg Johnels
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden.
- Section of Speech and Language Pathology, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden.
| |
Collapse
|
24
|
Visual Influences on Auditory Behavioral, Neural, and Perceptual Processes: A Review. J Assoc Res Otolaryngol 2021; 22:365-386. [PMID: 34014416 PMCID: PMC8329114 DOI: 10.1007/s10162-021-00789-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Accepted: 02/07/2021] [Indexed: 01/03/2023] Open
Abstract
In a naturalistic environment, auditory cues are often accompanied by information from other senses, which can be redundant with or complementary to the auditory information. Although the multisensory interactions derived from this combination of information and that shape auditory function are seen across all sensory modalities, our greatest body of knowledge to date centers on how vision influences audition. In this review, we attempt to capture the state of our understanding at this point in time regarding this topic. Following a general introduction, the review is divided into 5 sections. In the first section, we review the psychophysical evidence in humans regarding vision's influence in audition, making the distinction between vision's ability to enhance versus alter auditory performance and perception. Three examples are then described that serve to highlight vision's ability to modulate auditory processes: spatial ventriloquism, cross-modal dynamic capture, and the McGurk effect. The final part of this section discusses models that have been built based on available psychophysical data and that seek to provide greater mechanistic insights into how vision can impact audition. The second section reviews the extant neuroimaging and far-field imaging work on this topic, with a strong emphasis on the roles of feedforward and feedback processes, on imaging insights into the causal nature of audiovisual interactions, and on the limitations of current imaging-based approaches. These limitations point to a greater need for machine-learning-based decoding approaches toward understanding how auditory representations are shaped by vision. The third section reviews the wealth of neuroanatomical and neurophysiological data from animal models that highlights audiovisual interactions at the neuronal and circuit level in both subcortical and cortical structures. It also speaks to the functional significance of audiovisual interactions for two critically important facets of auditory perception-scene analysis and communication. The fourth section presents current evidence for alterations in audiovisual processes in three clinical conditions: autism, schizophrenia, and sensorineural hearing loss. These changes in audiovisual interactions are postulated to have cascading effects on higher-order domains of dysfunction in these conditions. The final section highlights ongoing work seeking to leverage our knowledge of audiovisual interactions to develop better remediation approaches to these sensory-based disorders, founded in concepts of perceptual plasticity in which vision has been shown to have the capacity to facilitate auditory learning.
Collapse
|
25
|
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & Macdonald (Nature 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
Collapse
|
26
|
Abstract
Adaptive behavior in a complex, dynamic, and multisensory world poses some of the most fundamental computational challenges for the brain, notably inference, decision-making, learning, binding, and attention. We first discuss how the brain integrates sensory signals from the same source to support perceptual inference and decision-making by weighting them according to their momentary sensory uncertainties. We then show how observers solve the binding or causal inference problem-deciding whether signals come from common causes and should hence be integrated or else be treated independently. Next, we describe the multifarious interplay between multisensory processing and attention. We argue that attentional mechanisms are crucial to compute approximate solutions to the binding problem in naturalistic environments when complex time-varying signals arise from myriad causes. Finally, we review how the brain dynamically adapts multisensory processing to a changing world across multiple timescales.
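As a concrete anchor for the phrase "weighting them according to their momentary sensory uncertainties", the textbook maximum-likelihood rule for two signals from a common source is

\[
\hat{S} = w_A x_A + w_V x_V, \qquad w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad w_V = 1 - w_A,
\]

where \(\sigma_A^2\) and \(\sigma_V^2\) are the auditory and visual noise variances; this standard formulation is given here for orientation and is not specific to this review.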
Collapse
Affiliation(s)
- Uta Noppeney
- Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525 AJ Nijmegen, The Netherlands;
| |
Collapse
|
27
|
Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021; 15:616049. [PMID: 33867954 PMCID: PMC8046930 DOI: 10.3389/fnhum.2021.616049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Accepted: 03/12/2021] [Indexed: 11/13/2022] Open
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (i.e., "da"), resulting from mismatched pairings of audiovisual (AV) speech stimuli (i.e., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da," represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant vowels (CVs) with (AV) and without (V-only) audio of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Collapse
Affiliation(s)
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
| | - Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
| | - Brenna Mandujano
- Department of Psychology, California State University, Fresno, Fresno, CA, United States
| | - Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
| |
Collapse
|
28
|
Lindborg A, Andersen TS. Bayesian binding and fusion models explain illusion and enhancement effects in audiovisual speech perception. PLoS One 2021; 16:e0246986. [PMID: 33606815 PMCID: PMC7895372 DOI: 10.1371/journal.pone.0246986] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Accepted: 01/31/2021] [Indexed: 11/24/2022] Open
Abstract
Speech is perceived with both the ears and the eyes. Adding congruent visual speech improves the perception of a faint auditory speech stimulus, whereas adding incongruent visual speech can alter the perception of the utterance. The latter phenomenon is the case of the McGurk illusion, where an auditory stimulus such as e.g. "ba" dubbed onto a visual stimulus such as "ga" produces the illusion of hearing "da". Bayesian models of multisensory perception suggest that both the enhancement and the illusion case can be described as a two-step process of binding (informed by prior knowledge) and fusion (informed by the information reliability of each sensory cue). However, there is to date no study which has accounted for how they each contribute to audiovisual speech perception. In this study, we expose subjects to both congruent and incongruent audiovisual speech, manipulating the binding and the fusion stages simultaneously. This is done by varying both temporal offset (binding) and auditory and visual signal-to-noise ratio (fusion). We fit two Bayesian models to the behavioural data and show that they can both account for the enhancement effect in congruent audiovisual speech, as well as the McGurk illusion. This modelling approach allows us to disentangle the effects of binding and fusion on behavioural responses. Moreover, we find that these models have greater predictive power than a forced fusion model. This study provides a systematic and quantitative approach to measuring audiovisual integration in the perception of the McGurk illusion as well as congruent audiovisual speech, which we hope will inform future work on audiovisual speech perception.
Collapse
Affiliation(s)
- Alma Lindborg
- Department of Psychology, University of Potsdam, Potsdam, Germany
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Tobias S. Andersen
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| |
Collapse
|
29
|
Jones SA, Noppeney U. Ageing and multisensory integration: A review of the evidence, and a computational perspective. Cortex 2021; 138:1-23. [PMID: 33676086 DOI: 10.1016/j.cortex.2021.02.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 01/23/2021] [Accepted: 02/02/2021] [Indexed: 11/29/2022]
Abstract
The processing of multisensory signals is crucial for effective interaction with the environment, but our ability to perform this vital function changes as we age. In the first part of this review, we summarise existing research into the effects of healthy ageing on multisensory integration. We note that age differences vary substantially with the paradigms and stimuli used: older adults often receive at least as much benefit (to both accuracy and response times) as younger controls from congruent multisensory stimuli, but are also consistently more negatively impacted by the presence of intersensory conflict. In the second part, we outline a normative Bayesian framework that provides a principled and computationally informed perspective on the key ingredients involved in multisensory perception, and how these are affected by ageing. Applying this framework to the existing literature, we conclude that changes to sensory reliability, prior expectations (together with attentional control), and decisional strategies all contribute to the age differences observed. However, we find no compelling evidence of any age-related changes to the basic inference mechanisms involved in multisensory perception.
Collapse
Affiliation(s)
- Samuel A Jones
- The Staffordshire Centre for Psychological Research, Staffordshire University, Stoke-on-Trent, UK.
| | - Uta Noppeney
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands.
| |
Collapse
|
30
|
Li L, Rehr R, Bruns P, Gerkmann T, Röder B. A Survey on Probabilistic Models in Human Perception and Machines. Front Robot AI 2021; 7:85. [PMID: 33501252 PMCID: PMC7805657 DOI: 10.3389/frobt.2020.00085] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2019] [Accepted: 05/29/2020] [Indexed: 11/29/2022] Open
Abstract
Extracting information from noisy signals is of fundamental importance for both biological and artificial perceptual systems. To provide tractable solutions to this challenge, the fields of human perception and machine signal processing (SP) have developed powerful computational models, including Bayesian probabilistic models. However, little true integration between these fields exists in their applications of the probabilistic models for solving analogous problems, such as noise reduction, signal enhancement, and source separation. In this mini review, we briefly introduce and compare selective applications of probabilistic models in machine SP and human psychophysics. We focus on audio and audio-visual processing, using examples of speech enhancement, automatic speech recognition, audio-visual cue integration, source separation, and causal inference to illustrate the basic principles of the probabilistic approach. Our goal is to identify commonalities between probabilistic models addressing brain processes and those aiming at building intelligent machines. These commonalities could constitute the closest points for interdisciplinary convergence.
Collapse
Affiliation(s)
- Lux Li
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
| | - Robert Rehr
- Signal Processing (SP), Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Patrick Bruns
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
| | - Timo Gerkmann
- Signal Processing (SP), Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Brigitte Röder
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
| |
Collapse
|
31
|
|
32
|
Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. [PMID: 33221701 DOI: 10.1016/j.cortex.2020.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2020] [Revised: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 11/25/2022]
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to-date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
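The power point in the final sentences can be made concrete with the standard Fisher-z approximation for the sample size needed to detect a correlation r (the value r = .10 below is purely illustrative, not a figure taken from the paper): with two-tailed \(\alpha = .05\) and power \(= .80\),

\[
n \approx \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{\tfrac{1}{2}\ln\!\frac{1+r}{1-r}} \right)^{2} + 3
 = \left( \frac{1.96 + 0.84}{0.100} \right)^{2} + 3 \approx 780,
\]

i.e., detecting an observer-level correlation of that size would require on the order of 800 participants, far more than typical McGurk studies include.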
Collapse
|
33
|
Plass J, Brang D, Suzuki S, Grabowecky M. Vision perceptually restores auditory spectral dynamics in speech. Proc Natl Acad Sci U S A 2020; 117:16920-16927. [PMID: 32632010 PMCID: PMC7382243 DOI: 10.1073/pnas.2002887117] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
Collapse
Affiliation(s)
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109;
- Department of Psychology, Northwestern University, Evanston, IL 60208
| | - David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
| | - Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
| | - Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
| |
Collapse
|
34
|
Boyce WP, Lindsay A, Zgonnikov A, Rañó I, Wong-Lin K. Optimality and Limitations of Audio-Visual Integration for Cognitive Systems. Front Robot AI 2020; 7:94. [PMID: 33501261 PMCID: PMC7805627 DOI: 10.3389/frobt.2020.00094] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2019] [Accepted: 06/09/2020] [Indexed: 11/13/2022] Open
Abstract
Multimodal integration is an important process in perceptual decision-making. In humans, this process has often been shown to be statistically optimal, or near optimal: sensory information is combined in a fashion that minimizes the average error in perceptual representation of stimuli. However, sometimes there are costs that come with the optimization, manifesting as illusory percepts. We review audio-visual facilitations and illusions that are products of multisensory integration, and the computational models that account for these phenomena. In particular, the same optimal computational model can lead to illusory percepts, and we suggest that more studies are needed to detect and mitigate these illusions, which can arise as artifacts in artificial cognitive systems. We provide cautionary considerations when designing artificial cognitive systems with a view to avoiding such artifacts. Finally, we suggest avenues of research toward solutions to potential pitfalls in system design. We conclude that detailed understanding of multisensory integration and the mechanisms behind audio-visual illusions can benefit the design of artificial cognitive systems.
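The "statistically optimal" benchmark mentioned here is usually cashed out as the fused estimate having lower variance than either unisensory estimate; in the standard formulation (illustrative numbers, not from this review),

\[
\sigma_{AV}^2 = \frac{\sigma_A^2\,\sigma_V^2}{\sigma_A^2 + \sigma_V^2} \le \min\left(\sigma_A^2, \sigma_V^2\right),
\]

so that, for example, \(\sigma_A = 2\) and \(\sigma_V = 1\) (arbitrary units) give \(\sigma_{AV} = \sqrt{4/5} \approx 0.89\); the illusions discussed in the review arise when the same weighting is applied to signals that do not in fact share a source.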
Collapse
Affiliation(s)
- William Paul Boyce
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| | - Anthony Lindsay
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| | - Arkady Zgonnikov
- AiTech, Delft University of Technology, Delft, Netherlands
- Department of Cognitive Robotics, Faculty of Mechanical, Maritime, and Materials Engineering, Delft University of Technology, Delft, Netherlands
| | - Iñaki Rañó
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| | - KongFatt Wong-Lin
- Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry Londonderry, Northern Ireland, United Kingdom
| |
Collapse
|
35
|
French RL, DeAngelis GC. Multisensory neural processing: from cue integration to causal inference. CURRENT OPINION IN PHYSIOLOGY 2020; 16:8-13. [PMID: 32968701 DOI: 10.1016/j.cophys.2020.04.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Neurophysiological studies of multisensory processing have largely focused on how the brain integrates information from different sensory modalities to form a coherent percept. However, in the natural environment, an important extra step is needed: the brain faces the problem of causal inference, which involves determining whether different sources of sensory information arise from the same environmental cause, such that integrating them is advantageous. Behavioral and computational studies have provided a strong foundation for studying causal inference, but studies of its neural basis have only recently been undertaken. This review focuses on recent advances regarding how the brain infers the causes of sensory inputs and uses this information to make robust perceptual estimates.
Collapse
Affiliation(s)
- Ranran L French
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY
| | - Gregory C DeAngelis
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY
| |
Collapse
|
36
|
Olasagasti I, Giraud AL. Integrating prediction errors at two time scales permits rapid recalibration of speech sound categories. eLife 2020; 9:44516. [PMID: 32223894 PMCID: PMC7217692 DOI: 10.7554/elife.44516] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2019] [Accepted: 03/17/2020] [Indexed: 01/01/2023] Open
Abstract
Speech perception presumably arises from internal models of how specific sensory features are associated with speech sounds. These features change constantly (e.g. different speakers, articulation modes etc.), and listeners need to recalibrate their internal models by appropriately weighing new versus old evidence. Models of speech recalibration classically ignore this volatility. The effect of volatility in tasks where sensory cues were associated with arbitrary experimenter-defined categories was well described by models that continuously adapt the learning rate while keeping a single representation of the category. Using neurocomputational modelling, we show that recalibration of natural speech sound categories is better described by representing the latter at different time scales. We illustrate our proposal by modeling fast recalibration of speech sounds after experiencing the McGurk effect. We propose that working representations of speech categories are driven both by their current environment and their long-term memory representations.

People can distinguish words or syllables even though they may sound different with every speaker. This striking ability reflects the fact that our brain is continually modifying the way we recognise and interpret the spoken word based on what we have heard before, by comparing past experience with the most recent one to update expectations. This phenomenon also occurs in the McGurk effect: an auditory illusion in which someone hears one syllable but sees a person saying another syllable and ends up perceiving a third distinct sound. Abstract models, which provide a functional rather than a mechanistic description of what the brain does, can test how humans use expectations and prior knowledge to interpret the information delivered by the senses at any given moment. Olasagasti and Giraud have now built an abstract model of how brains recalibrate perception of natural speech sounds. By fitting the model with existing experimental data using the McGurk effect, the results suggest that, rather than using a single sound representation that is adjusted with each sensory experience, the brain recalibrates sounds at two different timescales. Over and above slow “procedural” learning, the findings show that there is also rapid recalibration of how different sounds are interpreted. This working representation of speech enables adaptation to changing or noisy environments and illustrates that the process is far more dynamic and flexible than previously thought.
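One generic way to write down the "different time scales" idea, offered here only as a schematic delta-rule caricature and not as the authors' generative model, is to let a fast and a slow trace of a category's sensory mean be updated at different rates and to read out their combination:

\[
\mu^{\mathrm{fast}}_{t+1} = \mu^{\mathrm{fast}}_{t} + \alpha_{\mathrm{fast}}\,(x_t - \mu_t), \qquad
\mu^{\mathrm{slow}}_{t+1} = \mu^{\mathrm{slow}}_{t} + \alpha_{\mathrm{slow}}\,(x_t - \mu_t), \qquad
\alpha_{\mathrm{fast}} \gg \alpha_{\mathrm{slow}},
\]

with the working representation \(\mu_t\) mixing the two traces (e.g., \(\mu_t = \lambda\,\mu^{\mathrm{fast}}_t + (1-\lambda)\,\mu^{\mathrm{slow}}_t\)); a McGurk exposure then shifts the fast trace quickly while the long-term trace anchors the category.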
Collapse
Affiliation(s)
- Itsaso Olasagasti
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
| | - Anne-Lise Giraud
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
| |
Collapse
|
37
|
Colonius H, Diederich A. Formal models and quantitative measures of multisensory integration: a selective overview. Eur J Neurosci 2020; 51:1161-1178. [DOI: 10.1111/ejn.13813] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2017] [Revised: 12/18/2017] [Accepted: 12/20/2017] [Indexed: 11/26/2022]
Affiliation(s)
- Hans Colonius
- Department of Psychology Carl von Ossietzky Universität Oldenburg Oldenburg 26111 Germany
- Department of Psychological Sciences Purdue University West Lafayette IN USA
| | - Adele Diederich
- Department of Psychological Sciences Purdue University West Lafayette IN USA
- Life Sciences and Chemistry Jacobs University Bremen Bremen Germany
| |
Collapse
|
38
|
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261 PMCID: PMC6687434 DOI: 10.7554/elife.48116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Accepted: 07/17/2019] [Indexed: 12/30/2022] Open
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Collapse
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | | |
Collapse
|
39
|
Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. [PMID: 31310616 PMCID: PMC6634411 DOI: 10.1371/journal.pone.0219744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 07/02/2019] [Indexed: 11/18/2022] Open
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech specific mismatch processing and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity paradigm (MMN) where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm are violating both the general prediction of stimulus content, and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
Collapse
Affiliation(s)
- Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| | - Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- BCBL. Basque Center on Cognition, Brain and Language, Donostia, Spain
| | - Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
| | - Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
| | - Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
| |
Collapse
|
40
|
Cao Y, Summerfield C, Park H, Giordano BL, Kayser C. Causal Inference in the Multisensory Brain. Neuron 2019; 102:1076-1087.e8. [PMID: 31047778 DOI: 10.1016/j.neuron.2019.03.043] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2018] [Revised: 02/18/2019] [Accepted: 03/27/2019] [Indexed: 01/13/2023]
Abstract
When combining information across different senses, humans need to flexibly select cues of a common origin while avoiding distraction from irrelevant inputs. The brain could solve this challenge using a hierarchical principle by deriving rapidly a fused sensory estimate for computational expediency and, later and if required, filtering out irrelevant signals based on the inferred sensory cause(s). Analyzing time- and source-resolved human magnetoencephalographic data, we unveil a systematic spatiotemporal cascade of the relevant computations, starting with early segregated unisensory representations, continuing with sensory fusion in parietal-temporal regions, and culminating as causal inference in the frontal lobe. Our results reconcile previous computational accounts of multisensory perception by showing that prefrontal cortex guides flexible integrative behavior based on candidate representations established in sensory and association cortices, thereby framing multisensory integration in the generalized context of adaptive behavior.
Collapse
Affiliation(s)
- Yinan Cao
- Department of Experimental Psychology, University of Oxford, Walton Street, Oxford OX2 6AE, UK.
| | - Christopher Summerfield
- Department of Experimental Psychology, University of Oxford, Walton Street, Oxford OX2 6AE, UK
| | - Hame Park
- Department for Cognitive Neuroscience and Cognitive Interaction Technology-Center of Excellence, Bielefeld University, 33615 Bielefeld, Germany
| | - Bruno Lucio Giordano
- Institut de Neurosciences de la Timone UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
| | - Christoph Kayser
- Department for Cognitive Neuroscience and Cognitive Interaction Technology-Center of Excellence, Bielefeld University, 33615 Bielefeld, Germany.
| |
Collapse
|
41
|
Causal inference accounts for heading perception in the presence of object motion. Proc Natl Acad Sci U S A 2019; 116:9060-9065. [PMID: 30996126 DOI: 10.1073/pnas.1820373116] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The brain infers our spatial orientation and properties of the world from ambiguous and noisy sensory cues. Judging self-motion (heading) in the presence of independently moving objects poses a challenging inference problem because the image motion of an object could be attributed to movement of the object, self-motion, or some combination of the two. We test whether perception of heading and object motion follows predictions of a normative causal inference framework. In a dual-report task, subjects indicated whether an object appeared stationary or moving in the virtual world, while simultaneously judging their heading. Consistent with causal inference predictions, the proportion of object stationarity reports, as well as the accuracy and precision of heading judgments, depended on the speed of object motion. Critically, biases in perceived heading declined when the object was perceived to be moving in the world. Our findings suggest that the brain interprets object motion and self-motion using a causal inference framework.
Collapse
|
42
|
Lalonde K, Werner LA. Perception of incongruent audiovisual English consonants. PLoS One 2019; 14:e0213588. [PMID: 30897109 PMCID: PMC6428273 DOI: 10.1371/journal.pone.0213588] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Accepted: 02/25/2019] [Indexed: 11/21/2022] Open
Abstract
Causal inference—the process of deciding whether two incoming signals come from the same source—is an important step in audiovisual (AV) speech perception. This research explored causal inference and perception of incongruent AV English consonants. Nine adults were presented auditory, visual, congruent AV, and incongruent AV consonant-vowel syllables. Incongruent AV stimuli included auditory and visual syllables with matched vowels, but mismatched consonants. Open-set responses were collected. For most incongruent syllables, participants were aware of the mismatch between auditory and visual signals (59.04%) or reported the auditory syllable (33.73%). Otherwise, participants reported the visual syllable (1.13%) or some other syllable (6.11%). Statistical analyses were used to assess whether visual distinctiveness and place, voice, and manner features predicted responses. Mismatch responses occurred more when the auditory and visual consonants were visually distinct, when place and manner differed across auditory and visual consonants, and for consonants with high visual accuracy. Auditory responses occurred more when the auditory and visual consonants were visually similar, when place and manner were the same across auditory and visual stimuli, and with consonants produced further back in the mouth. Visual responses occurred more when voicing and manner were the same across auditory and visual stimuli, and for front and middle consonants. Other responses were variable, but typically matched the visual place, auditory voice, and auditory manner of the input. Overall, results indicate that causal inference and incongruent AV consonant perception depend on salience and reliability of auditory and visual inputs and degree of redundancy between auditory and visual inputs. A parameter-free computational model of incongruent AV speech perception based on unimodal confusions, with a causal inference rule, was applied. Data from the current study present an opportunity to test and improve the generalizability of current AV speech integration models.
Collapse
Affiliation(s)
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
| | - Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
| |
Collapse
|
43
|
Magnotti JF, Smith KB, Salinas M, Mays J, Zhu LL, Beauchamp MS. A causal inference explanation for enhancement of multisensory integration by co-articulation. Sci Rep 2018; 8:18032. [PMID: 30575791 PMCID: PMC6303389 DOI: 10.1038/s41598-018-36772-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Accepted: 11/22/2018] [Indexed: 11/09/2022] Open
Abstract
The McGurk effect is a popular assay of multisensory integration in which participants report the illusory percept of "da" when presented with incongruent auditory "ba" and visual "ga" (AbaVga). While the original publication describing the effect found that 98% of participants perceived it, later studies reported much lower prevalence, ranging from 17% to 81%. Understanding the source of this variability is important for interpreting the panoply of studies that examine McGurk prevalence between groups, including clinical populations such as individuals with autism or schizophrenia. The original publication used stimuli consisting of multiple repetitions of a co-articulated syllable (three repetitions, AgagaVbaba). Later studies used stimuli without repetition or co-articulation (AbaVga) and used congruent syllables from the same talker as a control. In three experiments, we tested how stimulus repetition, co-articulation, and talker repetition affect McGurk prevalence. Repetition with co-articulation increased prevalence by 20%, while repetition without co-articulation and talker repetition had no effect. A fourth experiment compared the effect of the on-line testing used in the first three experiments with the in-person testing used in the original publication; no differences were observed. We interpret our results in the framework of causal inference: co-articulation increases the evidence that auditory and visual speech tokens arise from the same talker, increasing tolerance for content disparity and likelihood of integration. The results provide a principled explanation for how co-articulation aids multisensory integration and can explain the high prevalence of the McGurk effect in the initial publication.
Collapse
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA.
| | - Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Marcelo Salinas
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Jacqunae Mays
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
| | - Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
| | - Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA.
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA.
| |
Collapse
|
44
|
Brown VA, Hedayati M, Zanger A, Mayn S, Ray L, Dillman-Hasso N, Strand JF. What accounts for individual differences in susceptibility to the McGurk effect? PLoS One 2018; 13:e0207160. [PMID: 30418995 PMCID: PMC6231656 DOI: 10.1371/journal.pone.0207160] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2018] [Accepted: 10/25/2018] [Indexed: 11/29/2022] Open
Abstract
The McGurk effect is a classic audiovisual speech illusion in which discrepant auditory and visual syllables can lead to a fused percept (e.g., an auditory /bɑ/ paired with a visual /gɑ/ often leads to the perception of /dɑ/). The McGurk effect is robust and easily replicated in pooled group data, but there is tremendous variability in the extent to which individual participants are susceptible to it. In some studies, the rate at which individuals report fusion responses ranges from 0% to 100%. Despite its widespread use in the audiovisual speech perception literature, the roots of the wide variability in McGurk susceptibility are largely unknown. This study evaluated whether several perceptual and cognitive traits are related to McGurk susceptibility through correlational analyses and mixed effects modeling. We found that an individual's susceptibility to the McGurk effect was related to their ability to extract place of articulation information from the visual signal (i.e., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency. These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill. In contrast, cognitive and perceptual abilities that are commonly used predictors in individual differences studies do not appear to underlie susceptibility to the McGurk effect.
Collapse
Affiliation(s)
- Violet A. Brown
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Maryam Hedayati
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Annie Zanger
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Sasha Mayn
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Lucia Ray
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Naseem Dillman-Hasso
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| | - Julia F. Strand
- Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
| |
Collapse
|
45
|
Language Experience Changes Audiovisual Perception. Brain Sci 2018; 8:brainsci8050085. [PMID: 29751619 PMCID: PMC5977076 DOI: 10.3390/brainsci8050085] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2018] [Revised: 05/05/2018] [Accepted: 05/08/2018] [Indexed: 11/17/2022] Open
Abstract
Can experience change perception? Here, we examine whether language experience shapes the way individuals process auditory and visual information. We used the McGurk effect—the discovery that when people hear a speech sound (e.g., “ba”) and see a conflicting lip movement (e.g., “ga”), they recognize it as a completely new sound (e.g., “da”). This finding suggests that the brain fuses input across auditory and visual modalities and demonstrates that what we hear is profoundly influenced by what we see. We find that cross-modal integration is affected by language background, with bilinguals experiencing the McGurk effect more than monolinguals. This increased reliance on the visual channel is not due to decreased language proficiency, as the effect was observed even among highly proficient bilinguals. Instead, we propose that the challenges of learning and monitoring multiple languages have lasting consequences for how individuals process auditory and visual information.
Collapse
|
46
|
Ozker M, Yoshor D, Beauchamp MS. Converging Evidence From Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing. Front Hum Neurosci 2018; 12:141. [PMID: 29740294 PMCID: PMC5928751 DOI: 10.3389/fnhum.2018.00141] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2018] [Accepted: 03/28/2018] [Indexed: 01/15/2023] Open
Abstract
Although humans can understand speech using the auditory modality alone, in noisy environments visual speech information from the talker’s mouth can rescue otherwise unintelligible auditory speech. To investigate the neural substrates of multisensory speech perception, we compared neural activity from the human superior temporal gyrus (STG) in two datasets. One dataset consisted of direct neural recordings (electrocorticography, ECoG) from surface electrodes implanted in epilepsy patients (this dataset has been previously published). The second dataset consisted of indirect measures of neural activity using blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI). Both ECoG and fMRI participants viewed the same clear and noisy audiovisual speech stimuli and performed the same speech recognition task. Both techniques demonstrated a sharp functional boundary in the STG, spatially coincident with an anatomical boundary defined by the posterior edge of Heschl’s gyrus. Cortex on the anterior side of the boundary responded more strongly to clear audiovisual speech than to noisy audiovisual speech while cortex on the posterior side of the boundary did not. For both ECoG and fMRI measurements, the transition between the functionally distinct regions happened within 10 mm of anterior-to-posterior distance along the STG. We relate this boundary to the multisensory neural code underlying speech perception and propose that it represents an important functional division within the human speech perception network.
Collapse
Affiliation(s)
- Muge Ozker
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX, United States
| | - Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
| |
Collapse
|
47
|
Noppeney U, Lee HL. Causal inference and temporal predictions in audiovisual perception of speech and music. Ann N Y Acad Sci 2018; 1423:102-116. [PMID: 29604082 DOI: 10.1111/nyas.13615] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2017] [Revised: 12/13/2017] [Accepted: 12/22/2017] [Indexed: 11/28/2022]
Abstract
To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses.
Collapse
Affiliation(s)
- Uta Noppeney
- Computational Neuroscience and Cognitive Robotics Centre, University of Birmingham, Birmingham, UK
| | - Hwee Ling Lee
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
| |
Collapse
|
48
|
Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. [DOI: 10.1111/ejn.13804] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Revised: 12/12/2017] [Accepted: 12/12/2017] [Indexed: 11/27/2022]
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group; Center for Brain and Cognition; Dept. de Tecnologies de la Informació i les Comunicacions; Universitat Pompeu Fabra; Office 55.128., Roc Boronat, 138 08018 Barcelona Spain
| | - Mireia Torralba
- Multisensory Research Group; Center for Brain and Cognition; Dept. de Tecnologies de la Informació i les Comunicacions; Universitat Pompeu Fabra; Office 55.128., Roc Boronat, 138 08018 Barcelona Spain
| | - Salvador Soto-Faraco
- Multisensory Research Group; Center for Brain and Cognition; Dept. de Tecnologies de la Informació i les Comunicacions; Universitat Pompeu Fabra; Office 55.128., Roc Boronat, 138 08018 Barcelona Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA); Barcelona Spain
| |
Collapse
|
49
|
Magnotti JF, Basu Mallick D, Beauchamp MS. Reducing Playback Rate of Audiovisual Speech Leads to a Surprising Decrease in the McGurk Effect. Multisens Res 2018; 31:19-38. [DOI: 10.1163/22134808-00002586] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Accepted: 06/03/2017] [Indexed: 11/19/2022]
Abstract
We report the unexpected finding that slowing video playback decreases perception of the McGurk effect. This reduction is counter-intuitive because the illusion depends on visual speech influencing the perception of auditory speech, and slowing speech should increase the amount of visual information available to observers. We recorded perceptual data from 110 subjects viewing audiovisual syllables (either McGurk or congruent control stimuli) played back at one of three rates: the rate used by the talker during recording (the natural rate), a slow rate (50% of natural), or a fast rate (200% of natural). We replicated previous studies showing dramatic variability in McGurk susceptibility at the natural rate, ranging from 0–100% across subjects and from 26–76% across the eight McGurk stimuli tested. Relative to the natural rate, slowed playback reduced the frequency of McGurk responses by 11% (79% of subjects showed a reduction) and reduced congruent accuracy by 3% (25% of subjects showed a reduction). Fast playback rate had little effect on McGurk responses or congruent accuracy. To determine whether our results are consistent with Bayesian integration, we constructed a Bayes-optimal model that incorporated two assumptions: individuals combine auditory and visual information according to their reliability, and changing playback rate affects sensory reliability. The model reproduced both our findings of large individual differences and the playback rate effect. This work illustrates that surprises remain in the McGurk effect and that Bayesian integration provides a useful framework for understanding audiovisual speech perception.
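To make the Bayesian-integration account sketched in the final sentences concrete, here is a minimal forced-fusion simulation in Python (the one-dimensional place-of-articulation axis, prototype positions, and noise values are illustrative assumptions; the paper's actual model also fits subject-level parameters and congruent trials):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D "place of articulation" axis; prototype positions are assumptions.
PROTOTYPES = {"ba": 0.0, "da": 1.0, "ga": 2.0}

def mcgurk_rate(sigma_a, sigma_v, n_trials=20_000):
    """Predicted proportion of 'da' (fusion) reports for auditory 'ba' + visual 'ga'
    under forced reliability-weighted fusion of two noisy cues."""
    x_a = rng.normal(PROTOTYPES["ba"], sigma_a, n_trials)  # noisy auditory samples
    x_v = rng.normal(PROTOTYPES["ga"], sigma_v, n_trials)  # noisy visual samples
    w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)  # reliability weight
    fused = w_a * x_a + (1 - w_a) * x_v
    names = list(PROTOTYPES)
    centers = np.array([PROTOTYPES[n] for n in names])
    nearest = np.abs(fused[:, None] - centers[None, :]).argmin(axis=1)  # nearest prototype
    return np.mean(np.array(names)[nearest] == "da")

# Degrading visual reliability pulls the fused estimate toward the auditory 'ba',
# so the predicted rate of 'da' (McGurk) responses drops across these settings.
for sigma_v in (1.0, 3.0, 6.0):
    print(f"sigma_v = {sigma_v}: P('da') = {mcgurk_rate(sigma_a=1.0, sigma_v=sigma_v):.2f}")

In this caricature, making the visual cue less reliable lowers the predicted rate of "da" reports, illustrating how changes in sensory reliability alone can shift McGurk susceptibility without any change in the stimuli's phonetic content.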
Collapse
Affiliation(s)
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| | | | - Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| |
Collapse
|
50
|
Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. [PMID: 28792094 DOI: 10.1002/hbm.23758] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2017] [Revised: 07/25/2017] [Accepted: 07/27/2017] [Indexed: 01/22/2023] Open
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or not integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise. Hum Brain Mapp 38:5691-5705, 2017. © 2017 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
| | - Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
- ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
| | - Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
Collapse
|