1
Zou T, Li L, Huang X, Deng C, Wang X, Gao Q, Chen H, Li R. Dynamic causal modeling analysis reveals the modulation of motor cortex and integration in superior temporal gyrus during multisensory speech perception. Cogn Neurodyn 2024; 18:931-946. PMID: 38826672; PMCID: PMC11143173; DOI: 10.1007/s11571-023-09945-z.
Abstract
The processing of speech information from various sensory modalities is crucial for human communication. Both the left posterior superior temporal gyrus (pSTG) and the motor cortex are importantly involved in multisensory speech perception. However, the dynamic integration from primary sensory regions to the pSTG and the motor cortex remains unclear. Here, we implemented a behavioral experiment using the classical McGurk effect paradigm and acquired task functional magnetic resonance imaging (fMRI) data during synchronized audiovisual syllabic perception from 63 normal adults. We conducted dynamic causal modeling (DCM) analysis to explore the cross-modal interactions among the left pSTG, left precentral gyrus (PrG), left middle superior temporal gyrus (mSTG), and left fusiform gyrus (FuG). Bayesian model selection favored a winning model that included modulations of connections to the PrG (mSTG → PrG, FuG → PrG), from the PrG (PrG → mSTG, PrG → FuG), and to the pSTG (mSTG → pSTG, FuG → pSTG). Moreover, the coupling strength of these connections correlated with behavioral McGurk susceptibility. In addition, significant differences were found in the coupling strength of these connections between strong and weak McGurk perceivers. Strong perceivers modulated less inhibitory visual influence and allowed less excitatory auditory information to flow into the PrG, but integrated more audiovisual information in the pSTG. Taken together, our findings show that the PrG and pSTG interact dynamically with primary cortices during audiovisual speech perception, and support the idea that the motor cortex plays a specific functional role in modulating the gain and salience between auditory and visual modalities. Supplementary Information: The online version contains supplementary material available at 10.1007/s11571-023-09945-z.
Affiliation(s)
- Ting Zou, Liyuan Li, Xinju Huang, Chijun Deng, Xuyang Wang, Qing Gao, Huafu Chen, Rong Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People’s Republic of China
2
Marchand Knight J, Sares AG, Deroche MLD. Visual biases in evaluation of speakers' and singers' voice type by cis and trans listeners. Front Psychol 2023; 14:1046672. PMID: 37205083; PMCID: PMC10187036; DOI: 10.3389/fpsyg.2023.1046672.
Abstract
Introduction A singer's or speaker's Fach (voice type) should be appraised based on acoustic cues characterizing their voice. Instead, in practice, it is often influenced by the individual's physical appearance. This is especially distressing for transgender people, who may be excluded from formal singing because of a perceived mismatch between their voice and appearance. To eventually break down these visual biases, we need a better understanding of the conditions under which they occur. Specifically, we hypothesized that trans listeners (not actors) would be better able to resist such biases, relative to cis listeners, precisely because they would be more aware of appearance-voice dissociations. Methods In an online study, 85 cisgender and 81 transgender participants were presented with 18 different actors singing or speaking short sentences. These actors covered six voice categories from high/bright (traditionally feminine) to low/dark (traditionally masculine) voices: namely soprano, mezzo-soprano (referred to henceforth as mezzo), contralto (referred to henceforth as alto), tenor, baritone, and bass. Every participant provided voice type ratings for (1) Audio-only (A) stimuli to get an unbiased estimate of a given actor's voice type, (2) Video-only (V) stimuli to get an estimate of the strength of the bias itself, and (3) combined Audio-Visual (AV) stimuli to see how much visual cues would affect the evaluation of the audio. Results Results demonstrated that visual biases are not subtle and hold across the entire scale, shifting voice appraisal by about a third of the distance between adjacent voice types (for example, a third of the bass-to-baritone distance). This shift was 30% smaller for trans than for cis listeners, confirming our main hypothesis. This pattern was largely similar whether actors sang or spoke, though singing overall led to more feminine/high/bright ratings.
Conclusion This study is one of the first demonstrations that transgender listeners are in fact better judges of a singer's or speaker's voice type because they are better able to separate the actors' voice from their appearance, a finding that opens exciting avenues to fight more generally against implicit (or sometimes explicit) biases in voice appraisal.
3
Hong F, Badde S, Landy MS. Repeated exposure to either consistently spatiotemporally congruent or consistently incongruent audiovisual stimuli modulates the audiovisual common-cause prior. Sci Rep 2022; 12:15532. PMID: 36109544; PMCID: PMC9478143; DOI: 10.1038/s41598-022-19041-7.
Abstract
To estimate an environmental property such as object location from multiple sensory signals, the brain must infer their causal relationship. Only information originating from the same source should be integrated. This inference relies on the characteristics of the measurements, the information the sensory modalities provide on a given trial, as well as on a cross-modal common-cause prior: accumulated knowledge about the probability that cross-modal measurements originate from the same source. We examined the plasticity of this cross-modal common-cause prior. In a learning phase, participants were exposed to a series of audiovisual stimuli that were either consistently spatiotemporally congruent or consistently incongruent; participants’ audiovisual spatial integration was measured before and after this exposure. We fitted several Bayesian causal-inference models to the data; the models differed in the plasticity of the common-source prior. Model comparison revealed that, for the majority of the participants, the common-cause prior changed during the learning phase. Our findings reveal that short periods of exposure to audiovisual stimuli with a consistent causal relationship can modify the common-cause prior. In accordance with previous studies, both exposure conditions could either strengthen or weaken the common-cause prior at the participant level. Simulations imply that the direction of the prior update might be mediated by the degree of sensory noise (the variability of the measurements of the same signal across trials) during the learning phase.
4
Wilbiks JMP, Brown VA, Strand JF. Speech and non-speech measures of audiovisual integration are not correlated. Atten Percept Psychophys 2022; 84:1809-1819. PMID: 35610409; PMCID: PMC10699539; DOI: 10.3758/s13414-022-02517-z.
Abstract
Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as "measures of audiovisual integration," the tasks themselves differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the tasks themselves (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the relationships among four commonly-used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit), and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks that are commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process or different constructs entirely.
Affiliation(s)
- Violet A Brown
- Department of Psychological & Brain Sciences, Washington University in St. Louis, Saint Louis, MO, USA
- Julia F Strand
- Department of Psychology, Carleton College, Northfield, MN, USA
5
Abstract
Adaptive behavior in a complex, dynamic, and multisensory world poses some of the most fundamental computational challenges for the brain, notably inference, decision-making, learning, binding, and attention. We first discuss how the brain integrates sensory signals from the same source to support perceptual inference and decision-making by weighting them according to their momentary sensory uncertainties. We then show how observers solve the binding or causal inference problem-deciding whether signals come from common causes and should hence be integrated or else be treated independently. Next, we describe the multifarious interplay between multisensory processing and attention. We argue that attentional mechanisms are crucial to compute approximate solutions to the binding problem in naturalistic environments when complex time-varying signals arise from myriad causes. Finally, we review how the brain dynamically adapts multisensory processing to a changing world across multiple timescales.
Collapse
Affiliation(s)
- Uta Noppeney
- Donders Institute for Brain, Cognition and Behavior, Radboud University, 6525 AJ Nijmegen, The Netherlands
6
Gonzales MG, Backer KC, Mandujano B, Shahin AJ. Rethinking the Mechanisms Underlying the McGurk Illusion. Front Hum Neurosci 2021; 15:616049. PMID: 33867954; PMCID: PMC8046930; DOI: 10.3389/fnhum.2021.616049.
Abstract
The McGurk illusion occurs when listeners hear an illusory percept (i.e., "da"), resulting from mismatched pairings of audiovisual (AV) speech stimuli (i.e., auditory /ba/ paired with visual /ga/). Hearing a third percept, distinct from both the auditory and visual input, has been used as evidence of AV fusion. We examined whether the McGurk illusion is instead driven by visual dominance, whereby the third percept, e.g., "da," represents a default percept for visemes with an ambiguous place of articulation (POA), like /ga/. Participants watched videos of a talker uttering various consonant vowels (CVs) with (AV) and without (V-only) audio of /ba/. Individuals transcribed the CV they saw (V-only) or heard (AV). In the V-only condition, individuals predominantly saw "da"/"ta" when viewing CVs with indiscernible POAs. Likewise, in the AV condition, upon perceiving an illusion, they predominantly heard "da"/"ta" for CVs with indiscernible POAs. The illusion was stronger in individuals who exhibited weak /ba/ auditory encoding (examined using a control auditory-only task). In Experiment 2, we attempted to replicate these findings using stimuli recorded from a different talker. The V-only results were not replicated, but again individuals predominantly heard "da"/"ta"/"tha" as an illusory percept for various AV combinations, and the illusion was stronger in individuals who exhibited weak /ba/ auditory encoding. These results demonstrate that when visual CVs with indiscernible POAs are paired with a weakly encoded auditory /ba/, listeners default to hearing "da"/"ta"/"tha", thus tempering the AV fusion account and favoring a default mechanism triggered when both AV stimuli are ambiguous.
Affiliation(s)
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Kristina C. Backer
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
- Brenna Mandujano
- Department of Psychology, California State University, Fresno, Fresno, CA, United States
- Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, Merced, CA, United States
7
Lindborg A, Andersen TS. Bayesian binding and fusion models explain illusion and enhancement effects in audiovisual speech perception. PLoS One 2021; 16:e0246986. PMID: 33606815; PMCID: PMC7895372; DOI: 10.1371/journal.pone.0246986.
Abstract
Speech is perceived with both the ears and the eyes. Adding congruent visual speech improves the perception of a faint auditory speech stimulus, whereas adding incongruent visual speech can alter the perception of the utterance. The latter phenomenon is the case of the McGurk illusion, where an auditory stimulus such as "ba" dubbed onto a visual stimulus such as "ga" produces the illusion of hearing "da". Bayesian models of multisensory perception suggest that both the enhancement and the illusion cases can be described as a two-step process of binding (informed by prior knowledge) and fusion (informed by the information reliability of each sensory cue). However, no study to date has accounted for how each stage contributes to audiovisual speech perception. In this study, we expose subjects to both congruent and incongruent audiovisual speech, manipulating the binding and the fusion stages simultaneously. This is done by varying both temporal offset (binding) and auditory and visual signal-to-noise ratio (fusion). We fit two Bayesian models to the behavioural data and show that they can both account for the enhancement effect in congruent audiovisual speech, as well as the McGurk illusion. This modelling approach allows us to disentangle the effects of binding and fusion on behavioural responses. Moreover, we find that these models have greater predictive power than a forced fusion model. This study provides a systematic and quantitative approach to measuring audiovisual integration in the perception of the McGurk illusion as well as congruent audiovisual speech, which we hope will inform future work on audiovisual speech perception.
Affiliation(s)
- Alma Lindborg
- Department of Psychology, University of Potsdam, Potsdam, Germany
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Tobias S. Andersen
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
8
Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure. Sci Rep 2020; 10:18009. PMID: 33093570; PMCID: PMC7583249; DOI: 10.1038/s41598-020-75201-7.
Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time demanding than fusion. Using time-/source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from the IFG activity and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.
9
Thézé R, Gadiri MA, Albert L, Provost A, Giraud AL, Mégevand P. Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments. Sci Rep 2020; 10:15540. PMID: 32968127; PMCID: PMC7511320; DOI: 10.1038/s41598-020-72375-y.
Abstract
Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has widely been applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevents comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized on computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e. /v/) with a bilabial occlusive phoneme (i.e. /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are judicious, and that they can supplement natural speech with higher control over stimulus timing and content.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Mehdi Ali Gadiri
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Louis Albert
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Antoine Provost
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202, Geneva, Switzerland; Division of Neurology, Geneva University Hospitals, Geneva, Switzerland
10
Englund N, Behne DM. Perception of audiovisual infant directed speech. Scand J Psychol 2019; 61:218-226. PMID: 31820436; DOI: 10.1111/sjop.12599.
Abstract
Infant perception often deals with audiovisual speech input and a first step in processing this input is to perceive both visual and auditory information. The speech directed to infants has special characteristics and may enhance visual aspects of speech. The current study was designed to explore the impact of visual enhancement in infant-directed speech (IDS) on audiovisual mismatch detection in a naturalistic setting. Twenty infants participated in an experiment with a visual fixation task conducted in participants' homes. Stimuli consisted of IDS and adult-directed speech (ADS) syllables with a plosive and the vowel /a:/, /i:/ or /u:/. These were either audiovisually congruent or incongruent. Infants looked longer at incongruent than congruent syllables and longer at IDS than ADS syllables, indicating that IDS and incongruent stimuli contain cues that can make audiovisual perception challenging and thereby attract infants' gaze.
Affiliation(s)
- Nunne Englund
- Department of Psychology, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn M Behne
- Department of Psychology, NTNU, Norwegian University of Science and Technology, Trondheim, Norway
11
"Paying" attention to audiovisual speech: Do incongruent stimuli incur greater costs? Atten Percept Psychophys 2019; 81:1743-1756. [PMID: 31197661 DOI: 10.3758/s13414-019-01772-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing: susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.
12
Metrical congruency and kinematic familiarity facilitate temporal binding between musical and dance rhythms. Psychon Bull Rev 2019; 25:1416-1422. PMID: 29766450; DOI: 10.3758/s13423-018-1480-3.
Abstract
Although music and dance are often experienced simultaneously, it is unclear what modulates their perceptual integration. This study investigated how two factors related to music-dance correspondences influenced audiovisual binding of their rhythms: the metrical match between the music and dance, and the kinematic familiarity of the dance movement. Participants watched a point-light figure dancing synchronously to a triple-meter rhythm that they heard in parallel, whereby the dance communicated a triple (congruent) or a duple (incongruent) visual meter. The movement was either the participant's own or that of another participant. Participants attended to both streams while detecting a temporal perturbation in the auditory beat. The results showed lower sensitivity to the auditory deviant when the visual dance was metrically congruent to the auditory rhythm and when the movement was the participant's own. This indicated stronger audiovisual binding and a more coherent bimodal rhythm in these conditions, thus making a slight auditory deviant less noticeable. Moreover, binding in the metrically incongruent condition involving self-generated visual stimuli was correlated with self-recognition of the movement, suggesting that action simulation mediates the perceived coherence between one's own movement and a mismatching auditory rhythm. Overall, the mechanisms of rhythm perception and action simulation could inform the perceived compatibility between music and dance, thus modulating the temporal integration of these audiovisual stimuli.
13
Magnotti JF, Smith KB, Salinas M, Mays J, Zhu LL, Beauchamp MS. A causal inference explanation for enhancement of multisensory integration by co-articulation. Sci Rep 2018; 8:18032. PMID: 30575791; PMCID: PMC6303389; DOI: 10.1038/s41598-018-36772-8.
Abstract
The McGurk effect is a popular assay of multisensory integration in which participants report the illusory percept of "da" when presented with incongruent auditory "ba" and visual "ga" (AbaVga). While the original publication describing the effect found that 98% of participants perceived it, later studies reported much lower prevalence, ranging from 17% to 81%. Understanding the source of this variability is important for interpreting the panoply of studies that examine McGurk prevalence between groups, including clinical populations such as individuals with autism or schizophrenia. The original publication used stimuli consisting of multiple repetitions of a co-articulated syllable (three repetitions, AgagaVbaba). Later studies used stimuli without repetition or co-articulation (AbaVga) and used congruent syllables from the same talker as a control. In three experiments, we tested how stimulus repetition, co-articulation, and talker repetition affect McGurk prevalence. Repetition with co-articulation increased prevalence by 20%, while repetition without co-articulation and talker repetition had no effect. A fourth experiment compared the effect of the on-line testing used in the first three experiments with the in-person testing used in the original publication; no differences were observed. We interpret our results in the framework of causal inference: co-articulation increases the evidence that auditory and visual speech tokens arise from the same talker, increasing tolerance for content disparity and likelihood of integration. The results provide a principled explanation for how co-articulation aids multisensory integration and can explain the high prevalence of the McGurk effect in the initial publication.
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Marcelo Salinas
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Jacqunae Mays
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
14
Brown VA, Hedayati M, Zanger A, Mayn S, Ray L, Dillman-Hasso N, Strand JF. What accounts for individual differences in susceptibility to the McGurk effect? PLoS One 2018; 13:e0207160. [PMID: 30418995] [PMCID: PMC6231656] [DOI: 10.1371/journal.pone.0207160]
Abstract
The McGurk effect is a classic audiovisual speech illusion in which discrepant auditory and visual syllables can lead to a fused percept (e.g., an auditory /bɑ/ paired with a visual /gɑ/ often leads to the perception of /dɑ/). The McGurk effect is robust and easily replicated in pooled group data, but there is tremendous variability in the extent to which individual participants are susceptible to it. In some studies, the rate at which individuals report fusion responses ranges from 0% to 100%. Despite its widespread use in the audiovisual speech perception literature, the roots of the wide variability in McGurk susceptibility are largely unknown. This study evaluated whether several perceptual and cognitive traits are related to McGurk susceptibility through correlational analyses and mixed effects modeling. We found that an individual's susceptibility to the McGurk effect was related to their ability to extract place of articulation information from the visual signal (i.e., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency. These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill. In contrast, cognitive and perceptual abilities that are commonly used predictors in individual differences studies do not appear to underlie susceptibility to the McGurk effect.
Affiliation(s)
- Violet A. Brown: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Maryam Hedayati: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Annie Zanger: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Sasha Mayn: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Lucia Ray: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Naseem Dillman-Hasso: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
- Julia F. Strand: Department of Psychology, Carleton College, Northfield, Minnesota, United States of America
15
Chaplin TA, Rosa MGP, Lui LL. Auditory and Visual Motion Processing and Integration in the Primate Cerebral Cortex. Front Neural Circuits 2018; 12:93. [PMID: 30416431] [PMCID: PMC6212655] [DOI: 10.3389/fncir.2018.00093]
Abstract
The ability of animals to detect motion is critical for survival, and errors or even delays in motion perception may prove costly. In the natural world, moving objects in the visual field often produce concurrent sounds. Thus, it can be highly advantageous to detect motion elicited by sensory signals of either modality, and to integrate them to produce more reliable motion perception. A great deal of progress has been made in understanding how visual motion perception is governed by the activity of single neurons in the primate cerebral cortex, but far less progress has been made in understanding both auditory motion and audiovisual motion integration. Here, we review the key cortical regions for motion processing, focussing on translational motion. We compare the representations of space and motion in the visual and auditory systems, and examine how single neurons in these two sensory systems encode the direction of motion. We also discuss the way in which humans integrate audio and visual motion cues, and the regions of the cortex that may mediate this process.
Affiliation(s)
- Tristan A Chaplin: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, VIC, Australia; Australian Research Council (ARC) Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, VIC, Australia
- Marcello G P Rosa: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, VIC, Australia; Australian Research Council (ARC) Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, VIC, Australia
- Leo L Lui: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, VIC, Australia; Australian Research Council (ARC) Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, VIC, Australia
16
Chaplin TA, Allitt BJ, Hagan MA, Rosa MGP, Rajan R, Lui LL. Auditory motion does not modulate spiking activity in the middle temporal and medial superior temporal visual areas. Eur J Neurosci 2018; 48:2013-2029. [PMID: 30019438] [DOI: 10.1111/ejn.14071]
Abstract
The integration of multiple sensory modalities is a key aspect of brain function, allowing animals to take advantage of concurrent sources of information to make more accurate perceptual judgments. For many years, multisensory integration in the cerebral cortex was deemed to occur only in high-level "polysensory" association areas. However, more recent studies have suggested that cross-modal stimulation can also influence neural activity in areas traditionally considered to be unimodal. In particular, several human neuroimaging studies have reported that extrastriate areas involved in visual motion perception are also activated by auditory motion, and may integrate audiovisual motion cues. However, the exact nature and extent of the effects of auditory motion on the visual cortex have not been studied at the single neuron level. We recorded the spiking activity of neurons in the middle temporal (MT) and medial superior temporal (MST) areas of anesthetized marmoset monkeys upon presentation of unimodal stimuli (moving auditory or visual patterns), as well as bimodal stimuli (concurrent audiovisual motion). Despite robust, direction selective responses to visual motion, none of the sampled neurons responded to auditory motion stimuli. Moreover, concurrent moving auditory stimuli had no significant effect on the ability of single MT and MST neurons, or populations of simultaneously recorded neurons, to discriminate the direction of motion of visual stimuli (moving random dot patterns with varying levels of motion noise). Our findings do not support the hypothesis that direct interactions between MT, MST and areas low in the hierarchy of auditory areas underlie audiovisual motion integration.
Affiliation(s)
- Tristan A Chaplin: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia; ARC Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, Victoria, Australia
- Benjamin J Allitt: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
- Maureen A Hagan: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia; ARC Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, Victoria, Australia
- Marcello G P Rosa: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia; ARC Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, Victoria, Australia
- Ramesh Rajan: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia; ARC Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, Victoria, Australia
- Leo L Lui: Neuroscience Program, Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia; ARC Centre of Excellence for Integrative Brain Function, Monash University Node, Clayton, Victoria, Australia
17
Chen YC, Spence C. Dissociating the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words. Psychon Bull Rev 2018; 25:1138-1146. [PMID: 28600716] [PMCID: PMC5990551] [DOI: 10.3758/s13423-017-1324-6]
Abstract
The present study compared the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words on visual picture processing. Following an auditory prime, a picture (or blank frame) was briefly presented and then immediately masked. The participants had to judge whether or not a picture had been presented. Naturalistic sounds consistently elicited a cross-modal semantic priming effect on visual sensitivity (d') for pictures (higher d' in the congruent than in the incongruent condition) at the 350-ms rather than at the 1,000-ms stimulus onset asynchrony (SOA). Spoken words mainly elicited a cross-modal semantic priming effect at the 1,000-ms rather than at the 350-ms SOA, but this effect was modulated by the order of testing these two SOAs. It would therefore appear that visual picture processing can be rapidly primed by naturalistic sounds via cross-modal associations, and this effect is short lived. In contrast, spoken words prime visual picture processing over a wider range of prime-target intervals, though this effect was conditioned by the prior context.
Affiliation(s)
- Yi-Chuan Chen: Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, 9 South Parks Road, Oxford, OX1 3UD, UK
- Charles Spence: Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, 9 South Parks Road, Oxford, OX1 3UD, UK
18
Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. [DOI: 10.1111/ejn.13804]
Affiliation(s)
- Luis Morís Fernández: Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Mireia Torralba: Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Salvador Soto-Faraco: Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
19
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597] [DOI: 10.1163/22134808-00002565]
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a prototypical case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This calls into question the suitability of this illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius: Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
- Martin Paré: Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
- Kevin G Munhall: Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
20
Costantini M, Migliorati D, Donno B, Sirota M, Ferri F. Expected but omitted stimuli affect crossmodal interaction. Cognition 2017; 171:52-64. [PMID: 29107888] [DOI: 10.1016/j.cognition.2017.10.016]
Abstract
One of the most important abilities of our brain is to integrate input from different sensory modalities to create a coherent representation of the environment. Does expectation affect such multisensory integration? In this paper, we tackled this issue by taking advantage of the crossmodal congruency effect (CCE). Participants made elevation judgments on a visual target while ignoring tactile distractors. We manipulated the expectation of the tactile distractor by pairing the tactile stimulus to the index finger with a high-frequency tone and the tactile stimulus to the thumb with a low-frequency tone in 80% of the trials. In the remaining trials we delivered the tone and the visual target, but the tactile distractor was omitted (Study 1). Results fully replicated the basic crossmodal congruency effect. Strikingly, the CCE was observed, though to a lesser degree, also when the tactile distractor was not presented but merely expected. The contingencies between tones and tactile distractors were reversed in a follow-up study (Study 2), and the effect was further tested in two conceptual replications using different combinations of stimuli (Studies 5 and 6). Two control studies ruled out alternative explanations of the observed effect that would not involve a role for tactile distractors (Studies 3 and 4). Two additional control studies unequivocally proved the dependency of the CCE on the spatial and temporal expectation of the distractors (Studies 7 and 8). An internal small-scale meta-analysis showed that the crossmodal congruency effect with predicted distractors is a robust medium-size effect. Our findings reveal that multisensory integration, one of the most basic and ubiquitous mechanisms for encoding external events, benefits from expectation of sensory input.
Affiliation(s)
- Marcello Costantini: Centre for Brain Science, Department of Psychology, University of Essex, Colchester, UK; Laboratory of Neuropsychology and Cognitive Neuroscience, Department of Neuroscience and Imaging, University G. d'Annunzio, Chieti, Italy; Institute for Advanced Biomedical Technologies (ITAB), Foundation University G. d'Annunzio, Chieti, Italy
- Daniele Migliorati: Laboratory of Neuropsychology and Cognitive Neuroscience, Department of Neuroscience and Imaging, University G. d'Annunzio, Chieti, Italy; Institute for Advanced Biomedical Technologies (ITAB), Foundation University G. d'Annunzio, Chieti, Italy
- Brunella Donno: Laboratory of Neuropsychology and Cognitive Neuroscience, Department of Neuroscience and Imaging, University G. d'Annunzio, Chieti, Italy; Institute for Advanced Biomedical Technologies (ITAB), Foundation University G. d'Annunzio, Chieti, Italy
- Miroslav Sirota: Centre for Brain Science, Department of Psychology, University of Essex, Colchester, UK
- Francesca Ferri: Centre for Brain Science, Department of Psychology, University of Essex, Colchester, UK
21
Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect. Atten Percept Psychophys 2017; 79:396-403. [PMID: 27921268] [DOI: 10.3758/s13414-016-1238-9]
Abstract
In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.
22
Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. [PMID: 28792094] [DOI: 10.1002/hbm.23758]
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or not integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise. Hum Brain Mapp 38:5691-5705, 2017. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Luis Morís Fernández: Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso: Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco: Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
23
Odegaard B, Wozny DR, Shams L. A simple and efficient method to enhance audiovisual binding tendencies. PeerJ 2017; 5:e3143. [PMID: 28462016] [PMCID: PMC5407282] [DOI: 10.7717/peerj.3143]
Abstract
Individuals vary in their tendency to bind signals from multiple senses. For the same set of sights and sounds, one individual may frequently integrate multisensory signals and experience a unified percept, whereas another individual may rarely bind them and often experience two distinct sensations. Thus, while this binding/integration tendency is specific to each individual, it is not clear how plastic this tendency is in adulthood, and how sensory experiences may cause it to change. Here, we conducted an exploratory investigation which provides evidence that (1) the brain’s tendency to bind in spatial perception is plastic, (2) that it can change following brief exposure to simple audiovisual stimuli, and (3) that exposure to temporally synchronous, spatially discrepant stimuli provides the most effective method to modify it. These results can inform current theories about how the brain updates its internal model of the surrounding sensory world, as well as future investigations seeking to increase integration tendencies.
Affiliation(s)
- Brian Odegaard: Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
- David R Wozny: Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
- Ladan Shams: Department of Psychology, University of California, Los Angeles; Department of Bioengineering, University of California, Los Angeles; Neuroscience Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, United States
24
Sight and sound persistently out of synch: stable individual differences in audiovisual synchronisation revealed by implicit measures of lip-voice integration. Sci Rep 2017; 7:46413. [PMID: 28429784] [PMCID: PMC5399466] [DOI: 10.1038/srep46413]
Abstract
Are sight and sound out of synch? Signs that they are have been dismissed for over two centuries as an artefact of attentional and response bias, to which traditional subjective methods are prone. To avoid such biases, we measured performance on objective tasks that depend implicitly on achieving good lip-synch. We measured the McGurk effect (in which incongruent lip-voice pairs evoke illusory phonemes), and also identification of degraded speech, while manipulating audiovisual asynchrony. Peak performance was found at an average auditory lag of ~100 ms, but this varied widely between individuals. Participants’ individual optimal asynchronies showed trait-like stability when the same task was re-tested one week later, but measures based on different tasks did not correlate. This discounts the possible influence of common biasing factors, suggesting instead that our different tasks probe different brain networks, each subject to their own intrinsic auditory and visual processing latencies. Our findings call for renewed interest in the biological causes and cognitive consequences of individual sensory asynchronies, leading potentially to fresh insights into the neural representation of sensory timing. A concrete implication is that speech comprehension might be enhanced, by first measuring each individual’s optimal asynchrony and then applying a compensatory auditory delay.
25
Chen YC, Spence C. Assessing the Role of the 'Unity Assumption' on Multisensory Integration: A Review. Front Psychol 2017; 8:445. [PMID: 28408890] [PMCID: PMC5374162] [DOI: 10.3389/fpsyg.2017.00445]
Abstract
There has been longstanding interest from both experimental psychologists and cognitive neuroscientists in the potential modulatory role of various top-down factors on multisensory integration/perception in humans. One such top-down influence, often referred to in the literature as the 'unity assumption,' is thought to occur in those situations in which an observer considers that various of the unisensory stimuli that they have been presented with belong to one and the same object or event (Welch and Warren, 1980). Here, we review the possible factors that may lead to the emergence of the unity assumption. We then critically evaluate the evidence concerning the consequences of the unity assumption from studies of the spatial and temporal ventriloquism effects, from the McGurk effect, and from the Colavita visual dominance paradigm. The research that has been published to date using these tasks provides support for the claim that the unity assumption influences multisensory perception under at least a subset of experimental conditions. We then consider whether the notion has been superseded in recent years by the introduction of priors in Bayesian causal inference models of human multisensory perception. We suggest that the prior of common cause (that is, the prior concerning whether multisensory signals originate from the same source or not) offers the most useful way to quantify the unity assumption as a continuous cognitive variable.
Affiliation(s)
- Charles Spence: Crossmodal Research Laboratory, Department of Experimental Psychology, Oxford University, Oxford, UK
26
Metacognition in Multisensory Perception. Trends Cogn Sci 2016; 20:736-747. [DOI: 10.1016/j.tics.2016.08.006]
27
Abstract
In the McGurk effect, incongruent auditory and visual syllables are perceived as a third, completely different syllable. This striking illusion has become a popular assay of multisensory integration for individuals and clinical populations. However, there is enormous variability in how often the illusion is evoked by different stimuli and how often the illusion is perceived by different individuals. Most studies of the McGurk effect have used only one stimulus, making it impossible to separate stimulus and individual differences. We created a probabilistic model to separately estimate stimulus and individual differences in behavioral data from 165 individuals viewing up to 14 different McGurk stimuli. The noisy encoding of disparity (NED) model characterizes stimuli by their audiovisual disparity and characterizes individuals by how noisily they encode the stimulus disparity and by their disparity threshold for perceiving the illusion. The model accurately described perception of the McGurk effect in our sample, suggesting that differences between individuals are stable across stimulus differences. The most important benefit of the NED model is that it provides a method to compare multisensory integration across individuals and groups without the confound of stimulus differences. An added benefit is the ability to predict frequency of the McGurk effect for stimuli never before seen by an individual.
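The NED model's two individual-level parameters can be written down compactly. The sketch below is a plausible reading of the abstract, under the assumption that the illusion is reported when the noisily encoded stimulus disparity falls below the perceiver's disparity threshold; the Gaussian-noise form and the function name are illustrative, not the paper's exact specification:

```python
from math import erf, sqrt

def p_illusion(disparity, sensory_noise, threshold):
    """NED-style prediction of McGurk frequency: the stimulus disparity
    is encoded with additive Gaussian noise (sd = sensory_noise), and the
    illusory percept is reported when the encoded disparity lands below
    the individual's disparity threshold. Larger stimulus disparity means
    fewer illusions; noisier encoding flattens the psychometric curve."""
    z = (threshold - disparity) / (sensory_noise * sqrt(2))
    return 0.5 * (1 + erf(z))  # standard-normal CDF of (T - D) / sigma
```

On this reading, stimulus differences live entirely in `disparity`, while individual differences live in `sensory_noise` and `threshold`, which is what lets the model compare integration across individuals and predict responses to unseen stimuli.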
28
The McGurk effect: An investigation of attentional capacity employing response times. Atten Percept Psychophys 2016; 78:1712-27. [DOI: 10.3758/s13414-016-1133-4]
29
Audio Visual Integration with Competing Sources in the Framework of Audio Visual Speech Scene Analysis. Adv Exp Med Biol 2016. [DOI: 10.1007/978-3-319-25474-6_42]
30
ten Oever S, Romei V, van Atteveldt N, Soto-Faraco S, Murray MM, Matusz PJ. The COGs (context, object, and goals) in multisensory processing. Exp Brain Res 2016; 234:1307-23. [PMID: 26931340] [DOI: 10.1007/s00221-016-4590-z]
Abstract
Our understanding of how perception operates in real-world environments has been substantially advanced by studying both multisensory processes and "top-down" control processes influencing sensory processing via activity from higher-order brain areas, such as attention, memory, and expectations. As the two topics have been traditionally studied separately, the mechanisms orchestrating real-world multisensory processing remain unclear. Past work has revealed that the observer's goals gate the influence of many multisensory processes on brain and behavioural responses, whereas some other multisensory processes might occur independently of these goals. Consequently, other forms of top-down control beyond goal dependence are necessary to explain the full range of multisensory effects currently reported at the brain and the cognitive level. These forms of control include sensitivity to stimulus context as well as the detection of matches (or lack thereof) between a multisensory stimulus and categorical attributes of naturalistic objects (e.g. tools, animals). In this review we discuss and integrate the existing findings that demonstrate the importance of such goal-, object- and context-based top-down control over multisensory processing. We then put forward a few principles emerging from this literature review with respect to the mechanisms underlying multisensory processing and discuss their possible broader implications.
Affiliation(s)
- Sanne ten Oever, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Vincenzo Romei, Department of Psychology, Centre for Brain Science, University of Essex, Colchester, UK
- Nienke van Atteveldt, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Department of Educational Neuroscience, Faculty of Psychology and Education and Institute LEARN!, VU University Amsterdam, Amsterdam, The Netherlands
- Salvador Soto-Faraco, Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Micah M Murray, The Laboratory for Investigative Neurophysiology (The LINE), Neuropsychology and Neurorehabilitation Service and Department of Radiology, Centre Hospitalier Universitaire Vaudois (CHUV), University Hospital Center and University of Lausanne, BH7.081, rue du Bugnon 46, 1011, Lausanne, Switzerland; EEG Brain Mapping Core, Center for Biomedical Imaging (CIBM) of Lausanne and Geneva, Lausanne, Switzerland; Department of Ophthalmology, Jules-Gonin Eye Hospital, University of Lausanne, Lausanne, Switzerland
- Pawel J Matusz, The Laboratory for Investigative Neurophysiology (The LINE), Neuropsychology and Neurorehabilitation Service and Department of Radiology, Centre Hospitalier Universitaire Vaudois (CHUV), University Hospital Center and University of Lausanne, BH7.081, rue du Bugnon 46, 1011, Lausanne, Switzerland; Attention, Brain, and Cognitive Development Group, Department of Experimental Psychology, University of Oxford, Oxford, UK
31
Using EEG and stimulus context to probe the modelling of auditory-visual speech. Cortex 2016; 75:220-230. [DOI: 10.1016/j.cortex.2015.03.010]
32
Bizley JK, Maddox RK, Lee AKC. Defining Auditory-Visual Objects: Behavioral Tests and Physiological Mechanisms. Trends Neurosci 2016; 39:74-85. [PMID: 26775728 PMCID: PMC4738154 DOI: 10.1016/j.tins.2015.12.007]
Abstract
Crossmodal integration is a term applicable to many phenomena in which one sensory modality influences task performance or perception in another sensory modality. We distinguish the term binding as one that should be reserved specifically for the process that underpins perceptual object formation. To unambiguously differentiate binding from other types of integration, behavioral and neural studies must investigate perception of a feature orthogonal to the features that link the auditory and visual stimuli. We argue that supporting true perceptual binding (as opposed to other processes such as decision-making) is one role for cross-sensory influences in early sensory cortex. These early multisensory interactions may therefore form a physiological substrate for the bottom-up grouping of auditory and visual stimuli into auditory-visual (AV) objects.
Crossmodal integration and binding have been treated as synonymous in the literature, with no clear delineation between perceptual changes and other interactions such as decision-making. Crossmodal binding is proposed as a distinct form of integration leading to multisensory object formation. Multisensory stimuli are most beneficial in noisy situations, but few studies use stimulus competition to investigate the processes underpinning multisensory integration. Evidence suggests that both visual and auditory attention is object-based: all features within an object are enhanced and there is a cost to attending features across versus within objects. Multisensory interactions can be observed throughout the brain, including early sensory cortex. The role of early sensory cortex in multisensory integration is unknown, but may underlie crossmodal binding.
Affiliation(s)
- Jennifer K Bizley, University College London (UCL) Ear Institute, 332 Gray's Inn Road, London, WC1X 8EE, UK
- Ross K Maddox, Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Portage Bay Building, Box 357988, Seattle, WA 98195, USA
- Adrian K C Lee, Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Portage Bay Building, Box 357988, Seattle, WA 98195, USA; Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Eagleson Hall, Box 354875, Seattle, WA 98105, USA
33
Gau R, Noppeney U. How prior expectations shape multisensory perception. Neuroimage 2016; 124:876-886. [DOI: 10.1016/j.neuroimage.2015.09.045]
34
Morís Fernández L, Visser M, Ventura-Campos N, Ávila C, Soto-Faraco S. Top-down attention regulates the neural expression of audiovisual integration. Neuroimage 2015; 119:272-85. [PMID: 26119022 DOI: 10.1016/j.neuroimage.2015.06.052]
Abstract
The interplay between attention and multisensory integration has proven to be a difficult question to tackle. There are almost as many studies showing that multisensory integration occurs independently from the focus of attention as studies implying that attention has a profound effect on integration. Addressing the neural expression of multisensory integration for attended vs. unattended stimuli can help disentangle this apparent contradiction. In the present study, we examine if selective attention to sound pitch influences the expression of audiovisual integration in both behavior and neural activity. Participants were asked to attend to one of two auditory speech streams while watching a pair of talking lips that could be congruent or incongruent with the attended speech stream. We measured behavioral and neural responses (fMRI) to multisensory stimuli under attended and unattended conditions while physical stimulation was kept constant. Our results indicate that participants recognized words more accurately from an auditory stream that was both attended and audiovisually (AV) congruent, thus reflecting a benefit due to AV integration. On the other hand, no enhancement was found for AV congruency when it was unattended. Furthermore, the fMRI results indicated that activity in the superior temporal sulcus (an area known to be related to multisensory integration) was contingent on attention as well as on audiovisual congruency. This attentional modulation extended beyond heteromodal areas to affect processing in areas classically recognized as unisensory, such as the superior temporal gyrus or the extrastriate cortex, and to non-sensory areas such as the motor cortex. Interestingly, attention to audiovisual incongruence triggered responses in brain areas related to conflict processing (i.e., the anterior cingulate cortex and the anterior insula). Based on these results, we hypothesize that AV speech integration can take place automatically only when both modalities are sufficiently processed, and that if a mismatch is detected between the AV modalities, feedback from conflict areas minimizes the influence of this mismatch by reducing the processing of the least informative modality.
Affiliation(s)
- Luis Morís Fernández, Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Maya Visser, Departament de Psicología Básica, Clínica y Psicobiología, Universitat Jaume I, Castelló de la Plana, Spain
- Noelia Ventura-Campos, Departament de Psicología Básica, Clínica y Psicobiología, Universitat Jaume I, Castelló de la Plana, Spain
- César Ávila, Departament de Psicología Básica, Clínica y Psicobiología, Universitat Jaume I, Castelló de la Plana, Spain
- Salvador Soto-Faraco, Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
35
Andersen TS. The early maximum likelihood estimation model of audiovisual integration in speech perception. J Acoust Soc Am 2015; 137:2884-2891. [PMID: 25994715 DOI: 10.1121/1.4916691]
Abstract
Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk-MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely focused on the fuzzy logical model of perception (FLMP), which provides excellent fits to experimental observations but also has been criticized for being too flexible, post hoc and difficult to interpret. The current study introduces the early maximum likelihood estimation (MLE) model of audiovisual integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures favored more complex models. This difference between conventional error measures and cross-validation was found to be indicative of over-fitting in more complex models such as the FLMP.
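As a rough, hypothetical illustration of the maximum-likelihood principle such models build on (not Andersen's implementation, which operates on continuous internal representations before categorization and adds design-specific constraints), reliability-weighted fusion of an auditory and a visual cue can be sketched as follows; the function name and example values are illustrative only:

```python
def mle_fuse(x_a: float, var_a: float, x_v: float, var_v: float) -> tuple[float, float]:
    """Fuse auditory and visual estimates by inverse-variance weighting.

    The maximum-likelihood fused estimate weights each cue by its
    reliability (1 / variance); the fused variance is lower than the
    variance of either cue alone.
    """
    w_a = 1.0 / var_a  # reliability of the auditory cue
    w_v = 1.0 / var_v  # reliability of the visual cue
    fused = (w_a * x_a + w_v * x_v) / (w_a + w_v)
    fused_var = 1.0 / (w_a + w_v)
    return fused, fused_var


# Example: a noisy auditory cue (variance 1.0) combined with a more
# reliable visual cue (variance 0.25) pulls the estimate toward vision.
estimate, variance = mle_fuse(x_a=0.0, var_a=1.0, x_v=1.0, var_v=0.25)
```

The sketch shows only the core fusion rule; the early MLE model in the paper is additionally evaluated against alternatives by cross-validation, which penalizes the flexibility that conventional error measures reward.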
Affiliation(s)
- Tobias S Andersen, Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, Denmark
36
Talsma D. Predictive coding and multisensory integration: an attentional account of the multisensory mind. Front Integr Neurosci 2015; 9:19. [PMID: 25859192 PMCID: PMC4374459 DOI: 10.3389/fnint.2015.00019]
Abstract
Multisensory integration involves a host of different cognitive processes, occurring at different stages of sensory processing. Here I argue that, despite recent insights suggesting that multisensory interactions can occur at very early latencies, the actual integration of individual sensory traces into an internally consistent mental representation is dependent on both top-down and bottom-up processes. Moreover, I argue that this integration is not limited to just sensory inputs, but that internal cognitive processes also shape the resulting mental representation. Studies showing that memory recall is affected by the initial multisensory context in which the stimuli were presented will be discussed, as well as several studies showing that mental imagery can affect multisensory illusions. This empirical evidence will be discussed from a predictive coding perspective, in which a top-down attentional process is proposed to play a central role in coordinating the integration of all these inputs into a coherent mental representation.
Affiliation(s)
- Durk Talsma, Department of Experimental Psychology, Ghent University, Ghent, Belgium
37
Maddox RK, Atilgan H, Bizley JK, Lee AKC. Auditory selective attention is enhanced by a task-irrelevant temporally coherent visual stimulus in human listeners. eLife 2015; 4:e04995. [PMID: 25654748 PMCID: PMC4337603 DOI: 10.7554/elife.04995]
Abstract
In noisy settings, listening is aided by correlated dynamic visual cues gleaned from a talker's face, an improvement often attributed to visually reinforced linguistic information. In this study, we aimed to test the effect of audio-visual temporal coherence alone on selective listening, free of linguistic confounds. We presented listeners with competing auditory streams whose amplitude varied independently and a visual stimulus with varying radius, while manipulating the cross-modal temporal relationships. Performance improved when the auditory target's timecourse matched that of the visual stimulus. The fact that the coherence was between task-irrelevant stimulus features suggests that the observed improvement stemmed from the integration of auditory and visual streams into cross-modal objects, enabling listeners to better attend the target. These findings suggest that in everyday conditions, where listeners can often see the source of a sound, temporal cues provided by vision can help listeners to select one sound source from a mixture.
Affiliation(s)
- Ross K Maddox, Institute for Learning and Brain Sciences, University of Washington, Seattle, United States
- Huriye Atilgan, Ear Institute, University College London, London, United Kingdom
- Adrian KC Lee, Institute for Learning and Brain Sciences, University of Washington, Seattle, United States; Department of Speech and Hearing Sciences, University of Washington, Seattle, United States
38
Nahorna O, Berthommier F, Schwartz JL. Audio-visual speech scene analysis: characterization of the dynamics of unbinding and rebinding the McGurk effect. J Acoust Soc Am 2015; 137:362-377. [PMID: 25618066 DOI: 10.1121/1.4904536]
Abstract
While audiovisual interactions in speech perception have long been considered as automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a previous incoherent audiovisual context. This was interpreted as showing the existence of an audiovisual binding stage controlling the fusion process. Incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding, and shows that it is rapid: An incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process, by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence provides rebinding, with a recovery of the McGurk effect, while silence provides no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process assessing the perceptual organization of an audiovisual speech input before decision takes place at a higher processing stage.
Affiliation(s)
- Olha Nahorna, GIPSA-Lab, Speech and Cognition Department, UMR 5216, CNRS, Grenoble University, Grenoble, France
- Frédéric Berthommier, GIPSA-Lab, Speech and Cognition Department, UMR 5216, CNRS, Grenoble University, Grenoble, France
- Jean-Luc Schwartz, GIPSA-Lab, Speech and Cognition Department, UMR 5216, CNRS, Grenoble University, Grenoble, France
39
Ganesh AC, Berthommier F, Vilain C, Sato M, Schwartz JL. A possible neurophysiological correlate of audiovisual binding and unbinding in speech perception. Front Psychol 2014; 5:1340. [PMID: 25505438 PMCID: PMC4244540 DOI: 10.3389/fpsyg.2014.01340]
Abstract
Audiovisual (AV) integration of auditory and visual speech streams generally results in fusion into a single percept. One classical example is the McGurk effect, in which incongruent auditory and visual speech signals may lead to a fused percept different from either the visual or the auditory input. In a previous set of experiments, we showed that if a McGurk stimulus is preceded by an incongruent AV context (composed of incongruent auditory and visual speech materials) the amount of McGurk fusion is largely decreased. We interpreted this result in the framework of a two-stage "binding and fusion" model of AV speech perception, with an early AV binding stage controlling the fusion/decision process and likely to produce "unbinding" with less fusion if the context is incoherent. In order to provide further electrophysiological evidence for this binding/unbinding stage, early auditory evoked N1/P2 responses were here compared during auditory, congruent and incongruent AV speech perception, according to either prior coherent or incoherent AV contexts. Following the coherent context, in line with previous electroencephalographic/magnetoencephalographic studies, visual information in the congruent AV condition was found to modify auditory evoked potentials, with a latency decrease of P2 responses compared to the auditory condition. Importantly, both P2 amplitude and latency in the congruent AV condition increased from the coherent to the incoherent context. Although potential contamination by visual responses from the visual cortex cannot be discarded, our results might provide a possible neurophysiological correlate of an early binding/unbinding process applied to AV interactions.
Affiliation(s)
- Attigodu C Ganesh, CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
- Frédéric Berthommier, CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
- Coriandre Vilain, CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
- Marc Sato, CNRS, Laboratoire Parole et Langage, Brain and Language Research Institute, UMR 7309, Aix-Marseille University, Aix-en-Provence, France
- Jean-Luc Schwartz, CNRS, Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, UMR 5216, Grenoble University, Grenoble, France
40
Alsius A, Möttönen R, Sams ME, Soto-Faraco S, Tiippana K. Effect of attentional load on audiovisual speech perception: evidence from ERPs. Front Psychol 2014; 5:727. [PMID: 25076922 PMCID: PMC4097954 DOI: 10.3389/fpsyg.2014.00727]
Abstract
Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.
Affiliation(s)
- Agnès Alsius, Psychology Department, Queen's University, Kingston, ON, Canada
- Riikka Möttönen, Department of Experimental Psychology, University of Oxford, Oxford, UK
- Mikko E Sams, Brain and Mind Laboratory, School of Science, Aalto University, Espoo, Finland
- Salvador Soto-Faraco, Institut Català de Recerca i Estudis Avançats, Barcelona, Spain; Brain and Cognition Center, Universitat Pompeu Fabra, Barcelona, Spain
- Kaisa Tiippana, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
41
Kumpik DP, Roberts HE, King AJ, Bizley JK. Visual sensitivity is a stronger determinant of illusory processes than auditory cue parameters in the sound-induced flash illusion. J Vis 2014; 14(7):12. [PMID: 24961249 DOI: 10.1167/14.7.12]
Abstract
The sound-induced flash illusion (SIFI) is a multisensory perceptual phenomenon in which the number of brief visual stimuli perceived by an observer is influenced by the number of concurrently presented sounds. While the strength of this illusion has been shown to be modulated by the temporal congruence of the stimuli from each modality, there is conflicting evidence regarding its dependence upon their spatial congruence. We addressed this question by examining SIFIs under conditions in which the spatial reliability of the visual stimuli was degraded and different sound localization cues were presented using either free-field or closed-field stimulation. The likelihood of reporting a SIFI varied with the spatial cue composition of the auditory stimulus and was highest when binaural cues were presented over headphones. SIFIs were more common for small flashes than for large flashes, and for small flashes at peripheral locations, subjects experienced a greater number of illusory fusion events than fission events. However, the SIFI was not dependent on the spatial proximity of the audiovisual stimuli, but was instead determined primarily by differences in subjects' underlying sensitivity across the visual field to the number of flashes presented. Our findings indicate that the influence of auditory stimulation on visual numerosity judgments can occur independently of the spatial relationship between the stimuli.
Affiliation(s)
- Daniel P Kumpik, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
- Helen E Roberts, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
- Andrew J King, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
- Jennifer K Bizley, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK; UCL Ear Institute, London, UK
42
Erickson LC, Zielinski BA, Zielinski JEV, Liu G, Turkeltaub PE, Leaver AM, Rauschecker JP. Distinct cortical locations for integration of audiovisual speech and the McGurk effect. Front Psychol 2014; 5:534. [PMID: 24917840 PMCID: PMC4040936 DOI: 10.3389/fpsyg.2014.00534]
Abstract
Audiovisual (AV) speech integration is often studied using the McGurk effect, where the combination of specific incongruent auditory and visual speech cues produces the perception of a third illusory speech percept. Recently, several studies have implicated the posterior superior temporal sulcus (pSTS) in the McGurk effect; however, the exact roles of the pSTS and other brain areas in "correcting" differing AV sensory inputs remain unclear. Using functional magnetic resonance imaging (fMRI) in ten participants, we aimed to isolate brain areas specifically involved in processing congruent AV speech and the McGurk effect. Speech stimuli were composed of sounds and/or videos of consonant-vowel tokens resulting in four stimulus classes: congruent AV speech (AVCong), incongruent AV speech resulting in the McGurk effect (AVMcGurk), acoustic-only speech (AO), and visual-only speech (VO). In group- and single-subject analyses, left pSTS exhibited significantly greater fMRI signal for congruent AV speech (i.e., AVCong trials) than for both AO and VO trials. Right superior temporal gyrus, medial prefrontal cortex, and cerebellum were also identified. For McGurk speech (i.e., AVMcGurk trials), two clusters in the left posterior superior temporal gyrus (pSTG), just posterior to Heschl's gyrus or on its border, exhibited greater fMRI signal than both AO and VO trials. We propose that while some brain areas, such as left pSTS, may be more critical for the integration of AV speech, other areas, such as left pSTG, may generate the "corrected" or merged percept arising from conflicting auditory and visual cues (i.e., as in the McGurk effect). These findings are consistent with the concept that posterior superior temporal areas represent part of a "dorsal auditory stream," which is involved in multisensory integration, sensorimotor control, and optimal state estimation (Rauschecker and Scott, 2009).
Affiliation(s)
- Laura C Erickson, Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA; Department of Neurology, Georgetown University Medical Center, Washington, DC, USA
- Brandon A Zielinski, Department of Physiology and Biophysics, Georgetown University Medical Center, Washington, DC, USA; Departments of Pediatrics and Neurology, Division of Child Neurology, University of Utah, Salt Lake City, UT, USA
- Jennifer E V Zielinski, Department of Physiology and Biophysics, Georgetown University Medical Center, Washington, DC, USA
- Guoying Liu, Department of Physiology and Biophysics, Georgetown University Medical Center, Washington, DC, USA; National Institutes of Health, Bethesda, MD, USA
- Peter E Turkeltaub, Department of Neurology, Georgetown University Medical Center, Washington, DC, USA; MedStar National Rehabilitation Hospital, Washington, DC, USA
- Amber M Leaver, Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA; Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Josef P Rauschecker, Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA; Department of Physiology and Biophysics, Georgetown University Medical Center, Washington, DC, USA
43
Altieri N. Multisensory integration, learning, and the predictive coding hypothesis. Front Psychol 2014; 5:257. [PMID: 24715884 PMCID: PMC3970030 DOI: 10.3389/fpsyg.2014.00257]
Affiliation(s)
- Nicholas Altieri, ISU Multimodal Language Processing Lab, Department of Communication Sciences and Disorders, Idaho State University, Pocatello, Idaho, USA
44
Paris T, Kim J, Davis C. Visual speech form influences the speed of auditory speech processing. Brain Lang 2013; 126:350-356. [PMID: 23942046 DOI: 10.1016/j.bandl.2013.06.008]
Abstract
An important property of visual speech (movements of the lips and mouth) is that it generally begins before auditory speech. Research using brain-based paradigms has demonstrated that seeing visual speech speeds up the activation of the listener's auditory cortex, but it is not clear whether these observed neural processes link to behaviour. It was hypothesized that the very early portion of visual speech (occurring before auditory speech) will allow listeners to predict the following auditory event and so facilitate the speed of speech perception. This was tested in the current behavioural experiments. Further, we tested whether the salience of the visual speech played a role in this speech facilitation effect (Experiment 1). We also determined the relative contributions that visual form (what) and temporal (when) cues made (Experiment 2). The results showed that visual speech cues facilitated response times and that this was based on form rather than temporal cues.
Affiliation(s)
- Tim Paris, The MARCS Institute, University of Western Sydney, Sydney, Australia