1. Lee HH, Groves K, Ripollés P, Carrasco M. Audiovisual integration in the McGurk effect is impervious to music training. Sci Rep 2024; 14:3262. PMID: 38332159; PMCID: PMC10853564; DOI: 10.1038/s41598-024-53593-0.
Abstract
The McGurk effect is an audiovisual speech illusion in which discrepant auditory and visual syllables produce a fused percept that lies between the auditory and visual components. However, little is known about how individual differences contribute to the McGurk effect. Here, we examined whether music training experience, which itself involves audiovisual integration, can modulate the McGurk effect. Seventy-three participants completed the Goldsmiths Musical Sophistication Index (Gold-MSI) questionnaire to evaluate their music expertise on a continuous scale. The Gold-MSI captures participants' daily-life exposure to formal and informal music learning experiences, rather than merely classifying people into groups according to how many years of music training they have received. Participants were instructed to report, via a 3-alternative forced-choice task, "what a person said": /Ba/, /Ga/ or /Da/. The experiment consisted of 96 audiovisual congruent trials and 96 audiovisual incongruent (McGurk) trials. We observed no significant correlations between susceptibility to the McGurk effect and the different subscales of the Gold-MSI (active engagement, perceptual abilities, music training, singing abilities, emotion) or the general musical sophistication composite score. Together, these findings suggest that music training experience does not modulate audiovisual integration in speech as reflected by the McGurk effect.
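For readers who want to see how the correlational analysis described in this abstract is typically set up, here is a minimal Python sketch; the DataFrame layout, column names, and the coding of fused responses as /da/ on McGurk trials are illustrative assumptions, not the authors' code.

```python
# Sketch: correlate McGurk susceptibility with Gold-MSI subscale scores.
# Assumes a trial-level DataFrame with hypothetical columns:
#   subject, trial_type ('congruent'/'mcgurk'), response ('ba'/'ga'/'da')
# and a subject-level DataFrame of Gold-MSI subscale scores with a 'subject' column.
import pandas as pd
from scipy.stats import pearsonr

def mcgurk_susceptibility(trials: pd.DataFrame) -> pd.Series:
    """Proportion of fused ('da') responses on McGurk (incongruent) trials, per subject."""
    mcgurk = trials[trials["trial_type"] == "mcgurk"]
    return mcgurk.groupby("subject")["response"].apply(lambda r: (r == "da").mean())

def correlate_with_gold_msi(susceptibility: pd.Series, gold_msi: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between susceptibility and each Gold-MSI subscale."""
    merged = gold_msi.join(susceptibility.rename("susceptibility"), on="subject")
    rows = []
    for subscale in ["active_engagement", "perceptual_abilities", "music_training",
                     "singing_abilities", "emotion", "general_sophistication"]:
        r, p = pearsonr(merged[subscale], merged["susceptibility"])
        rows.append({"subscale": subscale, "r": r, "p": p})
    return pd.DataFrame(rows)
```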
Affiliation(s)
- Hsing-Hao Lee
- Department of Psychology, New York University, New York, USA.
- Karleigh Groves
- Department of Psychology, New York University, New York, USA
- Center for Language, Music, and Emotion (CLaME), New York University, New York, USA
- Music and Audio Research Lab (MARL), New York University, New York, USA
- Pablo Ripollés
- Department of Psychology, New York University, New York, USA
- Center for Language, Music, and Emotion (CLaME), New York University, New York, USA
- Music and Audio Research Lab (MARL), New York University, New York, USA
- Marisa Carrasco
- Department of Psychology, New York University, New York, USA
- Center for Neural Science, New York University, New York, USA
2. Zeng B, Yu G, Hasshim N, Hong S. Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones. J Eye Mov Res 2023; 16. PMID: 38585238; PMCID: PMC10997307; DOI: 10.16910/jemr.16.4.4.
Abstract
The visual cues of lexical tones are more implicit, and much less investigated, than those of consonants and vowels, and it is still unclear which facial areas contribute to the identification of lexical tones from the face. This study investigated Chinese and English speakers' eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with audiovisual clips of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether the syllables carried a dipping tone (/ă/, /ĭ/) or a falling tone (/à/, /ì/). These audiovisual syllables were presented in clear, noisy and silent (absence of audio signal) conditions. An eye-tracker recorded the participants' eye movements. Results showed that the participants gazed more at the mouth than at the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggest that the mouth is the primary area that listeners utilise in their perception of audiovisual lexical tones. The similar eye movements between the Chinese and English speakers imply that the mouth acts as a perceptual cue that provides articulatory information, as opposed to social and pragmatic information.
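A brief sketch of how the gaze measure reported above (dwell time on the mouth versus the eyes) is commonly computed from fixation reports; the column names and AOI labels are illustrative assumptions.

```python
# Sketch: proportion of dwell time on mouth vs. eyes AOIs, per subject and condition.
# Assumes a fixation-level DataFrame with hypothetical columns:
#   subject, condition ('clear'/'noisy'/'silent'), aoi ('mouth'/'eyes'/'other'), duration_ms
import pandas as pd

def dwell_time_proportions(fixations: pd.DataFrame) -> pd.DataFrame:
    """Share of total fixation duration falling on each AOI, per subject and condition."""
    by_aoi = fixations.groupby(["subject", "condition", "aoi"], as_index=False)["duration_ms"].sum()
    totals = by_aoi.groupby(["subject", "condition"])["duration_ms"].transform("sum")
    by_aoi["proportion"] = by_aoi["duration_ms"] / totals
    return by_aoi

# Example: mean mouth-dwell proportion by listening condition
# props = dwell_time_proportions(fixations)
# print(props[props["aoi"] == "mouth"].groupby("condition")["proportion"].mean())
```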
Affiliation(s)
- Biao Zeng
- University of South Wales, Pontypridd, UK
- Shanhu Hong
- Quanzhou Preschool Education College, Quanzhou, China
3. Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023; 13:e2869. PMID: 36579557; PMCID: PMC9927859; DOI: 10.1002/brb3.2869.
Abstract
INTRODUCTION: Few of us are skilled lipreaders, while most struggle with the task. The neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses. RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading did, probably paralleling the limited understanding obtained via lipreading. Importantly, a higher lipreading test score and a higher subjective estimate of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results suggest efficient use of brain areas dealing with phonological processing in skilled lipreaders.
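The core ISC computation mentioned in the RESULTS can be sketched as follows (a leave-one-out variant); the array shapes and variable names are illustrative and do not reproduce the authors' pipeline.

```python
# Sketch: leave-one-out inter-subject correlation (ISC) of BOLD time courses.
# bold: array of shape (n_subjects, n_timepoints, n_voxels), preprocessed and
# aligned to a common anatomical space.
import numpy as np

def isc_leave_one_out(bold: np.ndarray) -> np.ndarray:
    """Per-voxel correlation of each subject's time course with the mean of all others."""
    n_subjects = bold.shape[0]
    isc = np.zeros((n_subjects, bold.shape[2]))
    for s in range(n_subjects):
        others = np.mean(np.delete(bold, s, axis=0), axis=0)   # (time, voxels)
        a = bold[s] - bold[s].mean(axis=0)                     # center each voxel's time course
        b = others - others.mean(axis=0)
        num = (a * b).sum(axis=0)
        denom = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
        isc[s] = num / denom
    return isc  # shape (n_subjects, n_voxels); often averaged over subjects per voxel
```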
Affiliation(s)
- Satu Saalasti
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany
- Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Aalto Studios - MAGICS, Aalto University, Espoo, Finland
4. Banks B, Gowen E, Munro KJ, Adank P. Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech. J Speech Lang Hear Res 2021; 64:3432-3445. PMID: 34463528; DOI: 10.1044/2021_jslhr-21-00106.
Abstract
Purpose: Visual cues from a speaker's face may benefit perceptual adaptation to degraded speech, but current evidence is limited. We aimed to replicate results from previous studies to establish the extent to which visual speech cues can lead to greater adaptation over time, extending existing results to a real-time adaptation paradigm (i.e., without a separate training period). A second aim was to investigate whether eye gaze patterns toward the speaker's mouth were related to better perception, hypothesizing that listeners who looked more at the speaker's mouth would show greater adaptation. Method: A group of listeners (n = 30) was presented with 90 noise-vocoded sentences in audiovisual format, whereas a control group (n = 29) was presented with the audio signal only. Recognition accuracy was measured throughout and eye tracking was used to measure fixations toward the speaker's eyes and mouth in the audiovisual group. Results: Previous studies were partially replicated: The audiovisual group had better recognition throughout and adapted slightly more rapidly, but both groups showed an equal amount of improvement overall. Longer fixations on the speaker's mouth in the audiovisual group were related to better overall accuracy. An exploratory analysis further demonstrated that the duration of fixations to the speaker's mouth decreased over time. Conclusions: The results suggest that visual cues may not benefit adaptation to degraded speech as much as previously thought. Longer fixations on a speaker's mouth may play a role in successfully decoding visual speech cues; however, this will need to be confirmed in future research to fully understand how patterns of eye gaze are related to audiovisual speech recognition. All materials, data, and code are available at https://osf.io/2wqkf/.
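A minimal sketch of the two analyses described above, adaptation across trial blocks and the relation between mouth fixations and accuracy, assuming a hypothetical trial-level data layout:

```python
# Sketch: perceptual adaptation across trial blocks and its relation to mouth fixations.
# Assumes a trial-level DataFrame with hypothetical columns:
#   subject, group ('AV'/'audio_only'), trial (1-90), accuracy (0/1),
#   mouth_fixation_ms (AV group only)
import pandas as pd
from scipy.stats import pearsonr

def block_accuracy(trials: pd.DataFrame, block_size: int = 15) -> pd.DataFrame:
    """Mean recognition accuracy per block of trials, by group."""
    trials = trials.assign(block=(trials["trial"] - 1) // block_size + 1)
    return trials.groupby(["group", "block"], as_index=False)["accuracy"].mean()

def mouth_gaze_vs_accuracy(trials: pd.DataFrame):
    """Correlate each AV listener's total mouth-fixation time with overall accuracy."""
    av = trials[trials["group"] == "AV"]
    per_subject = av.groupby("subject").agg(
        mouth_ms=("mouth_fixation_ms", "sum"),
        accuracy=("accuracy", "mean"),
    )
    return pearsonr(per_subject["mouth_ms"], per_subject["accuracy"])
```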
Affiliation(s)
- Briony Banks
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Emma Gowen
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Kevin J Munro
- Manchester Centre for Audiology and Deafness, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
5.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & MacDonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which demonstrates the influence of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
6. Audio-visual integration in noise: Influence of auditory and visual stimulus degradation on eye movements and perception of the McGurk effect. Atten Percept Psychophys 2020; 82:3544-3557. PMID: 32533526; PMCID: PMC7788022; DOI: 10.3758/s13414-020-02042-x.
Abstract
Seeing a talker’s face can aid audiovisual (AV) integration when speech is presented in noise. However, few studies have simultaneously manipulated auditory and visual degradation. We aimed to establish how degrading the auditory and visual signal affected AV integration. Where people look on the face in this context is also of interest; Buchan, Paré and Munhall (Brain Research, 1242, 162–171, 2008) found fixations on the mouth increased in the presence of auditory noise whilst Wilson, Alsius, Paré and Munhall (Journal of Speech, Language, and Hearing Research, 59(4), 601–615, 2016) found mouth fixations decreased with decreasing visual resolution. In Condition 1, participants listened to clear speech, and in Condition 2, participants listened to vocoded speech designed to simulate the information provided by a cochlear implant. Speech was presented in three levels of auditory noise and three levels of visual blurring. Adding noise to the auditory signal increased McGurk responses, while blurring the visual signal decreased McGurk responses. Participants fixated the mouth more on trials when the McGurk effect was perceived. Adding auditory noise led to people fixating the mouth more, while visual degradation led to people fixating the mouth less. Combined, the results suggest that modality preference and where people look during AV integration of incongruent syllables varies according to the quality of information available.
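The vocoded-speech condition refers to channel (noise) vocoding of the kind used to simulate cochlear-implant input. Below is a simplified sketch, with illustrative filter settings rather than the study's exact parameters, assuming a sampling rate of at least 16 kHz:

```python
# Sketch: simple noise vocoder (channel vocoder with noise carriers).
# Band-pass the speech, extract each band's amplitude envelope, and use it to
# modulate band-limited noise, replacing the temporal fine structure.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal: np.ndarray, fs: int, n_channels: int = 8,
                 f_lo: float = 100.0, f_hi: float = 8000.0) -> np.ndarray:
    """Return a noise-vocoded version of `signal` (fs must exceed 2 * f_hi)."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges (illustrative)
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))                              # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(signal)))  # band-limited noise
        out += envelope * carrier
    # Match overall RMS to the input signal.
    return out * np.sqrt(np.mean(signal ** 2) / np.mean(out ** 2))
```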
7. A value-driven McGurk effect: Value-associated faces enhance the influence of visual information on audiovisual speech perception and its eye movement pattern. Atten Percept Psychophys 2020; 82:1928-1941. PMID: 31898072; DOI: 10.3758/s13414-019-01918-x.
Abstract
This study investigates whether and how value-associated faces affect audiovisual speech perception and its eye movement pattern. Participants were asked to learn to associate particular faces with or without monetary reward in the training phase, and, in the subsequent test phase, to identify syllables that the talkers had said in video clips in which the talkers' faces had or had not been associated with reward. The syllables were either congruent or incongruent with the talkers' mouth movements. Crucially, in some cases, the incongruent syllables could elicit the McGurk effect. Results showed that the McGurk effect occurred more often for reward-associated faces than for non-reward-associated faces. Moreover, the signal detection analysis revealed that participants had lower criterion and higher discriminability for reward-associated faces than for non-reward-associated faces. Surprisingly, eye movement data showed that participants spent more time looking at and fixated more often on the extraoral (nose/cheek) area for reward-associated faces than for non-reward-associated faces, while the opposite pattern was observed on the oral (mouth) area. The correlation analysis demonstrated that, over participants, the more they looked at the extraoral area in the training phase because of reward, the larger the increase of McGurk proportion (and the less they looked at the oral area) in the test phase. These findings not only demonstrate that value-associated faces enhance the influence of visual information on audiovisual speech perception but also highlight the importance of the extraoral facial area in the value-driven McGurk effect.
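The signal detection measures referred to above (criterion and discriminability) are standard equal-variance SDT statistics. A small sketch follows; how McGurk-trial responses are mapped onto hits and false alarms is a design decision of the original study, so only the generic computation is shown:

```python
# Sketch: d' (discriminability) and criterion c from hit and false-alarm counts.
import numpy as np
from scipy.stats import norm

def dprime_and_criterion(hits: int, misses: int, fas: int, crs: int):
    """Equal-variance SDT measures with a log-linear correction for extreme rates."""
    # The log-linear (Hautus) correction avoids infinite z-scores at proportions of 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (fas + 0.5) / (fas + crs + 1)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)   # negative values indicate a liberal criterion
    return d_prime, criterion

# Example: 40 hits, 10 misses, 15 false alarms, 35 correct rejections
# print(dprime_and_criterion(40, 10, 15, 35))
```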
8.
Abstract
Gaze (where one looks, how long, and when) plays an essential part in human social behavior. While many aspects of social gaze have been reviewed, there is no comprehensive review or theoretical framework that describes how gaze to faces supports face-to-face interaction. In this review, I address the following questions: (1) When does gaze need to be allocated to a particular region of a face in order to provide the relevant information for successful interaction? (2) How do humans look at other people, and faces in particular, regardless of whether gaze needs to be directed at a particular region to acquire the relevant visual information? (3) How does gaze support the regulation of interaction? The work reviewed spans psychophysical research, observational research, and eye-tracking research in both lab-based and interactive contexts. Based on the literature overview, I sketch a framework for future research based on dynamic systems theory. The framework holds that gaze should be investigated in relation to sub-states of the interaction, encompassing sub-states of the interactors, the content of the interaction, and the interactive context. The relevant sub-states for understanding gaze in interaction vary over different timescales, from microgenesis to ontogenesis and phylogenesis. The framework has important implications for vision science, psychopathology, developmental science, and social robotics.
Affiliation(s)
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands.
- Developmental Psychology, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands.
9. Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:328. PMID: 32481538; PMCID: PMC7348766; DOI: 10.3390/brainsci10060328.
Abstract
The McGurk effect, an incongruent pairing of visual /ga/ with acoustic /ba/, creates a fusion illusion /da/ and is the cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed (auditory /ga/, visual /ba/), yielding the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex, in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of the visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions (fusion, with the visual aspect of the stimulus changing, and combination, with the auditory aspect of the stimulus changing) as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created, but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In line with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
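A sketch of how the MMN is typically quantified in such a design (deviant-minus-standard difference wave, then mean amplitude in early and late windows); the channel, epoch format, and window bounds are illustrative assumptions, not the study's parameters:

```python
# Sketch: MMN difference wave and mean amplitude in early/late time windows.
# standards, deviants: arrays of shape (n_trials, n_timepoints) from one channel (e.g. Fz),
# epochs time-locked to stimulus onset; times: array of shape (n_timepoints,) in seconds.
import numpy as np

def mmn_amplitudes(standards: np.ndarray, deviants: np.ndarray, times: np.ndarray,
                   early=(0.10, 0.20), late=(0.20, 0.35)):
    """Mean amplitude of the deviant-minus-standard difference wave per window."""
    difference = deviants.mean(axis=0) - standards.mean(axis=0)
    results = {}
    for name, (t0, t1) in {"early": early, "late": late}.items():
        mask = (times >= t0) & (times <= t1)
        results[name] = difference[mask].mean()
    return results  # negative values in these windows are consistent with an MMN
```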
Affiliation(s)
- Melissa Randazzo
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Correspondence: Tel.: +1-516-877-4769
- Ryan Priefer
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
10. Fixating the eyes of a speaker provides sufficient visual information to modulate early auditory processing. Biol Psychol 2019; 146:107724. PMID: 31323242; DOI: 10.1016/j.biopsycho.2019.107724.
Abstract
In face-to-face conversations, when listeners process and combine information obtained from hearing and seeing a speaker, they mostly look at the eyes rather than at the more informative mouth region. Measuring event-related potentials, we tested whether fixating the speaker's eyes is sufficient for gathering enough visual speech information to modulate early auditory processing, or whether covert attention to the speaker's mouth is needed. Results showed that when listeners fixated the eye region of the speaker, the amplitudes of the auditory evoked N1 and P2 were reduced when listeners heard and saw the speaker than when they only heard her. These cross-modal interactions also occurred when, in addition, attention was restricted to the speaker's eye region. Fixating the speaker's eyes thus provides listeners with sufficient visual information to facilitate early auditory processing. The spread of covert attention to the mouth area is not needed to observe audiovisual interactions.
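A sketch of the auditory-evoked potential comparison described above (mean N1 and P2 amplitudes for audiovisual versus auditory-only trials, compared with a paired t-test); the channel choice and window bounds are illustrative assumptions, not the study's exact parameters:

```python
# Sketch: compare auditory N1/P2 mean amplitudes between audiovisual and auditory-only conditions.
# erps_av, erps_a: arrays of shape (n_subjects, n_timepoints), subject-average ERPs at one
# fronto-central channel (e.g. Cz); times: array of shape (n_timepoints,) in seconds.
import numpy as np
from scipy.stats import ttest_rel

def mean_window_amplitude(erps: np.ndarray, times: np.ndarray, window) -> np.ndarray:
    """Mean amplitude per subject within a time window (t0, t1)."""
    t0, t1 = window
    mask = (times >= t0) & (times <= t1)
    return erps[:, mask].mean(axis=1)

def compare_conditions(erps_av, erps_a, times, n1=(0.08, 0.12), p2=(0.15, 0.25)):
    """Paired t-tests on N1 and P2 mean amplitudes (audiovisual vs. auditory-only)."""
    results = {}
    for name, window in {"N1": n1, "P2": p2}.items():
        av = mean_window_amplitude(erps_av, times, window)
        a = mean_window_amplitude(erps_a, times, window)
        results[name] = ttest_rel(av, a)   # reduced AV amplitudes would match the reported effect
    return results
```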
11. Jansen SD, Keebler JR, Chaparro A. Shifts in Maximum Audiovisual Integration with Age. Multisens Res 2018; 31:191-212. DOI: 10.1163/22134808-00002599.
Abstract
Listeners attempting to understand speech in noisy environments rely on both visual and auditory processes, typically referred to as audiovisual processing. Noise corrupts the auditory speech signal, and listeners naturally leverage visual cues from the talker's face in an attempt to interpret the degraded auditory signal. Studies of speech intelligibility in noise show that the maximum improvement in speech recognition performance derived from seeing an interlocutor's face (i.e., maximum visual enhancement, or VEmax) is invariant with age. Several studies have reported that VEmax is typically associated with a signal-to-noise ratio (SNR) of −12 dB; however, few studies have systematically investigated whether the SNR associated with VEmax changes with age. We investigated whether VEmax changes as a function of age, whether the SNR at VEmax changes as a function of age, and what perceptual/cognitive abilities account for or mediate such relationships. We measured VEmax in a nongeriatric adult sample ranging in age from 20 to 59 years old. We found that VEmax was age-invariant, replicating earlier studies. No perceptual/cognitive measures predicted VEmax, most likely due to limited variance in VEmax scores. Importantly, we found that the SNR at VEmax shifts toward higher (quieter) SNR levels with increasing age; however, this relationship is partially mediated by working memory capacity, such that those with larger working memory capacities (WMCs) can identify speech at lower (louder) SNR levels than their age equivalents with smaller WMCs. The current study is the first to report that individual differences in WMC partially mediate the age-related shift in SNR at VEmax.
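A sketch of the simple mediation logic described above (age predicting the SNR at VEmax, partially mediated by working memory capacity), using ordinary least squares and a percentile bootstrap of the indirect effect; the variable names are illustrative:

```python
# Sketch: simple mediation (X = age, M = working memory capacity, Y = SNR at VEmax)
# with a percentile bootstrap of the indirect effect a*b.
import numpy as np

def _ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of y ~ x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def _partial_slope_m(x: np.ndarray, m: np.ndarray, y: np.ndarray) -> float:
    """Slope of M in y ~ x + m (the 'b' path, controlling for X)."""
    X = np.column_stack([np.ones_like(x), x, m])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

def mediation_bootstrap(age, wmc, snr_at_vemax, n_boot: int = 5000, seed: int = 0):
    """Indirect effect a*b with a 95% percentile bootstrap confidence interval."""
    age = np.asarray(age, dtype=float)
    wmc = np.asarray(wmc, dtype=float)
    snr_at_vemax = np.asarray(snr_at_vemax, dtype=float)
    rng = np.random.default_rng(seed)
    a = _ols_slope(age, wmc)                      # X -> M path
    b = _partial_slope_m(age, wmc, snr_at_vemax)  # M -> Y path, controlling for X
    indirect = a * b
    n = len(age)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        boot[i] = (_ols_slope(age[idx], wmc[idx])
                   * _partial_slope_m(age[idx], wmc[idx], snr_at_vemax[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return indirect, (lo, hi)  # a CI excluding 0 is consistent with (partial) mediation
```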
Affiliation(s)
- Joseph R. Keebler
- Department of Human Factors and Behavioral Neurobiology, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA
- Alex Chaparro
- Department of Human Factors and Behavioral Neurobiology, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA
12. Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. PMID: 31264597; DOI: 10.1163/22134808-00002565.
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and it has been used extensively in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both the phenomenological and the neural level. This calls into question the suitability of the illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing from data generated with McGurk stimuli to matched audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
13. Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. PMID: 28792094; DOI: 10.1002/hbm.23758.
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two paradigms have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or non-integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise.
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
- ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
14. High visual resolution matters in audiovisual speech perception, but only for some. Atten Percept Psychophys 2016; 78:1472-1487. DOI: 10.3758/s13414-016-1109-4.