1. Magnotti JF, Lado A, Beauchamp MS. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers. Front Neurosci 2024; 18:1421713. PMID: 38988770; PMCID: PMC11233445; DOI: 10.3389/fnins.2024.1421713
Abstract
In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the strength of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Anastasia Lado
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
2. Tiippana K, Ujiie Y, Peromaa T, Takahashi K. Investigation of Cross-Language and Stimulus-Dependent Effects on the McGurk Effect with Finnish and Japanese Speakers and Listeners. Brain Sci 2023; 13:1198. PMID: 37626554; PMCID: PMC10452414; DOI: 10.3390/brainsci13081198
Abstract
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A/pa/V/ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure of visual influence on speech perception: the lower the proportion of correct auditory responses, the stronger the effect. Cross-language effects are studied to understand processing differences between one's own and foreign languages. The McGurk effect has sometimes been found to be stronger with foreign speakers, but other studies have shown the opposite, or no difference between languages. Most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners. Both listener groups had 49 participants. The stimuli (/ka/, /pa/, /ta/) were uttered by two female and male Finnish and Japanese speakers and presented in A, V, and AV modalities, including a McGurk stimulus A/pa/V/ka/. The McGurk effect was stronger with Japanese stimuli in both groups. Differences in speech perception were prominent between individual speakers but less so between native languages. Unisensory perception correlated with McGurk perception. These findings suggest that stimulus-dependent features contribute to the McGurk effect and may influence syllable perception more strongly than cross-language factors.
Affiliation(s)
- Kaisa Tiippana
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Yuta Ujiie
- Department of Psychology, College of Contemporary Psychology, Rikkyo University, Saitama 352-8558, Japan
- Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Osaka 567-8570, Japan
- Tarja Peromaa
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Kohske Takahashi
- College of Comprehensive Psychology, Ritsumeikan University, Osaka 567-8570, Japan
3. Feldman JI, Tu A, Conrad JG, Kuang W, Santapuram P, Woynaroski TG. The Impact of Singing on Visual and Multisensory Speech Perception in Children on the Autism Spectrum. Multisens Res 2022; 36:57-74. PMID: 36731528; PMCID: PMC9924934; DOI: 10.1163/22134808-bja10087
Abstract
Autistic children show reduced multisensory integration of audiovisual speech stimuli in response to the McGurk illusion. Previously, it has been shown that adults can integrate sung McGurk tokens. These sung speech tokens offer more salient visual and auditory cues than spoken tokens, which may increase the identification and integration of visual speech cues in autistic children. Forty participants (20 autistic, 20 non-autistic peers) aged 7-14 completed the study. Participants were presented with speech tokens in four modalities: auditory-only, visual-only, congruent audiovisual, and incongruent audiovisual (i.e., McGurk; auditory 'ba' and visual 'ga'). Tokens were also presented in two formats: spoken and sung. Participants indicated what they perceived via a four-button response box (i.e., 'ba', 'ga', 'da', or 'tha'). Accuracies and perception of the McGurk illusion were calculated for each modality and format. Analysis of visual-only identification indicated a significant main effect of format, whereby participants were more accurate on sung than spoken trials, but no significant main effect of group or interaction effect. Analysis of the McGurk trials indicated no significant main effect of format or group and no significant interaction effect. Sung speech tokens improved identification of visual speech cues but did not boost the integration of visual cues with heard speech across groups. Additional work is needed to determine what properties of sung speech contributed to the observed improvement in visual accuracy and to evaluate whether more prolonged exposure to sung speech may yield effects on multisensory integration.
Affiliation(s)
- Jacob I. Feldman
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN, USA
- Alexander Tu
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Otolaryngology and Communication Sciences, Medical College of Wisconsin, Milwaukee, WI, USA
- Julie G. Conrad
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Pediatrics, University of Illinois, Chicago, IL, USA
- Wayne Kuang
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Pediatrics, Los Angeles County and University of Southern California (LAC+USC) Medical Center, University of Southern California, Los Angeles, CA, USA
- Pooja Santapuram
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Anesthesiology, Columbia University Irving Medical Center, New York, NY, USA
- Tiffany G. Woynaroski
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
4. Diaz MT, Yalcinbas E. The neural bases of multimodal sensory integration in older adults. Int J Behav Dev 2021; 45:409-417. PMID: 34650316; DOI: 10.1177/0165025420979362
Abstract
Although hearing often declines with age, prior research has shown that older adults may benefit from multisensory input to a greater extent than younger adults, a concept known as inverse effectiveness. While there is behavioral evidence in support of this phenomenon, less is known about its neural basis. The present fMRI study examined how older and younger adults processed multimodal auditory-visual (AV) phonemic stimuli that were either congruent or incongruent across modalities. Incongruent AV pairs were designed to elicit the McGurk effect. Behaviorally, reaction times were significantly faster on congruent trials than on incongruent trials for both age groups, and overall older adults responded more slowly. The interaction was not significant, suggesting that older adults processed the AV stimuli similarly to younger adults. Although there were minimal behavioral differences, age-related differences in functional activation were identified: younger adults showed greater activation than older adults in primary sensory regions, including the superior temporal gyrus, the calcarine fissure, and the left post-central gyrus. In contrast, older adults showed greater activation than younger adults in dorsal frontal regions, including the middle and superior frontal gyri, as well as in dorsal parietal regions. These data suggest that while there is age-related stability in behavioral sensitivity to multimodal stimuli, the neural bases for this effect differed between older and younger adults. Our results demonstrated that older adults underrecruited primary sensory cortices and showed increased recruitment of regions involved in executive function, attention, and monitoring processes, which may reflect an attempt to compensate.
Affiliation(s)
- Michele T Diaz
- Department of Psychology, The Pennsylvania State University
- Ege Yalcinbas
- Neurosciences Department, University of California, San Diego
5.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk and Macdonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task conditions. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the influence of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and we challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
6. Thézé R, Giraud AL, Mégevand P. The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech. Sci Adv 2020; 6(45):eabc6348. PMID: 33148648; PMCID: PMC7673697; DOI: 10.1126/sciadv.abc6348
Abstract
When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli in which one mismatched phoneme-viseme pair in a key word of each sentence created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere was crucial in selecting whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Division of Neurology, Department of Clinical Neurosciences, Geneva University Hospitals, 1205 Geneva, Switzerland
7. Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. PMID: 33221701; DOI: 10.1016/j.cortex.2020.10.002
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
8. Thézé R, Gadiri MA, Albert L, Provost A, Giraud AL, Mégevand P. Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments. Sci Rep 2020; 10:15540. PMID: 32968127; PMCID: PMC7511320; DOI: 10.1038/s41598-020-72375-y
Abstract
Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has widely been applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized to computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated with 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech with higher control over stimulus timing and content.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Mehdi Ali Gadiri
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Louis Albert
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Antoine Provost
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Division of Neurology, Geneva University Hospitals, Geneva, Switzerland
9. Feng G, Zhou B, Zhou W, Beauchamp MS, Magnotti JF. A Laboratory Study of the McGurk Effect in 324 Monozygotic and Dizygotic Twins. Front Neurosci 2019; 13:1029. PMID: 31636529; PMCID: PMC6787151; DOI: 10.3389/fnins.2019.01029
Abstract
Multisensory integration of information from the talker's voice and the talker's mouth facilitates human speech perception. A popular assay of audiovisual integration is the McGurk effect, an illusion in which incongruent visual speech information categorically changes the percept of auditory speech. There is substantial interindividual variability in susceptibility to the McGurk effect. To better understand possible sources of this variability, we examined the McGurk effect in 324 native Mandarin speakers, consisting of 73 monozygotic (MZ) and 89 dizygotic (DZ) twin pairs. When tested with 9 different McGurk stimuli, some participants never perceived the illusion and others always perceived it. Within participants, perception was similar across time (r = 0.55 at a 2-year retest in 150 participants), suggesting that McGurk susceptibility reflects a stable trait rather than short-term perceptual fluctuations. To examine the effects of shared genetics and prenatal environment, we compared McGurk susceptibility between MZ and DZ twins. Both twin types had significantly greater correlation than unrelated pairs (r = 0.28 for MZ twins and r = 0.21 for DZ twins), suggesting that the genes and environmental factors shared by twins contribute to individual differences in multisensory speech perception. Conversely, the existence of substantial differences within twin pairs (even MZ co-twins) and the overall low percentage of explained variance (5.5%) argue against a deterministic view of individual differences in multisensory integration.
Affiliation(s)
- Guo Feng
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Psychological Research and Counseling Center, Southwest Jiaotong University, Chengdu, China
- Bin Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Wen Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
10. Lalonde K, Werner LA. Perception of incongruent audiovisual English consonants. PLoS One 2019; 14:e0213588. PMID: 30897109; PMCID: PMC6428273; DOI: 10.1371/journal.pone.0213588
Abstract
Causal inference—the process of deciding whether two incoming signals come from the same source—is an important step in audiovisual (AV) speech perception. This research explored causal inference and perception of incongruent AV English consonants. Nine adults were presented auditory, visual, congruent AV, and incongruent AV consonant-vowel syllables. Incongruent AV stimuli included auditory and visual syllables with matched vowels, but mismatched consonants. Open-set responses were collected. For most incongruent syllables, participants were aware of the mismatch between auditory and visual signals (59.04%) or reported the auditory syllable (33.73%). Otherwise, participants reported the visual syllable (1.13%) or some other syllable (6.11%). Statistical analyses were used to assess whether visual distinctiveness and place, voice, and manner features predicted responses. Mismatch responses occurred more when the auditory and visual consonants were visually distinct, when place and manner differed across auditory and visual consonants, and for consonants with high visual accuracy. Auditory responses occurred more when the auditory and visual consonants were visually similar, when place and manner were the same across auditory and visual stimuli, and with consonants produced further back in the mouth. Visual responses occurred more when voicing and manner were the same across auditory and visual stimuli, and for front and middle consonants. Other responses were variable, but typically matched the visual place, auditory voice, and auditory manner of the input. Overall, results indicate that causal inference and incongruent AV consonant perception depend on salience and reliability of auditory and visual inputs and degree of redundancy between auditory and visual inputs. A parameter-free computational model of incongruent AV speech perception based on unimodal confusions, with a causal inference rule, was applied. Data from the current study present an opportunity to test and improve the generalizability of current AV speech integration models.
Affiliation(s)
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
- Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
11. Devaraju DS, U AK, Maruthy S. Comparison of McGurk Effect across Three Consonant-Vowel Combinations in Kannada. J Audiol Otol 2019; 23:39-43. PMID: 30518196; PMCID: PMC6348306; DOI: 10.7874/jao.2018.00234
Abstract
BACKGROUND AND OBJECTIVES: The influence of the visual stimulus on the auditory component in the perception of auditory-visual (AV) consonant-vowel syllables has been demonstrated in different languages. Inherent properties of unimodal stimuli are known to modulate AV integration. The present study investigated how the amount of McGurk effect (an outcome of AV integration) varies across three different consonant combinations in the Kannada language. The importance of unimodal syllable identification for the amount of McGurk effect was also examined. SUBJECTS AND METHODS: Twenty-eight individuals performed an AV identification task with ba/ga, pa/ka, and ma/ṇa consonant combinations in AV congruent, AV incongruent (McGurk combination), audio-alone, and visual-alone conditions. Cluster analysis was performed on the identification scores for the incongruent stimuli to classify the individuals into two groups: one with high and the other with low McGurk scores. The differences in audio-alone and visual-alone scores between these groups were compared. RESULTS: The results showed significantly higher McGurk scores for ma/ṇa than for the ba/ga and pa/ka combinations in both the high and low McGurk score groups. No significant difference was noted between the ba/ga and pa/ka combinations in either group. Identification of /ṇa/ presented in the visual-alone condition correlated negatively with higher McGurk scores. CONCLUSIONS: The results suggest that the final percept following AV integration is not exclusively explained by unimodal identification of the syllables; other factors may also contribute to the final percept.
Affiliation(s)
- Dhatri S Devaraju
- Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Ajith Kumar U
- Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Santosh Maruthy
- Department of Speech-Language Sciences, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
12. Magnotti JF, Smith KB, Salinas M, Mays J, Zhu LL, Beauchamp MS. A causal inference explanation for enhancement of multisensory integration by co-articulation. Sci Rep 2018; 8:18032. PMID: 30575791; PMCID: PMC6303389; DOI: 10.1038/s41598-018-36772-8
Abstract
The McGurk effect is a popular assay of multisensory integration in which participants report the illusory percept of "da" when presented with incongruent auditory "ba" and visual "ga" (AbaVga). While the original publication describing the effect found that 98% of participants perceived it, later studies reported much lower prevalence, ranging from 17% to 81%. Understanding the source of this variability is important for interpreting the panoply of studies that examine McGurk prevalence between groups, including clinical populations such as individuals with autism or schizophrenia. The original publication used stimuli consisting of multiple repetitions of a co-articulated syllable (three repetitions, AgagaVbaba). Later studies used stimuli without repetition or co-articulation (AbaVga) and used congruent syllables from the same talker as a control. In three experiments, we tested how stimulus repetition, co-articulation, and talker repetition affect McGurk prevalence. Repetition with co-articulation increased prevalence by 20%, while repetition without co-articulation and talker repetition had no effect. A fourth experiment compared the effect of the on-line testing used in the first three experiments with the in-person testing used in the original publication; no differences were observed. We interpret our results in the framework of causal inference: co-articulation increases the evidence that auditory and visual speech tokens arise from the same talker, increasing tolerance for content disparity and likelihood of integration. The results provide a principled explanation for how co-articulation aids multisensory integration and can explain the high prevalence of the McGurk effect in the initial publication.
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Marcelo Salinas
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Jacqunae Mays
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
13. Proverbio AM, Raso G, Zani A. Electrophysiological Indexes of Incongruent Audiovisual Phonemic Processing: Unraveling the McGurk Effect. Neuroscience 2018; 385:215-226. PMID: 29932985; DOI: 10.1016/j.neuroscience.2018.06.021
Abstract
In this study, the timing of electromagnetic signals recorded during incongruent and congruent audiovisual (AV) stimulation was examined in 14 healthy Italian volunteers. In a previous study (Proverbio et al., 2016) we investigated the McGurk effect in the Italian language and identified which visual and auditory inputs provided the most compelling illusory effects (e.g., bilabial phonemes presented acoustically and paired with non-labials, especially alveolar-nasal and velar-occlusive phonemes). In the present study, EEG was recorded from 128 scalp sites while participants observed a female and a male actor uttering 288 syllables (each lasting approximately 600 ms), selected on the basis of the previous investigation, and responded to rare targets (/re/, /ri/, /ro/, /ru/). In half of the cases the AV information was incongruent, except for targets, which were always congruent. A pMMN (phonological Mismatch Negativity) to incongruent AV stimuli was identified 500 ms after voice onset time. This automatic response indexed the detection of an incongruity between the labial and phonetic information. A swLORETA (Low-Resolution Electromagnetic Tomography) analysis applied to the incongruent-minus-congruent difference voltage in the same time window revealed that the strongest sources of this activity were the right superior temporal (STG) and superior frontal gyri, supporting their involvement in AV integration.
Affiliation(s)
- Alice Mado Proverbio
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
- Giulia Raso
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
14. Bernstein LE. Response Errors in Females' and Males' Sentence Lipreading Necessitate Structurally Different Models for Predicting Lipreading Accuracy. Lang Learn 2018; 68:127-158. PMID: 31485084; PMCID: PMC6724546; DOI: 10.1111/lang.12281
Abstract
Lipreaders recognize words with phonetically impoverished stimuli, an ability that is generally poor in normal-hearing adults. Individual sentence lipreading trials from 341 young adults were modeled to predict words and phonemes correct in terms of measures of phoneme response dissimilarity (PRD), number of inserted incorrect response phonemes, lipreader gender, and a measure of speech perception in noise. Interactions with lipreaders' gender necessitated structurally different models of males' and females' lipreading. Overall, female lipreaders are more accurate, their ability to recognize words with impoverished or degraded input is consistent across visual and auditory modalities, and they amplify their correct responding through top-down insertion of text. Males' responses suggest that individuals with poorer auditory speech perception in noise amplify their responses by shifting towards including text in their response that is more perceptually discrepant from the stimulus. Gender differences merit attention in future studies that use visual speech stimuli.
Affiliation(s)
- Lynne E Bernstein
- Department of Speech, Language, and Hearing Science, George Washington University, 2121 I St NW, Washington, DC 20052
15. Brooks CJ, Chan YM, Anderson AJ, McKendrick AM. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss. Front Hum Neurosci 2018; 12:192. PMID: 29867415; PMCID: PMC5954093; DOI: 10.3389/fnhum.2018.00192
Abstract
Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.
Affiliation(s)
- Cassandra J Brooks
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Yu Man Chan
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Andrew J Anderson
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Allison M McKendrick
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
16. Neural networks supporting audiovisual integration for speech: A large-scale lesion study. Cortex 2018; 103:360-371. PMID: 29705718; DOI: 10.1016/j.cortex.2018.03.030
Abstract
Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched (the McGurk-MacDonald effect). Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs. visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as by multimodal regions such as the STS at their boundary. Motor-related frontal regions do not appear to play a role in AV speech integration.
17. McGurk stimuli for the investigation of multisensory integration in cochlear implant users: The Oldenburg Audio Visual Speech Stimuli (OLAVS). Psychon Bull Rev 2018; 24:863-872. PMID: 27562763; DOI: 10.3758/s13423-016-1148-9
Abstract
The concurrent presentation of different auditory and visual syllables may result in the perception of a third syllable, reflecting an illusory fusion of visual and auditory information. This well-known McGurk effect is frequently used for the study of audio-visual integration. Recently, it was shown that the McGurk effect is strongly stimulus-dependent, which complicates comparisons across perceivers and inferences across studies. To overcome this limitation, we developed the freely available Oldenburg audio-visual speech stimuli (OLAVS), consisting of 8 different talkers and 12 different syllable combinations. The quality of the OLAVS set was evaluated with 24 normal-hearing subjects. All 96 stimuli were characterized based on their stimulus disparity, which was obtained from a probabilistic model (cf. Magnotti & Beauchamp, 2015). Moreover, the McGurk effect was studied in eight adult cochlear implant (CI) users. By applying the individual, stimulus-independent parameters of the probabilistic model, the predicted effect of stronger audio-visual integration in CI users could be confirmed, demonstrating the validity of the new stimulus material.
18. Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. DOI: 10.1111/ejn.13804
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Mireia Torralba
- Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
19. Magnotti JF, Basu Mallick D, Beauchamp MS. Reducing Playback Rate of Audiovisual Speech Leads to a Surprising Decrease in the McGurk Effect. Multisens Res 2018; 31:19-38. DOI: 10.1163/22134808-00002586
Abstract
We report the unexpected finding that slowing video playback decreases perception of the McGurk effect. This reduction is counter-intuitive because the illusion depends on visual speech influencing the perception of auditory speech, and slowing speech should increase the amount of visual information available to observers. We recorded perceptual data from 110 subjects viewing audiovisual syllables (either McGurk or congruent control stimuli) played back at one of three rates: the rate used by the talker during recording (the natural rate), a slow rate (50% of natural), or a fast rate (200% of natural). We replicated previous studies showing dramatic variability in McGurk susceptibility at the natural rate, ranging from 0% to 100% across subjects and from 26% to 76% across the eight McGurk stimuli tested. Relative to the natural rate, slowed playback reduced the frequency of McGurk responses by 11% (79% of subjects showed a reduction) and reduced congruent accuracy by 3% (25% of subjects showed a reduction). Fast playback rate had little effect on McGurk responses or congruent accuracy. To determine whether our results are consistent with Bayesian integration, we constructed a Bayes-optimal model that incorporated two assumptions: individuals combine auditory and visual information according to their reliability, and changing playback rate affects sensory reliability. The model reproduced both our findings of large individual differences and the playback rate effect. This work illustrates that surprises remain in the McGurk effect and that Bayesian integration provides a useful framework for understanding audiovisual speech perception.
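The Bayes-optimal model summarized in this abstract is described only in words; its first assumption, reliability-weighted cue combination, can be made concrete with a short sketch. The snippet below is a generic illustration of inverse-variance cue weighting with made-up parameter values, not the authors' fitted model or code, and it leaves the mapping from playback rate to sensory noise unspecified.

```python
def combine_cues(x_aud, var_aud, x_vis, var_vis):
    """Reliability-weighted (inverse-variance) combination of two noisy cues."""
    w_aud, w_vis = 1.0 / var_aud, 1.0 / var_vis
    x_combined = (w_aud * x_aud + w_vis * x_vis) / (w_aud + w_vis)
    var_combined = 1.0 / (w_aud + w_vis)
    return x_combined, var_combined

# Hypothetical example: the auditory cue sits at 0.0 and the visual cue at 1.0
# on an arbitrary perceptual axis. Making one cue noisier pulls the combined
# estimate toward the other cue.
for var_aud in (0.5, 1.0, 2.0):
    x_hat, var_hat = combine_cues(x_aud=0.0, var_aud=var_aud, x_vis=1.0, var_vis=1.0)
    print(f"var_aud={var_aud}: combined estimate = {x_hat:.2f} (variance {var_hat:.2f})")
```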
Affiliation(s)
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
- Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
20. Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. PMID: 31264597; DOI: 10.1163/22134808-00002565
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and it has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both the phenomenological and neural levels. This calls into question the suitability of the illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated with McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
21. Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. PMID: 28792094; DOI: 10.1002/hbm.23758
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two phenomena have very often been used to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or non-integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise.
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
- ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
22. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech. PLoS Comput Biol 2017; 13:e1005229. PMID: 28207734; PMCID: PMC5312805; DOI: 10.1371/journal.pcbi.1005229
Abstract
Audiovisual speech integration combines information from auditory speech (talker’s voice) and visual speech (talker’s mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory “ba” + visual “ga” (AbaVga), that are integrated to produce a fused percept (“da”). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others. During face-to-face conversations, we seamlessly integrate information from the talker’s voice with information from the talker’s face. This multisensory integration increases speech perception accuracy and can be critical for understanding speech in noisy environments with many people talking simultaneously. A major challenge for models of multisensory speech perception is thus deciding which voices and faces should be integrated. Our solution to this problem is based on the idea of causal inference—given a particular pair of auditory and visual syllables, the brain calculates the likelihood they are from a single vs. multiple talkers and uses this likelihood to determine the final speech percept. We compared our model with an alternative model that is identical, except that it always integrated the available cues. Using behavioral speech perception data from a large number of subjects, the model with causal inference better predicted how humans would (or would not) integrate audiovisual speech syllables. Our results suggest a fundamental role for a causal inference type calculation in multisensory speech perception.
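The causal inference step described above (deciding whether the auditory and visual syllables share a single source) can be sketched with a generic Gaussian model. The code below is a simplified illustration of that computation, not the published CIMS implementation; the grid bounds, prior width, and p_common value are assumptions chosen only for the example.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density used as the likelihood of a noisy cue."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior_common_cause(x_aud, x_vis, sigma_aud, sigma_vis,
                           prior_sigma=2.0, p_common=0.5):
    """Posterior probability that auditory and visual cues arose from one talker.

    Marginalizes over candidate source locations on a grid; the likelihood of a
    common cause shrinks as the cues become more discrepant relative to noise.
    """
    s = np.linspace(-10.0, 10.0, 2001)      # candidate source locations
    ds = s[1] - s[0]
    prior = normal_pdf(s, 0.0, prior_sigma)

    # C = 1: a single source generates both cues.
    like_c1 = np.sum(normal_pdf(x_aud, s, sigma_aud) *
                     normal_pdf(x_vis, s, sigma_vis) * prior) * ds
    # C = 2: each cue has its own independent source.
    like_aud = np.sum(normal_pdf(x_aud, s, sigma_aud) * prior) * ds
    like_vis = np.sum(normal_pdf(x_vis, s, sigma_vis) * prior) * ds
    like_c2 = like_aud * like_vis

    return like_c1 * p_common / (like_c1 * p_common + like_c2 * (1.0 - p_common))

# Hypothetical example: similar cues favor integration (a fused percept such as
# "da"), while highly discrepant cues favor separate causes and no fusion.
print(posterior_common_cause(x_aud=0.0, x_vis=0.5, sigma_aud=1.0, sigma_vis=1.0))
print(posterior_common_cause(x_aud=0.0, x_vis=5.0, sigma_aud=1.0, sigma_vis=1.0))
```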
23. Wilson AH, Alsius A, Paré M, Munhall KG. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception. J Speech Lang Hear Res 2016; 59:601-615. PMID: 27537379; PMCID: PMC5280058; DOI: 10.1044/2016_jslhr-s-15-0092
Abstract
PURPOSE: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. METHOD: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent conditions (Experiment 1; N = 66). In Experiment 2 (N = 20), participants performed a visual-only speech perception task and in Experiment 3 (N = 20) an audiovisual task while having their gaze behavior monitored using eye-tracking equipment. RESULTS: In the visual-only condition, increasing image resolution led to monotonic increases in performance, and proficient speechreaders were more affected by the removal of high spatial information than were poor speechreaders. The McGurk effect also increased with increasing visual resolution, although it was less affected by the removal of high-frequency information. Observers tended to fixate on the mouth more in visual-only perception, but gaze toward the mouth did not correlate with accuracy of silent speechreading or the magnitude of the McGurk effect. CONCLUSIONS: The results suggest that individual differences in silent speechreading and the McGurk effect are not related. This conclusion is supported by differential influences of high-resolution visual information on the two tasks and differences in the pattern of gaze.
Affiliation(s)
- Amanda H. Wilson
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Agnès Alsius
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Martin Paré
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Kevin G. Munhall
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
24. Skilled musicians are not subject to the McGurk effect. Sci Rep 2016; 6:30423. PMID: 27453363; PMCID: PMC4958963; DOI: 10.1038/srep30423
Abstract
The McGurk effect is a compelling illusion in which humans auditorily perceive mismatched audiovisual speech as a completely different syllable. In this study, evidence is provided that professional musicians are not subject to this illusion, possibly because of their finer auditory or attentional abilities. Eighty healthy, age-matched graduate students volunteered for the study; 40 were musicians from the Luca Marenzio Conservatory of Music in Brescia with 8–13 years of academic musical training. The phonemes /la/, /da/, /ta/, /ga/, /ka/, /na/, /ba/, and /pa/ were presented to participants in audiovisual congruent and incongruent conditions, or in unimodal (visual-only or auditory-only) conditions, while participants were engaged in syllable recognition tasks. Overall, musicians showed no significant McGurk effect for any of the phonemes. Controls showed a marked McGurk effect for several phonemes (including alveolar-nasal, velar-occlusive, and bilabial ones). The results indicate that early and intensive musical training might affect the way the auditory cortex processes phonetic information.
Collapse
|
25
|
Abstract
In the McGurk effect, incongruent auditory and visual syllables are perceived as a third, completely different syllable. This striking illusion has become a popular assay of multisensory integration for individuals and clinical populations. However, there is enormous variability in how often the illusion is evoked by different stimuli and how often the illusion is perceived by different individuals. Most studies of the McGurk effect have used only one stimulus, making it impossible to separate stimulus and individual differences. We created a probabilistic model to separately estimate stimulus and individual differences in behavioral data from 165 individuals viewing up to 14 different McGurk stimuli. The noisy encoding of disparity (NED) model characterizes stimuli by their audiovisual disparity and characterizes individuals by how noisily they encode the stimulus disparity and by their disparity threshold for perceiving the illusion. The model accurately described perception of the McGurk effect in our sample, suggesting that differences between individuals are stable across stimulus differences. The most important benefit of the NED model is that it provides a method to compare multisensory integration across individuals and groups without the confound of stimulus differences. An added benefit is the ability to predict frequency of the McGurk effect for stimuli never before seen by an individual.
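The model's prediction rule reduces to a cumulative Gaussian: a fusion response occurs when the noisily encoded disparity falls below the individual's disparity threshold. The sketch below is a minimal illustration of that rule under the Gaussian-noise assumption; the parameter values are invented for the example and are not fitted estimates from the paper.

```python
from scipy.stats import norm

def p_fusion(stimulus_disparity, sensory_noise, disparity_threshold):
    """Predicted McGurk fusion rate: probability that the encoded disparity
    (true disparity plus Gaussian noise) lands below the threshold."""
    return norm.cdf((disparity_threshold - stimulus_disparity) / sensory_noise)

# Same stimulus, two hypothetical observers who differ only in sensory noise.
print(p_fusion(stimulus_disparity=0.6, sensory_noise=0.3, disparity_threshold=0.8))  # ~0.75
print(p_fusion(stimulus_disparity=0.6, sensory_noise=1.5, disparity_threshold=0.8))  # ~0.55
```

Because stimulus and observer parameters enter the rule separately, fitted observer parameters can be compared across individuals and groups even when they were tested with different stimuli, which is the benefit emphasized above.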
Collapse
|
26
|
Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type. Psychon Bull Rev 2016; 22:1299-307. [PMID: 25802068 DOI: 10.3758/s13423-015-0817-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
In the McGurk effect, pairing incongruent auditory and visual syllables produces a percept different from the component syllables. Although it is a popular assay of audiovisual speech integration, little is known about the distribution of responses to the McGurk effect in the population. In our first experiment, we measured McGurk perception using 12 different McGurk stimuli in a sample of 165 English-speaking adults, 40 of whom were retested following a one-year interval. We observed dramatic differences both in how frequently different individuals perceived the illusion (from 0 % to 100 %) and in how frequently the illusion was perceived across different stimuli (17 % to 58 %). For individual stimuli, the distributions of response frequencies deviated strongly from normality, with 77 % of participants almost never or almost always perceiving the effect (≤10 % or ≥90 %). This deviation suggests that the mean response frequency, the most commonly reported measure of the McGurk effect, is a poor measure of individual participants' responses, and that the assumptions made by parametric statistical tests are invalid. Despite the substantial variability across individuals and stimuli, there was little change in the frequency of the effect between initial testing and a one-year retest (mean change in frequency = 2 %; test-retest correlation, r = 0.91). In a second experiment, we replicated our findings of high variability using eight new McGurk stimuli and tested the effects of open-choice versus forced-choice responding. Forced-choice responding resulted in an estimated 18 % greater frequency of the McGurk effect but similar levels of interindividual variability. Our results highlight the importance of examining individual differences in McGurk perception instead of relying on summary statistics averaged across a population. However, individual variability in the McGurk effect does not preclude its use as a stable measure of audiovisual integration.
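A small worked example makes the point about summary statistics concrete; the sample below is hypothetical and constructed only to mimic the strongly bimodal pattern described above.

```python
import numpy as np

# Hypothetical fusion rates (%) for eight observers: most are near 0 or 100.
fusion_rates = np.array([0, 0, 5, 10, 90, 95, 100, 100])
print(fusion_rates.mean())   # 50.0, even though no observer responds near 50%
print((fusion_rates <= 10).mean(), (fusion_rates >= 90).mean())  # 0.5 0.5
```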
Collapse
|
27
|
Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions. J Neurosci 2016; 35:14195-204. [PMID: 26490860 DOI: 10.1523/jneurosci.1829-15.2015] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2-6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually.
SIGNIFICANCE STATEMENT Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information about what the speaker is saying, but also, importantly, when the speaker is saying it. Studying how the brain uses this timing relationship to combine information from continuous auditory and visual speech has traditionally been methodologically difficult. Here we introduce a new approach for doing this using relatively inexpensive and noninvasive scalp recordings. Specifically, we show that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing. Furthermore, we show that this enhancement is most pronounced at a time scale that corresponds to mean syllable length.
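Cortical tracking of the speech envelope is commonly indexed with a linear stimulus-reconstruction (decoding) model. The sketch below is a generic ridge-regression decoder rather than the authors' analysis pipeline; the function name, the lag handling (edge samples are simply wrapped for brevity), the train/test split, and the regularization value are assumptions.

```python
import numpy as np

def envelope_tracking(eeg, envelope, lags=range(0, 10), lam=1e3, train_frac=0.8):
    """Reconstruct the speech envelope from multichannel EEG and return the
    Pearson correlation between the held-out reconstruction and the target.

    eeg      : (time, channels) array
    envelope : (time,) speech envelope sampled at the same rate
    lags     : integer sample lags of EEG used as predictors
    lam      : ridge penalty
    """
    # Time-lagged design matrix (np.roll wraps edges; real pipelines zero-pad).
    X = np.hstack([np.roll(eeg, lag, axis=0) for lag in lags])
    n_train = int(train_frac * len(envelope))
    Xtr, Xte = X[:n_train], X[n_train:]
    ytr, yte = envelope[:n_train], envelope[n_train:]
    # Ridge-regularized least-squares decoder weights, fit on training data only.
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
    return np.corrcoef(Xte @ w, yte)[0, 1]

# Hypothetical data: 10 s of 32-channel EEG at 100 Hz and an unrelated envelope.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((1000, 32))
env = rng.standard_normal(1000)
print(envelope_tracking(eeg, env))   # near zero (chance) for unrelated signals
```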
Collapse
|
28
|
Files BT, Tjan BS, Jiang J, Bernstein LE. Visual speech discrimination and identification of natural and synthetic consonant stimuli. Front Psychol 2015; 6:878. [PMID: 26217249 PMCID: PMC4499841 DOI: 10.3389/fpsyg.2015.00878] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2015] [Accepted: 06/15/2015] [Indexed: 11/25/2022] Open
Abstract
From phonetic features to connected discourse, every level of psycholinguistic structure including prosody can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not to our knowledge been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d') and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d') increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d' with inverted stimuli but a persistent pattern of larger d' for far than for near stimulus pairs are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for development of practical applications such as visual lipreading/speechreading speech synthesis.
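Sensitivity in such discrimination experiments is typically summarized with d'. The sketch below uses the basic yes/no ("different" vs. "same") formula with a log-linear correction for extreme proportions; same-different designs are often analyzed with a differencing-model correction instead, and the trial counts in the example are hypothetical.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' from response counts, with a log-linear (add 0.5) correction so
    perfect hit or false-alarm rates do not yield infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical within-viseme pair: 42/50 'different' responses to physically
# different pairs, 12/50 'different' responses to physically identical pairs.
print(round(d_prime(42, 8, 12, 38), 2))   # ~1.66, i.e., above-chance discrimination
```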
Collapse
Affiliation(s)
- Benjamin T. Files
- U.S. Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA
| | - Bosco S. Tjan
- Department of Psychology, University of Southern California, Los Angeles, CA, USA
| | | | - Lynne E. Bernstein
- Department of Speech and Hearing Science, George Washington University, Washington, DC, USA
| |
Collapse
|
29
|
Magnotti JF, Basu Mallick D, Feng G, Zhou B, Zhou W, Beauchamp MS. Similar frequency of the McGurk effect in large samples of native Mandarin Chinese and American English speakers. Exp Brain Res 2015; 233:2581-6. [PMID: 26041554 DOI: 10.1007/s00221-015-4324-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2014] [Accepted: 05/13/2015] [Indexed: 11/28/2022]
Abstract
Humans combine visual information from mouth movements with auditory information from the voice to recognize speech. A common method for assessing multisensory speech perception is the McGurk effect: When presented with particular pairings of incongruent auditory and visual speech syllables (e.g., the auditory speech sounds for "ba" dubbed onto the visual mouth movements for "ga"), individuals perceive a third syllable, distinct from the auditory and visual components. Chinese and American cultures differ in the prevalence of direct facial gaze and in the auditory structure of their languages, raising the possibility of cultural- and language-related group differences in the McGurk effect. There is no consensus in the literature about the existence of these group differences, with some studies reporting less McGurk effect in native Mandarin Chinese speakers than in English speakers and others reporting no difference. However, these studies sampled small numbers of participants tested with a small number of stimuli. Therefore, we collected data on the McGurk effect from large samples of Mandarin-speaking individuals from China and English-speaking individuals from the USA (total n = 307) viewing nine different stimuli. Averaged across participants and stimuli, we found similar frequencies of the McGurk effect between Chinese and American participants (48 vs. 44 %). In both groups, we observed a large range of frequencies both across participants (range from 0 to 100 %) and stimuli (15 to 83 %) with the main effect of culture and language accounting for only 0.3 % of the variance in the data. High individual variability in perception of the McGurk effect necessitates the use of large sample sizes to accurately estimate group differences.
Collapse
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, 1 Baylor Plaza, Suite 104, Houston, TX, USA
| | | | | | | | | | | |
Collapse
|
30
|
Eberhardt SP, Auer ET, Bernstein LE. Multisensory training can promote or impede visual perceptual learning of speech stimuli: visual-tactile vs. visual-auditory training. Front Hum Neurosci 2014; 8:829. [PMID: 25400566 PMCID: PMC4215828 DOI: 10.3389/fnhum.2014.00829] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2014] [Accepted: 09/29/2014] [Indexed: 12/04/2022] Open
Abstract
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee’s primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee’s lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Collapse
Affiliation(s)
- Silvio P Eberhardt
- Communication Neuroscience Laboratory, Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| | - Edward T Auer
- Communication Neuroscience Laboratory, Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| | - Lynne E Bernstein
- Communication Neuroscience Laboratory, Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| |
Collapse
|
31
|
Bernstein LE, Eberhardt SP, Auer ET. Audiovisual spoken word training can promote or impede auditory-only perceptual learning: prelingually deafened adults with late-acquired cochlear implants versus normal hearing adults. Front Psychol 2014; 5:934. [PMID: 25206344 PMCID: PMC4144091 DOI: 10.3389/fpsyg.2014.00934] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2014] [Accepted: 08/05/2014] [Indexed: 12/02/2022] Open
Abstract
Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Communication Neuroscience Laboratory, Department of Speech and Hearing Science, George Washington University, Washington, DC, USA
| | | | | |
Collapse
|
32
|
Affiliation(s)
- Kaisa Tiippana
- Division of Cognitive Psychology and Neuropsychology, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
| |
Collapse
|
33
|
Tjan BS, Chao E, Bernstein LE. A visual or tactile signal makes auditory speech detection more efficient by reducing uncertainty. Eur J Neurosci 2014; 39:1323-31. [PMID: 24400652 PMCID: PMC3997613 DOI: 10.1111/ejn.12471] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2013] [Revised: 12/01/2013] [Accepted: 12/02/2013] [Indexed: 11/28/2022]
Abstract
Acoustic speech is easier to detect in noise when the talker can be seen. This finding could be explained by integration of multisensory inputs or refinement of auditory processing from visual guidance. In two experiments, we studied two-interval forced-choice detection of an auditory 'ba' in acoustic noise, paired with various visual and tactile stimuli that were identically presented in the two observation intervals. Detection thresholds were reduced under the multisensory conditions vs. the auditory-only condition, even though the visual and/or tactile stimuli alone could not inform the correct response. Results were analysed relative to an ideal observer for which intrinsic (internal) noise and efficiency were independent contributors to detection sensitivity. Across experiments, intrinsic noise was unaffected by the multisensory stimuli, arguing against the merging (integrating) of multisensory inputs into a unitary speech signal, but sampling efficiency was increased to varying degrees, supporting refinement of knowledge about the auditory stimulus. The steepness of the psychometric functions decreased with increasing sampling efficiency, suggesting that the 'task-irrelevant' visual and tactile stimuli reduced uncertainty about the acoustic signal. Visible speech was not superior for enhancing auditory speech detection. Our results reject multisensory neuronal integration and speech-specific neural processing as explanations for the enhanced auditory speech detection under noisy conditions. Instead, they support a more rudimentary form of multisensory interaction: the otherwise task-irrelevant sensory systems inform the auditory system about when to listen.
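The decomposition of detection sensitivity into equivalent internal noise and sampling efficiency can be written in the standard linear-amplifier form d' = sqrt(eta * E / (N_ext + N_eq)), where E is signal energy, N_ext the external noise power density, N_eq the equivalent internal noise, and eta the sampling efficiency. The sketch below simply rearranges that expression to show how raising efficiency alone, with internal noise unchanged, lowers the detection threshold, which is the pattern reported above; it is a generic equivalent-input-noise formulation rather than the authors' exact model, and all values are illustrative.

```python
def threshold_energy(d_prime_criterion, n_external, n_equivalent, efficiency):
    """Signal energy needed to reach a criterion d' under the linear-amplifier
    (equivalent input noise) model: d' = sqrt(efficiency * E / (N_ext + N_eq))."""
    return d_prime_criterion**2 * (n_external + n_equivalent) / efficiency

# Hypothetical auditory-only vs. audiovisual detection: only efficiency changes.
print(threshold_energy(1.0, n_external=1.0, n_equivalent=0.5, efficiency=0.10))  # 15.0
print(threshold_energy(1.0, n_external=1.0, n_equivalent=0.5, efficiency=0.15))  # 10.0
```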
Collapse
Affiliation(s)
- Bosco S Tjan
- Department of Psychology, Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, 90089, USA
| | | | | |
Collapse
|
34
|
Smith E, Duede S, Hanrahan S, Davis T, House P, Greger B. Seeing is believing: neural representations of visual stimuli in human auditory cortex correlate with illusory auditory perceptions. PLoS One 2013; 8:e73148. [PMID: 24023823 PMCID: PMC3762867 DOI: 10.1371/journal.pone.0073148] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2013] [Accepted: 07/19/2013] [Indexed: 11/18/2022] Open
Abstract
In interpersonal communication, the listener can often see as well as hear the speaker. Visual stimuli can subtly change a listener's auditory perception, as in the McGurk illusion, in which perception of a phoneme's auditory identity is changed by a concurrent video of a mouth articulating a different phoneme. Studies have yet to link visual influences on the neural representation of language with subjective language perception. Here we show that vision influences the electrophysiological representation of phonemes in human auditory cortex prior to the presentation of the auditory stimulus. We used the McGurk effect to dissociate the subjective perception of phonemes from the auditory stimuli. With this paradigm we demonstrate that neural representations in auditory cortex are more closely correlated with the visual stimuli of mouth articulation, which drive the illusory subjective auditory perception, than with the actual auditory stimuli. Additionally, information about visual and auditory stimuli transfers in the caudal-rostral direction along the superior temporal gyrus during phoneme perception, as would be expected of visual information flowing from the occipital cortex into the ventral auditory processing stream. These results show that visual stimuli influence the neural representation in auditory cortex early in sensory processing and may override the subjective auditory perceptions normally generated by auditory stimuli. These findings depict a marked influence of vision on the neural processing of audition in tertiary auditory cortex and suggest a mechanistic underpinning for the McGurk effect.
Collapse
Affiliation(s)
- Elliot Smith
- Interdepartmental Program in Neuroscience, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
| | - Scott Duede
- Department of Linguistics, University of Utah, Salt Lake City, Utah, United States of America
| | - Sara Hanrahan
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
| | - Tyler Davis
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Department of Neurosurgery, University of Utah, Salt Lake City, Utah, United States of America
| | - Paul House
- Department of Neurosurgery, University of Utah, Salt Lake City, Utah, United States of America
| | - Bradley Greger
- Interdepartmental Program in Neuroscience, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
| |
Collapse
|
35
|
Setti A, Burke KE, Kenny R, Newell FN. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes. Front Psychol 2013; 4:575. [PMID: 24027544 PMCID: PMC3760087 DOI: 10.3389/fpsyg.2013.00575] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2013] [Accepted: 08/11/2013] [Indexed: 12/02/2022] Open
Abstract
Recent studies suggest that multisensory integration is enhanced in older adults, but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the “McGurk illusion,” in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults; however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.
Collapse
Affiliation(s)
- Annalisa Setti
- Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland; TRIL Centre, Trinity College Dublin, Dublin, Ireland
| | | | | | | |
Collapse
|
36
|
Bernstein LE, Auer ET, Eberhardt SP, Jiang J. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training. Front Neurosci 2013; 7:34. [PMID: 23515520 PMCID: PMC3600826 DOI: 10.3389/fnins.2013.00034] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2012] [Accepted: 02/28/2013] [Indexed: 11/13/2022] Open
Abstract
Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called “reverse hierarchy theory” of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Collapse
Affiliation(s)
- Lynne E Bernstein
- Communication Neuroscience Laboratory, Department of Speech and Hearing Science, George Washington University, Washington, DC, USA
| | | | | | | |
Collapse
|