1. Magnotti JF, Lado A, Beauchamp MS. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers. Front Neurosci 2024; 18:1421713. PMID: 38988770; PMCID: PMC11233445; DOI: 10.3389/fnins.2024.1421713
Abstract
In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the strength of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Anastasia Lado
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
2. Tiippana K, Ujiie Y, Peromaa T, Takahashi K. Investigation of Cross-Language and Stimulus-Dependent Effects on the McGurk Effect with Finnish and Japanese Speakers and Listeners. Brain Sci 2023; 13:1198. PMID: 37626554; PMCID: PMC10452414; DOI: 10.3390/brainsci13081198
Abstract
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A/pa/V/ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure of visual influence on speech perception: the lower the proportion of correct auditory responses, the stronger the effect. Cross-language effects are studied to understand processing differences between one's own and foreign languages. The McGurk effect has sometimes been found to be stronger with foreign speakers, but other studies have shown the opposite, or no difference between languages. Most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners. Both listener groups had 49 participants. The stimuli (/ka/, /pa/, /ta/) were uttered by two female and male Finnish and Japanese speakers and presented in A, V, and AV modalities, including a McGurk stimulus A/pa/V/ka/. The McGurk effect was stronger with Japanese stimuli in both groups. Differences in speech perception were prominent between individual speakers but less so between native languages. Unisensory perception correlated with McGurk perception. These findings suggest that stimulus-dependent features contribute to the McGurk effect and may influence syllable perception more strongly than cross-language factors.
Affiliation(s)
- Kaisa Tiippana
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Yuta Ujiie
- Department of Psychology, College of Contemporary Psychology, Rikkyo University, Saitama 352-8558, Japan
- Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Osaka 567-8570, Japan
- Tarja Peromaa
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Kohske Takahashi
- College of Comprehensive Psychology, Ritsumeikan University, Osaka 567-8570, Japan
3. Feldman JI, Tu A, Conrad JG, Kuang W, Santapuram P, Woynaroski TG. The Impact of Singing on Visual and Multisensory Speech Perception in Children on the Autism Spectrum. Multisens Res 2022; 36:57-74. PMID: 36731528; PMCID: PMC9924934; DOI: 10.1163/22134808-bja10087
Abstract
Autistic children show reduced multisensory integration of audiovisual speech stimuli in response to the McGurk illusion. Previously, it has been shown that adults can integrate sung McGurk tokens. These sung speech tokens offer more salient visual and auditory cues than spoken tokens, which may increase the identification and integration of visual speech cues in autistic children. Forty participants (20 autistic, 20 non-autistic peers) aged 7-14 completed the study. Participants were presented with speech tokens in four modalities: auditory-only, visual-only, congruent audiovisual, and incongruent audiovisual (i.e., McGurk; auditory 'ba' and visual 'ga'). Tokens were also presented in two formats: spoken and sung. Participants indicated what they perceived via a four-button response box (i.e., 'ba', 'ga', 'da', or 'tha'). Accuracies and perception of the McGurk illusion were calculated for each modality and format. Analysis of visual-only identification indicated a significant main effect of format, whereby participants were more accurate on sung than spoken trials, but no significant main effect of group or interaction effect. Analysis of the McGurk trials indicated no significant main effect of format or group and no significant interaction effect. Sung speech tokens improved identification of visual speech cues but did not boost the integration of visual cues with heard speech across groups. Additional work is needed to determine what properties of sung speech contributed to the observed improvement in visual accuracy and to evaluate whether more prolonged exposure to sung speech may yield effects on multisensory integration.
Affiliation(s)
- Jacob I. Feldman
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN, USA
- Alexander Tu
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Otolaryngology and Communication Sciences, Medical College of Wisconsin, Milwaukee, WI, USA
- Julie G. Conrad
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Pediatrics, University of Illinois, Chicago, IL, USA
- Wayne Kuang
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Pediatrics, Los Angeles County and University of Southern California (LAC+USC) Medical Center, University of Southern California, Los Angeles, CA, USA
- Pooja Santapuram
- Neuroscience Undergraduate Program, Vanderbilt University, Nashville, TN, USA
- Present Address: Department of Anesthesiology, Columbia University Irving Medical Center, New York, NY, USA
- Tiffany G. Woynaroski
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
4. Diaz MT, Yalcinbas E. The neural bases of multimodal sensory integration in older adults. Int J Behav Dev 2021; 45:409-417. PMID: 34650316; DOI: 10.1177/0165025420979362
Abstract
Although hearing often declines with age, prior research has shown that older adults may benefit from multisensory input to a greater extent than younger adults, a concept known as inverse effectiveness. While there is behavioral evidence in support of this phenomenon, less is known about its neural basis. The present fMRI study examined how older and younger adults processed multimodal auditory-visual (AV) phonemic stimuli that were either congruent or incongruent across modalities. Incongruent AV pairs were designed to elicit the McGurk effect. Behaviorally, reaction times were significantly faster on congruent trials than on incongruent trials for both age groups, and overall older adults responded more slowly. The interaction was not significant, suggesting that older adults processed the AV stimuli similarly to younger adults. Although there were minimal behavioral differences, age-related differences in functional activation were identified: younger adults showed greater activation than older adults in primary sensory regions, including the superior temporal gyrus, the calcarine fissure, and the left post-central gyrus. In contrast, older adults showed greater activation than younger adults in dorsal frontal regions, including the middle and superior frontal gyri, as well as in dorsal parietal regions. These data suggest that while there is age-related stability in behavioral sensitivity to multimodal stimuli, the neural bases for this effect differed between older and younger adults. Our results demonstrated that older adults underrecruited primary sensory cortices and showed increased recruitment of regions involved in executive function, attention, and monitoring processes, which may reflect an attempt to compensate.
Affiliation(s)
- Michele T Diaz
- Department of Psychology, The Pennsylvania State University
- Ege Yalcinbas
- Neurosciences Department, University of California, San Diego
5.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk and Macdonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task conditions. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the influence of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and we challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
6. Thézé R, Giraud AL, Mégevand P. The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech. Sci Adv 2020; 6(45):eabc6348. PMID: 33148648; PMCID: PMC7673697; DOI: 10.1126/sciadv.abc6348
Abstract
When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli in which one mismatched phoneme-viseme pair in a key word of each sentence created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere was crucial in selecting whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Division of Neurology, Department of Clinical Neurosciences, Geneva University Hospitals, 1205 Geneva, Switzerland
7. Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. PMID: 33221701; DOI: 10.1016/j.cortex.2020.10.002
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
8. Thézé R, Gadiri MA, Albert L, Provost A, Giraud AL, Mégevand P. Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments. Sci Rep 2020; 10:15540. PMID: 32968127; PMCID: PMC7511320; DOI: 10.1038/s41598-020-72375-y
Abstract
Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has widely been applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and the quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized to computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated with 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e., /v/) with a bilabial occlusive phoneme (i.e., /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech with higher control over stimulus timing and content.
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Mehdi Ali Gadiri
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Louis Albert
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Antoine Provost
- Human Neuroscience Platform, Fondation Campus Biotech Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, University of Geneva, Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland
- Division of Neurology, Geneva University Hospitals, Geneva, Switzerland
9. Feng G, Zhou B, Zhou W, Beauchamp MS, Magnotti JF. A Laboratory Study of the McGurk Effect in 324 Monozygotic and Dizygotic Twins. Front Neurosci 2019; 13:1029. PMID: 31636529; PMCID: PMC6787151; DOI: 10.3389/fnins.2019.01029
Abstract
Multisensory integration of information from the talker's voice and the talker's mouth facilitates human speech perception. A popular assay of audiovisual integration is the McGurk effect, an illusion in which incongruent visual speech information categorically changes the percept of auditory speech. There is substantial interindividual variability in susceptibility to the McGurk effect. To better understand possible sources of this variability, we examined the McGurk effect in 324 native Mandarin speakers, consisting of 73 monozygotic (MZ) and 89 dizygotic (DZ) twin pairs. When tested with 9 different McGurk stimuli, some participants never perceived the illusion and others always perceived it. Within participants, perception was similar across time (r = 0.55 at a 2-year retest in 150 participants), suggesting that McGurk susceptibility reflects a stable trait rather than short-term perceptual fluctuations. To examine the effects of shared genetics and prenatal environment, we compared McGurk susceptibility between MZ and DZ twins. Both twin types had significantly greater correlation than unrelated pairs (r = 0.28 for MZ twins and r = 0.21 for DZ twins), suggesting that the genes and environmental factors shared by twins contribute to individual differences in multisensory speech perception. Conversely, the existence of substantial differences within twin pairs (even MZ co-twins) and the overall low percentage of explained variance (5.5%) argue against a deterministic view of individual differences in multisensory integration.
Affiliation(s)
- Guo Feng
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Psychological Research and Counseling Center, Southwest Jiaotong University, Chengdu, China
- Bin Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Wen Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
10. Lalonde K, Werner LA. Perception of incongruent audiovisual English consonants. PLoS One 2019; 14:e0213588. PMID: 30897109; PMCID: PMC6428273; DOI: 10.1371/journal.pone.0213588
Abstract
Causal inference—the process of deciding whether two incoming signals come from the same source—is an important step in audiovisual (AV) speech perception. This research explored causal inference and perception of incongruent AV English consonants. Nine adults were presented auditory, visual, congruent AV, and incongruent AV consonant-vowel syllables. Incongruent AV stimuli included auditory and visual syllables with matched vowels, but mismatched consonants. Open-set responses were collected. For most incongruent syllables, participants were aware of the mismatch between auditory and visual signals (59.04%) or reported the auditory syllable (33.73%). Otherwise, participants reported the visual syllable (1.13%) or some other syllable (6.11%). Statistical analyses were used to assess whether visual distinctiveness and place, voice, and manner features predicted responses. Mismatch responses occurred more when the auditory and visual consonants were visually distinct, when place and manner differed across auditory and visual consonants, and for consonants with high visual accuracy. Auditory responses occurred more when the auditory and visual consonants were visually similar, when place and manner were the same across auditory and visual stimuli, and with consonants produced further back in the mouth. Visual responses occurred more when voicing and manner were the same across auditory and visual stimuli, and for front and middle consonants. Other responses were variable, but typically matched the visual place, auditory voice, and auditory manner of the input. Overall, results indicate that causal inference and incongruent AV consonant perception depend on salience and reliability of auditory and visual inputs and degree of redundancy between auditory and visual inputs. A parameter-free computational model of incongruent AV speech perception based on unimodal confusions, with a causal inference rule, was applied. Data from the current study present an opportunity to test and improve the generalizability of current AV speech integration models.
Affiliation(s)
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
- Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
11. Devaraju DS, U AK, Maruthy S. Comparison of McGurk Effect across Three Consonant-Vowel Combinations in Kannada. J Audiol Otol 2019; 23:39-43. PMID: 30518196; PMCID: PMC6348306; DOI: 10.7874/jao.2018.00234
Abstract
BACKGROUND AND OBJECTIVES: The influence of the visual stimulus on the auditory component in the perception of auditory-visual (AV) consonant-vowel syllables has been demonstrated in different languages. Inherent properties of unimodal stimuli are known to modulate AV integration. The present study investigated how the amount of McGurk effect (an outcome of AV integration) varies across three different consonant combinations in the Kannada language. The importance of unimodal syllable identification for the amount of McGurk effect was also examined. SUBJECTS AND METHODS: Twenty-eight individuals performed an AV identification task with ba/ga, pa/ka, and ma/ṇa consonant combinations in AV congruent, AV incongruent (McGurk combination), audio-alone, and visual-alone conditions. Cluster analysis was performed on the identification scores for the incongruent stimuli to classify the individuals into two groups: one with high and the other with low McGurk scores. The differences in audio-alone and visual-alone scores between these groups were compared. RESULTS: The results showed significantly higher McGurk scores for ma/ṇa than for the ba/ga and pa/ka combinations in both the high and low McGurk score groups. No significant difference was noted between the ba/ga and pa/ka combinations in either group. Identification of /ṇa/ presented in the visual-alone condition correlated negatively with higher McGurk scores. CONCLUSIONS: The results suggest that the final percept following AV integration is not exclusively explained by unimodal identification of the syllables; other factors may also contribute to the final percept.
Affiliation(s)
- Dhatri S Devaraju
- Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Ajith Kumar U
- Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Santosh Maruthy
- Department of Speech-Language Sciences, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
12. Magnotti JF, Smith KB, Salinas M, Mays J, Zhu LL, Beauchamp MS. A causal inference explanation for enhancement of multisensory integration by co-articulation. Sci Rep 2018; 8:18032. PMID: 30575791; PMCID: PMC6303389; DOI: 10.1038/s41598-018-36772-8
Abstract
The McGurk effect is a popular assay of multisensory integration in which participants report the illusory percept of "da" when presented with incongruent auditory "ba" and visual "ga" (AbaVga). While the original publication describing the effect found that 98% of participants perceived it, later studies reported much lower prevalence, ranging from 17% to 81%. Understanding the source of this variability is important for interpreting the panoply of studies that examine McGurk prevalence between groups, including clinical populations such as individuals with autism or schizophrenia. The original publication used stimuli consisting of multiple repetitions of a co-articulated syllable (three repetitions, AgagaVbaba). Later studies used stimuli without repetition or co-articulation (AbaVga) and used congruent syllables from the same talker as a control. In three experiments, we tested how stimulus repetition, co-articulation, and talker repetition affect McGurk prevalence. Repetition with co-articulation increased prevalence by 20%, while repetition without co-articulation and talker repetition had no effect. A fourth experiment compared the effect of the on-line testing used in the first three experiments with the in-person testing used in the original publication; no differences were observed. We interpret our results in the framework of causal inference: co-articulation increases the evidence that auditory and visual speech tokens arise from the same talker, increasing tolerance for content disparity and likelihood of integration. The results provide a principled explanation for how co-articulation aids multisensory integration and can explain the high prevalence of the McGurk effect in the initial publication.
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Marcelo Salinas
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Jacqunae Mays
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
13. Proverbio AM, Raso G, Zani A. Electrophysiological Indexes of Incongruent Audiovisual Phonemic Processing: Unraveling the McGurk Effect. Neuroscience 2018; 385:215-226. PMID: 29932985; DOI: 10.1016/j.neuroscience.2018.06.021
Abstract
In this study, the timing of electromagnetic signals recorded during incongruent and congruent audiovisual (AV) stimulation was examined in 14 healthy Italian volunteers. In a previous study (Proverbio et al., 2016) we investigated the McGurk effect in the Italian language and identified which visual and auditory inputs provided the most compelling illusory effects (e.g., bilabial phonemes presented acoustically and paired with non-labials, especially alveolar-nasal and velar-occlusive phonemes). In the present study, EEG was recorded from 128 scalp sites while participants observed a female and a male actor uttering 288 syllables (each lasting approximately 600 ms), selected on the basis of the previous investigation, and responded to rare targets (/re/, /ri/, /ro/, /ru/). In half of the cases the AV information was incongruent, except for targets, which were always congruent. A pMMN (phonological Mismatch Negativity) to incongruent AV stimuli was identified 500 ms after voice onset time. This automatic response indexed the detection of an incongruity between the labial and phonetic information. A swLORETA (Low-Resolution Electromagnetic Tomography) analysis applied to the incongruent-minus-congruent difference voltage in the same time window revealed that the strongest sources of this activity were the right superior temporal (STG) and superior frontal gyri, supporting their involvement in AV integration.
Affiliation(s)
- Alice Mado Proverbio
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
- Giulia Raso
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
14. Bernstein LE. Response Errors in Females' and Males' Sentence Lipreading Necessitate Structurally Different Models for Predicting Lipreading Accuracy. Lang Learn 2018; 68:127-158. PMID: 31485084; PMCID: PMC6724546; DOI: 10.1111/lang.12281
Abstract
Lipreaders recognize words with phonetically impoverished stimuli, an ability that is generally poor in normal-hearing adults. Individual sentence lipreading trials from 341 young adults were modeled to predict words and phonemes correct in terms of measures of phoneme response dissimilarity (PRD), number of inserted incorrect response phonemes, lipreader gender, and a measure of speech perception in noise. Interactions with lipreaders' gender necessitated structurally different models of males' and females' lipreading. Overall, female lipreaders are more accurate, their ability to recognize words with impoverished or degraded input is consistent across visual and auditory modalities, and they amplify their correct responding through top-down insertion of text. Males' responses suggest that individuals with poorer auditory speech perception in noise amplify their responses by shifting towards including text in their response that is more perceptually discrepant from the stimulus. Gender differences merit attention in future studies that use visual speech stimuli.
Affiliation(s)
- Lynne E Bernstein
- Department of Speech, Language, and Hearing Science, George Washington University, 2121 I St NW, Washington, DC 20052
15. Brooks CJ, Chan YM, Anderson AJ, McKendrick AM. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss. Front Hum Neurosci 2018; 12:192. PMID: 29867415; PMCID: PMC5954093; DOI: 10.3389/fnhum.2018.00192
Abstract
Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.
Affiliation(s)
- Cassandra J Brooks
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Yu Man Chan
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Andrew J Anderson
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Allison M McKendrick
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
16. Neural networks supporting audiovisual integration for speech: A large-scale lesion study. Cortex 2018; 103:360-371. PMID: 29705718; DOI: 10.1016/j.cortex.2018.03.030
Abstract
Auditory and visual speech information are often strongly integrated, resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched (the McGurk-MacDonald effect). Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs. visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as by multimodal regions such as the STS at their boundary. Motor-related frontal regions do not appear to play a role in AV speech integration.
17. McGurk stimuli for the investigation of multisensory integration in cochlear implant users: The Oldenburg Audio Visual Speech Stimuli (OLAVS). Psychon Bull Rev 2018; 24:863-872. PMID: 27562763; DOI: 10.3758/s13423-016-1148-9
Abstract
The concurrent presentation of different auditory and visual syllables may result in the perception of a third syllable, reflecting an illusory fusion of visual and auditory information. This well-known McGurk effect is frequently used for the study of audio-visual integration. Recently, it was shown that the McGurk effect is strongly stimulus-dependent, which complicates comparisons across perceivers and inferences across studies. To overcome this limitation, we developed the freely available Oldenburg audio-visual speech stimuli (OLAVS), consisting of 8 different talkers and 12 different syllable combinations. The quality of the OLAVS set was evaluated with 24 normal-hearing subjects. All 96 stimuli were characterized based on their stimulus disparity, which was obtained from a probabilistic model (cf. Magnotti & Beauchamp, 2015). Moreover, the McGurk effect was studied in eight adult cochlear implant (CI) users. By applying the individual, stimulus-independent parameters of the probabilistic model, the predicted effect of stronger audio-visual integration in CI users could be confirmed, demonstrating the validity of the new stimulus material.
18. Morís Fernández L, Torralba M, Soto-Faraco S. Theta oscillations reflect conflict processing in the perception of the McGurk illusion. Eur J Neurosci 2018; 48:2630-2641. DOI: 10.1111/ejn.13804
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Mireia Torralba
- Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Dept. de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Office 55.128, Roc Boronat 138, 08018 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
19. Magnotti JF, Basu Mallick D, Beauchamp MS. Reducing Playback Rate of Audiovisual Speech Leads to a Surprising Decrease in the McGurk Effect. Multisens Res 2018; 31:19-38. DOI: 10.1163/22134808-00002586
Abstract
We report the unexpected finding that slowing video playback decreases perception of the McGurk effect. This reduction is counter-intuitive because the illusion depends on visual speech influencing the perception of auditory speech, and slowing speech should increase the amount of visual information available to observers. We recorded perceptual data from 110 subjects viewing audiovisual syllables (either McGurk or congruent control stimuli) played back at one of three rates: the rate used by the talker during recording (the natural rate), a slow rate (50% of natural), or a fast rate (200% of natural). We replicated previous studies showing dramatic variability in McGurk susceptibility at the natural rate, ranging from 0% to 100% across subjects and from 26% to 76% across the eight McGurk stimuli tested. Relative to the natural rate, slowed playback reduced the frequency of McGurk responses by 11% (79% of subjects showed a reduction) and reduced congruent accuracy by 3% (25% of subjects showed a reduction). Fast playback rate had little effect on McGurk responses or congruent accuracy. To determine whether our results are consistent with Bayesian integration, we constructed a Bayes-optimal model that incorporated two assumptions: individuals combine auditory and visual information according to their reliability, and changing playback rate affects sensory reliability. The model reproduced both our findings of large individual differences and the playback rate effect. This work illustrates that surprises remain in the McGurk effect and that Bayesian integration provides a useful framework for understanding audiovisual speech perception.
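The Bayes-optimal model summarized in this abstract is described only in words; its first assumption, reliability-weighted cue combination, can be made concrete with a short sketch. The snippet below is a generic illustration of inverse-variance cue weighting with made-up parameter values, not the authors' fitted model or code, and it leaves the mapping from playback rate to sensory noise unspecified.

```python
def combine_cues(x_aud, var_aud, x_vis, var_vis):
    """Reliability-weighted (inverse-variance) combination of two noisy cues."""
    w_aud, w_vis = 1.0 / var_aud, 1.0 / var_vis
    x_combined = (w_aud * x_aud + w_vis * x_vis) / (w_aud + w_vis)
    var_combined = 1.0 / (w_aud + w_vis)
    return x_combined, var_combined

# Hypothetical example: the auditory cue sits at 0.0 and the visual cue at 1.0
# on an arbitrary perceptual axis. Making one cue noisier pulls the combined
# estimate toward the other cue.
for var_aud in (0.5, 1.0, 2.0):
    x_hat, var_hat = combine_cues(x_aud=0.0, var_aud=var_aud, x_vis=1.0, var_vis=1.0)
    print(f"var_aud={var_aud}: combined estimate = {x_hat:.2f} (variance {var_hat:.2f})")
```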
Affiliation(s)
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
- Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
20. Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. PMID: 31264597; DOI: 10.1163/22134808-00002565
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and it has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both the phenomenological and neural levels. This calls into question the suitability of the illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated with McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
21. Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. PMID: 28792094; DOI: 10.1002/hbm.23758
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two phenomena have very often been used to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or non-integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise.
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy
- ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
22. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech. PLoS Comput Biol 2017; 13:e1005229. PMID: 28207734; PMCID: PMC5312805; DOI: 10.1371/journal.pcbi.1005229
Abstract
Audiovisual speech integration combines information from auditory speech (talker’s voice) and visual speech (talker’s mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory “ba” + visual “ga” (AbaVga), that are integrated to produce a fused percept (“da”). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others. During face-to-face conversations, we seamlessly integrate information from the talker’s voice with information from the talker’s face. This multisensory integration increases speech perception accuracy and can be critical for understanding speech in noisy environments with many people talking simultaneously. A major challenge for models of multisensory speech perception is thus deciding which voices and faces should be integrated. Our solution to this problem is based on the idea of causal inference—given a particular pair of auditory and visual syllables, the brain calculates the likelihood they are from a single vs. multiple talkers and uses this likelihood to determine the final speech percept. We compared our model with an alternative model that is identical, except that it always integrated the available cues. Using behavioral speech perception data from a large number of subjects, the model with causal inference better predicted how humans would (or would not) integrate audiovisual speech syllables. Our results suggest a fundamental role for a causal inference type calculation in multisensory speech perception.
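The causal inference step described above (deciding whether the auditory and visual syllables share a single source) can be sketched with a generic Gaussian model. The code below is a simplified illustration of that computation, not the published CIMS implementation; the grid bounds, prior width, and p_common value are assumptions chosen only for the example.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density used as the likelihood of a noisy cue."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior_common_cause(x_aud, x_vis, sigma_aud, sigma_vis,
                           prior_sigma=2.0, p_common=0.5):
    """Posterior probability that auditory and visual cues arose from one talker.

    Marginalizes over candidate source locations on a grid; the likelihood of a
    common cause shrinks as the cues become more discrepant relative to noise.
    """
    s = np.linspace(-10.0, 10.0, 2001)      # candidate source locations
    ds = s[1] - s[0]
    prior = normal_pdf(s, 0.0, prior_sigma)

    # C = 1: a single source generates both cues.
    like_c1 = np.sum(normal_pdf(x_aud, s, sigma_aud) *
                     normal_pdf(x_vis, s, sigma_vis) * prior) * ds
    # C = 2: each cue has its own independent source.
    like_aud = np.sum(normal_pdf(x_aud, s, sigma_aud) * prior) * ds
    like_vis = np.sum(normal_pdf(x_vis, s, sigma_vis) * prior) * ds
    like_c2 = like_aud * like_vis

    return like_c1 * p_common / (like_c1 * p_common + like_c2 * (1.0 - p_common))

# Hypothetical example: similar cues favor integration (a fused percept such as
# "da"), while highly discrepant cues favor separate causes and no fusion.
print(posterior_common_cause(x_aud=0.0, x_vis=0.5, sigma_aud=1.0, sigma_vis=1.0))
print(posterior_common_cause(x_aud=0.0, x_vis=5.0, sigma_aud=1.0, sigma_vis=1.0))
```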
23. Wilson AH, Alsius A, Paré M, Munhall KG. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception. J Speech Lang Hear Res 2016; 59:601-615. PMID: 27537379; PMCID: PMC5280058; DOI: 10.1044/2016_jslhr-s-15-0092
Abstract
PURPOSE: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. METHOD: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent conditions (Experiment 1; N = 66). In Experiment 2 (N = 20), participants performed a visual-only speech perception task and in Experiment 3 (N = 20) an audiovisual task while having their gaze behavior monitored using eye-tracking equipment. RESULTS: In the visual-only condition, increasing image resolution led to monotonic increases in performance, and proficient speechreaders were more affected by the removal of high spatial information than were poor speechreaders. The McGurk effect also increased with increasing visual resolution, although it was less affected by the removal of high-frequency information. Observers tended to fixate on the mouth more in visual-only perception, but gaze toward the mouth did not correlate with accuracy of silent speechreading or the magnitude of the McGurk effect. CONCLUSIONS: The results suggest that individual differences in silent speechreading and the McGurk effect are not related. This conclusion is supported by differential influences of high-resolution visual information on the two tasks and differences in the pattern of gaze.
Affiliation(s)
- Amanda H. Wilson
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Agnès Alsius
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Martin Paré
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Kevin G. Munhall
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
24. Skilled musicians are not subject to the McGurk effect. Sci Rep 2016; 6:30423. PMID: 27453363; PMCID: PMC4958963; DOI: 10.1038/srep30423
Abstract
The McGurk effect is a compelling illusion in which humans auditorily perceive mismatched audiovisual speech as a completely different syllable. In this study, evidence is provided that professional musicians are not subject to this illusion, possibly because of their finer auditory or attentional abilities. Eighty healthy, age-matched graduate students volunteered for the study; 40 were musicians from the Luca Marenzio Conservatory of Music in Brescia with 8–13 years of academic musical training. The phonemes /la/, /da/, /ta/, /ga/, /ka/, /na/, /ba/, and /pa/ were presented to participants in audiovisual congruent and incongruent conditions, or in unimodal (visual-only or auditory-only) conditions, while participants were engaged in syllable recognition tasks. Overall, musicians showed no significant McGurk effect for any of the phonemes. Controls showed a marked McGurk effect for several phonemes (including alveolar-nasal, velar-occlusive, and bilabial ones). The results indicate that early and intensive musical training might affect the way the auditory cortex processes phonetic information.
Collapse
|
25
|
Abstract
In the McGurk effect, incongruent auditory and visual syllables are perceived as a third, completely different syllable. This striking illusion has become a popular assay of multisensory integration for individuals and clinical populations. However, there is enormous variability in how often the illusion is evoked by different stimuli and how often the illusion is perceived by different individuals. Most studies of the McGurk effect have used only one stimulus, making it impossible to separate stimulus and individual differences. We created a probabilistic model to separately estimate stimulus and individual differences in behavioral data from 165 individuals viewing up to 14 different McGurk stimuli. The noisy encoding of disparity (NED) model characterizes stimuli by their audiovisual disparity and characterizes individuals by how noisily they encode the stimulus disparity and by their disparity threshold for perceiving the illusion. The model accurately described perception of the McGurk effect in our sample, suggesting that differences between individuals are stable across stimulus differences. The most important benefit of the NED model is that it provides a method to compare multisensory integration across individuals and groups without the confound of stimulus differences. An added benefit is the ability to predict frequency of the McGurk effect for stimuli never before seen by an individual.
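The model's prediction rule reduces to a cumulative Gaussian: a fusion response occurs when the noisily encoded disparity falls below the individual's disparity threshold. The sketch below is a minimal illustration of that rule under the Gaussian-noise assumption; the parameter values are invented for the example and are not fitted estimates from the paper.

```python
from scipy.stats import norm

def p_fusion(stimulus_disparity, sensory_noise, disparity_threshold):
    """Predicted McGurk fusion rate: probability that the encoded disparity
    (true disparity plus Gaussian noise) lands below the threshold."""
    return norm.cdf((disparity_threshold - stimulus_disparity) / sensory_noise)

# Same stimulus, two hypothetical observers who differ only in sensory noise.
print(p_fusion(stimulus_disparity=0.6, sensory_noise=0.3, disparity_threshold=0.8))  # ~0.75
print(p_fusion(stimulus_disparity=0.6, sensory_noise=1.5, disparity_threshold=0.8))  # ~0.55
```

Because stimulus and observer parameters enter the rule separately, fitted observer parameters can be compared across individuals and groups even when they were tested with different stimuli, which is the benefit emphasized above.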
Collapse
|
26
|
Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type. Psychon Bull Rev 2016; 22:1299-307. [PMID: 25802068 DOI: 10.3758/s13423-015-0817-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
In the McGurk effect, pairing incongruent auditory and visual syllables produces a percept different from the component syllables. Although it is a popular assay of audiovisual speech integration, little is known about the distribution of responses to the McGurk effect in the population. In our first experiment, we measured McGurk perception using 12 different McGurk stimuli in a sample of 165 English-speaking adults, 40 of whom were retested following a one-year interval. We observed dramatic differences both in how frequently different individuals perceived the illusion (from 0 % to 100 %) and in how frequently the illusion was perceived across different stimuli (17 % to 58 %). For individual stimuli, the distributions of response frequencies deviated strongly from normality, with 77 % of participants almost never or almost always perceiving the effect (≤10 % or ≥90 %). This deviation suggests that the mean response frequency, the most commonly reported measure of the McGurk effect, is a poor measure of individual participants' responses, and that the assumptions made by parametric statistical tests are invalid. Despite the substantial variability across individuals and stimuli, there was little change in the frequency of the effect between initial testing and a one-year retest (mean change in frequency = 2 %; test-retest correlation, r = 0.91). In a second experiment, we replicated our findings of high variability using eight new McGurk stimuli and tested the effects of open-choice versus forced-choice responding. Forced-choice responding resulted in an estimated 18 % greater frequency of the McGurk effect but similar levels of interindividual variability. Our results highlight the importance of examining individual differences in McGurk perception instead of relying on summary statistics averaged across a population. However, individual variability in the McGurk effect does not preclude its use as a stable measure of audiovisual integration.
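A small worked example makes the point about summary statistics concrete; the sample below is hypothetical and constructed only to mimic the strongly bimodal pattern described above.

```python
import numpy as np

# Hypothetical fusion rates (%) for eight observers: most are near 0 or 100.
fusion_rates = np.array([0, 0, 5, 10, 90, 95, 100, 100])
print(fusion_rates.mean())   # 50.0, even though no observer responds near 50%
print((fusion_rates <= 10).mean(), (fusion_rates >= 90).mean())  # 0.5 0.5
```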
Collapse
|
27
|
Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions. J Neurosci 2016; 35:14195-204. [PMID: 26490860 DOI: 10.1523/jneurosci.1829-15.2015] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2-6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually.
SIGNIFICANCE STATEMENT Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information about what the speaker is saying, but also, importantly, when the speaker is saying it. Studying how the brain uses this timing relationship to combine information from continuous auditory and visual speech has traditionally been methodologically difficult. Here we introduce a new approach for doing this using relatively inexpensive and noninvasive scalp recordings. Specifically, we show that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing. Furthermore, we show that this enhancement is most pronounced at a time scale that corresponds to mean syllable length.
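Cortical tracking of the speech envelope is commonly indexed with a linear stimulus-reconstruction (decoding) model. The sketch below is a generic ridge-regression decoder rather than the authors' analysis pipeline; the function name, the lag handling (edge samples are simply wrapped for brevity), the train/test split, and the regularization value are assumptions.

```python
import numpy as np

def envelope_tracking(eeg, envelope, lags=range(0, 10), lam=1e3, train_frac=0.8):
    """Reconstruct the speech envelope from multichannel EEG and return the
    Pearson correlation between the held-out reconstruction and the target.

    eeg      : (time, channels) array
    envelope : (time,) speech envelope sampled at the same rate
    lags     : integer sample lags of EEG used as predictors
    lam      : ridge penalty
    """
    # Time-lagged design matrix (np.roll wraps edges; real pipelines zero-pad).
    X = np.hstack([np.roll(eeg, lag, axis=0) for lag in lags])
    n_train = int(train_frac * len(envelope))
    Xtr, Xte = X[:n_train], X[n_train:]
    ytr, yte = envelope[:n_train], envelope[n_train:]
    # Ridge-regularized least-squares decoder weights, fit on training data only.
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
    return np.corrcoef(Xte @ w, yte)[0, 1]

# Hypothetical data: 10 s of 32-channel EEG at 100 Hz and an unrelated envelope.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((1000, 32))
env = rng.standard_normal(1000)
print(envelope_tracking(eeg, env))   # near zero (chance) for unrelated signals
```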
Collapse
|
28
|
Files BT, Tjan BS, Jiang J, Bernstein LE. Visual speech discrimination and identification of natural and synthetic consonant stimuli. Front Psychol 2015; 6:878. [PMID: 26217249 PMCID: PMC4499841 DOI: 10.3389/fpsyg.2015.00878] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2015] [Accepted: 06/15/2015] [Indexed: 11/25/2022] Open
Abstract
From phonetic features to connected discourse, every level of psycholinguistic structure including prosody can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not to our knowledge been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d') and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d') increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d' with inverted stimuli but a persistent pattern of larger d' for far than for near stimulus pairs are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for development of practical applications such as visual lipreading/speechreading speech synthesis.
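Sensitivity in such discrimination experiments is typically summarized with d'. The sketch below uses the basic yes/no ("different" vs. "same") formula with a log-linear correction for extreme proportions; same-different designs are often analyzed with a differencing-model correction instead, and the trial counts in the example are hypothetical.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' from response counts, with a log-linear (add 0.5) correction so
    perfect hit or false-alarm rates do not yield infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical within-viseme pair: 42/50 'different' responses to physically
# different pairs, 12/50 'different' responses to physically identical pairs.
print(round(d_prime(42, 8, 12, 38), 2))   # ~1.66, i.e., above-chance discrimination
```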
Collapse
Affiliation(s)
- Benjamin T. Files
- U.S. Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA
| | - Bosco S. Tjan
- Department of Psychology, University of Southern California, Los Angeles, CA, USA
| | | | - Lynne E. Bernstein
- Department of Speech and Hearing Science, George Washington University, Washington, DC, USA
| |
Collapse
|
29
|
Magnotti JF, Basu Mallick D, Feng G, Zhou B, Zhou W, Beauchamp MS. Similar frequency of the McGurk effect in large samples of native Mandarin Chinese and American English speakers. Exp Brain Res 2015; 233:2581-6. [PMID: 26041554 DOI: 10.1007/s00221-015-4324-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2014] [Accepted: 05/13/2015] [Indexed: 11/28/2022]
Abstract
Humans combine visual information from mouth movements with auditory information from the voice to recognize speech. A common method for assessing multisensory speech perception is the McGurk effect: When presented with particular pairings of incongruent auditory and visual speech syllables (e.g., the auditory speech sounds for "ba" dubbed onto the visual mouth movements for "ga"), individuals perceive a third syllable, distinct from the auditory and visual components. Chinese and American cultures differ in the prevalence of direct facial gaze and in the auditory structure of their languages, raising the possibility of cultural- and language-related group differences in the McGurk effect. There is no consensus in the literature about the existence of these group differences, with some studies reporting less McGurk effect in native Mandarin Chinese speakers than in English speakers and others reporting no difference. However, these studies sampled small numbers of participants tested with a small number of stimuli. Therefore, we collected data on the McGurk effect from large samples of Mandarin-speaking individuals from China and English-speaking individuals from the USA (total n = 307) viewing nine different stimuli. Averaged across participants and stimuli, we found similar frequencies of the McGurk effect between Chinese and American participants (48 vs. 44 %). In both groups, we observed a large range of frequencies both across participants (range from 0 to 100 %) and stimuli (15 to 83 %) with the main effect of culture and language accounting for only 0.3 % of the variance in the data. High individual variability in perception of the McGurk effect necessitates the use of large sample sizes to accurately estimate group differences.
Collapse
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, 1 Baylor Plaza, Suite 104, Houston, TX, USA
| | | | | | | | | | | |
Collapse
|
30
|
Eberhardt SP, Auer ET, Bernstein LE. Multisensory training can promote or impede visual perceptual learning of speech stimuli: visual-tactile vs. visual-auditory training. Front Hum Neurosci 2014; 8:829. [PMID: 25400566 PMCID: PMC4215828 DOI: 10.3389/fnhum.2014.00829] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2014] [Accepted: 09/29/2014] [Indexed: 12/04/2022] Open
Abstract
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee’s primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee’s lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Collapse
Affiliation(s)
- Silvio P Eberhardt
- Communication Neuroscience Laboratory, Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| | - Edward T Auer
- Communication Neuroscience Laboratory, Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| | - Lynne E Bernstein
- Communication Neuroscience Laboratory, Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| |
Collapse
|
31
|
Bernstein LE, Eberhardt SP, Auer ET. Audiovisual spoken word training can promote or impede auditory-only perceptual learning: prelingually deafened adults with late-acquired cochlear implants versus normal hearing adults. Front Psychol 2014; 5:934. [PMID: 25206344 PMCID: PMC4144091 DOI: 10.3389/fpsyg.2014.00934] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2014] [Accepted: 08/05/2014] [Indexed: 12/02/2022] Open
Abstract
Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Communication Neuroscience Laboratory, Department of Speech and Hearing Science, George Washington University, Washington, DC, USA
| | | | | |
Collapse
|
32
|
Affiliation(s)
- Kaisa Tiippana
- Division of Cognitive Psychology and Neuropsychology, Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
| |
Collapse
|
33
|
Tjan BS, Chao E, Bernstein LE. A visual or tactile signal makes auditory speech detection more efficient by reducing uncertainty. Eur J Neurosci 2014; 39:1323-31. [PMID: 24400652 PMCID: PMC3997613 DOI: 10.1111/ejn.12471] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2013] [Revised: 12/01/2013] [Accepted: 12/02/2013] [Indexed: 11/28/2022]
Abstract
Acoustic speech is easier to detect in noise when the talker can be seen. This finding could be explained by integration of multisensory inputs or refinement of auditory processing from visual guidance. In two experiments, we studied two-interval forced-choice detection of an auditory 'ba' in acoustic noise, paired with various visual and tactile stimuli that were identically presented in the two observation intervals. Detection thresholds were reduced under the multisensory conditions vs. the auditory-only condition, even though the visual and/or tactile stimuli alone could not inform the correct response. Results were analysed relative to an ideal observer for which intrinsic (internal) noise and efficiency were independent contributors to detection sensitivity. Across experiments, intrinsic noise was unaffected by the multisensory stimuli, arguing against the merging (integrating) of multisensory inputs into a unitary speech signal, but sampling efficiency was increased to varying degrees, supporting refinement of knowledge about the auditory stimulus. The steepness of the psychometric functions decreased with increasing sampling efficiency, suggesting that the 'task-irrelevant' visual and tactile stimuli reduced uncertainty about the acoustic signal. Visible speech was not superior for enhancing auditory speech detection. Our results reject multisensory neuronal integration and speech-specific neural processing as explanations for the enhanced auditory speech detection under noisy conditions. Instead, they support a more rudimentary form of multisensory interaction: the otherwise task-irrelevant sensory systems inform the auditory system about when to listen.
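The decomposition of detection sensitivity into equivalent internal noise and sampling efficiency can be written in the standard linear-amplifier form d' = sqrt(eta * E / (N_ext + N_eq)), where E is signal energy, N_ext the external noise power density, N_eq the equivalent internal noise, and eta the sampling efficiency. The sketch below simply rearranges that expression to show how raising efficiency alone, with internal noise unchanged, lowers the detection threshold, which is the pattern reported above; it is a generic equivalent-input-noise formulation rather than the authors' exact model, and all values are illustrative.

```python
def threshold_energy(d_prime_criterion, n_external, n_equivalent, efficiency):
    """Signal energy needed to reach a criterion d' under the linear-amplifier
    (equivalent input noise) model: d' = sqrt(efficiency * E / (N_ext + N_eq))."""
    return d_prime_criterion**2 * (n_external + n_equivalent) / efficiency

# Hypothetical auditory-only vs. audiovisual detection: only efficiency changes.
print(threshold_energy(1.0, n_external=1.0, n_equivalent=0.5, efficiency=0.10))  # 15.0
print(threshold_energy(1.0, n_external=1.0, n_equivalent=0.5, efficiency=0.15))  # 10.0
```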
Collapse
Affiliation(s)
- Bosco S Tjan
- Department of Psychology, Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, 90089, USA
| | | | | |
Collapse
|
34
|
Smith E, Duede S, Hanrahan S, Davis T, House P, Greger B. Seeing is believing: neural representations of visual stimuli in human auditory cortex correlate with illusory auditory perceptions. PLoS One 2013; 8:e73148. [PMID: 24023823 PMCID: PMC3762867 DOI: 10.1371/journal.pone.0073148] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2013] [Accepted: 07/19/2013] [Indexed: 11/18/2022] Open
Abstract
In interpersonal communication, the listener can often see as well as hear the speaker. Visual stimuli can subtly change a listener's auditory perception, as in the McGurk illusion, in which perception of a phoneme's auditory identity is changed by a concurrent video of a mouth articulating a different phoneme. Studies have yet to link visual influences on the neural representation of language with subjective language perception. Here we show that vision influences the electrophysiological representation of phonemes in human auditory cortex prior to the presentation of the auditory stimulus. We used the McGurk effect to dissociate the subjective perception of phonemes from the auditory stimuli. With this paradigm we demonstrate that neural representations in auditory cortex are more closely correlated with the visual stimuli of mouth articulation, which drive the illusory subjective auditory perception, than with the actual auditory stimuli. Additionally, information about visual and auditory stimuli transfers in the caudal-rostral direction along the superior temporal gyrus during phoneme perception, as would be expected of visual information flowing from the occipital cortex into the ventral auditory processing stream. These results show that visual stimuli influence the neural representation in auditory cortex early in sensory processing and may override the subjective auditory perceptions normally generated by auditory stimuli. These findings depict a marked influence of vision on the neural processing of audition in tertiary auditory cortex and suggest a mechanistic underpinning for the McGurk effect.
Collapse
Affiliation(s)
- Elliot Smith
- Interdepartmental Program in Neuroscience, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
| | - Scott Duede
- Department of Linguistics, University of Utah, Salt Lake City, Utah, United States of America
| | - Sara Hanrahan
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
| | - Tyler Davis
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Department of Neurosurgery, University of Utah, Salt Lake City, Utah, United States of America
| | - Paul House
- Department of Neurosurgery, University of Utah, Salt Lake City, Utah, United States of America
| | - Bradley Greger
- Interdepartmental Program in Neuroscience, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
| |
Collapse
|
35
|
Setti A, Burke KE, Kenny R, Newell FN. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes. Front Psychol 2013; 4:575. [PMID: 24027544 PMCID: PMC3760087 DOI: 10.3389/fpsyg.2013.00575] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2013] [Accepted: 08/11/2013] [Indexed: 12/02/2022] Open
Abstract
Recent studies suggest that multisensory integration is enhanced in older adults, but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the “McGurk illusion,” in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults; however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.
Collapse
Affiliation(s)
- Annalisa Setti
- Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland; TRIL Centre, Trinity College Dublin, Dublin, Ireland
| | | | | | | |
Collapse
|
36
|
Bernstein LE, Auer ET, Eberhardt SP, Jiang J. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training. Front Neurosci 2013; 7:34. [PMID: 23515520 PMCID: PMC3600826 DOI: 10.3389/fnins.2013.00034] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2012] [Accepted: 02/28/2013] [Indexed: 11/13/2022] Open
Abstract
Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called “reverse hierarchy theory” of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Collapse
Affiliation(s)
- Lynne E Bernstein
- Communication Neuroscience Laboratory, Department of Speech and Hearing Science, George Washington University, Washington, DC, USA
| | | | | | | |
Collapse
|