1. Karthik G, Cao CZ, Demidenko MI, Jahn A, Stacey WC, Wasade VS, Brang D. Auditory cortex encodes lipreading information through spatially distributed activity. Curr Biol 2024; 34:4021-4032.e5. PMID: 39153482; PMCID: PMC11387126; DOI: 10.1016/j.cub.2024.07.073.
Abstract
Watching a speaker's face improves speech perception accuracy. This benefit is enabled, in part, by implicit lipreading abilities present in the general population. While it is established that lipreading can alter the perception of a heard word, it is unknown how these visual signals are represented in the auditory system or how they interact with auditory speech representations. One influential, but untested, hypothesis is that visual speech modulates the population-coded representations of phonetic and phonemic features in the auditory system. This model is largely supported by data showing that silent lipreading evokes activity in the auditory cortex, but these activations could alternatively reflect general effects of arousal or attention or the encoding of non-linguistic features such as visual timing information. This gap limits our understanding of how vision supports speech perception. To test the hypothesis that the auditory system encodes visual speech information, we acquired functional magnetic resonance imaging (fMRI) data from healthy adults and intracranial recordings from electrodes implanted in patients with epilepsy during auditory and visual speech perception tasks. Across both datasets, linear classifiers successfully decoded the identity of silently lipread words using the spatial pattern of auditory cortex responses. Examining the time course of classification using intracranial recordings, lipread words were classified at earlier time points relative to heard words, suggesting a predictive mechanism for facilitating speech. These results support a model in which the auditory system combines the joint neural distributions evoked by heard and lipread words to generate a more precise estimate of what was said.
Affiliation(s)
- Ganesan Karthik
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Andrew Jahn
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- William C Stacey
- Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
- Vibhangini S Wasade
- Henry Ford Hospital, Detroit, MI 48202, USA
- Department of Neurology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA

2. Çetinçelik M, Jordan-Barros A, Rowland CF, Snijders TM. The effect of visual speech cues on neural tracking of speech in 10-month-old infants. Eur J Neurosci 2024. PMID: 39188179; DOI: 10.1111/ejn.16492.
Abstract
While infants' sensitivity to visual speech cues and the benefit of these cues have been well-established by behavioural studies, there is little evidence on the effect of visual speech cues on infants' neural processing of continuous auditory speech. In this study, we investigated whether visual speech cues, such as the movements of the lips, jaw, and larynx, facilitate infants' neural speech tracking. Ten-month-old Dutch-learning infants watched videos of a speaker reciting passages in infant-directed speech while electroencephalography (EEG) was recorded. In the videos, either the full face of the speaker was displayed or the speaker's mouth and jaw were masked with a block, obstructing the visual speech cues. To assess neural tracking, speech-brain coherence (SBC) was calculated, focusing particularly on the stress and syllabic rates (1-1.75 and 2.5-3.5 Hz, respectively, in our stimuli). First, overall SBC was compared to surrogate data; then, differences in SBC between the two conditions were tested at the frequencies of interest. Our results indicated that infants show significant tracking at both stress and syllabic rates. However, no differences were identified between the two conditions, meaning that infants' neural tracking was not modulated further by the presence of visual speech cues. Furthermore, we demonstrated that infants' neural tracking of low-frequency information is related to their subsequent vocabulary development at 18 months. Overall, this study provides evidence that infants' neural tracking of speech is not necessarily impaired when visual speech cues are not fully visible and that neural tracking may be a potential mechanism in successful language acquisition.
Affiliation(s)
- Melis Çetinçelik
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Department of Experimental Psychology, Utrecht University, Utrecht, The Netherlands
- Cognitive Neuropsychology Department, Tilburg University, Tilburg, The Netherlands
- Antonia Jordan-Barros
- Centre for Brain and Cognitive Development, Department of Psychological Science, Birkbeck, University of London, London, UK
- Experimental Psychology, University College London, London, UK
- Caroline F Rowland
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Tineke M Snijders
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Cognitive Neuropsychology Department, Tilburg University, Tilburg, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands

3. Sato M. Audiovisual speech asynchrony asymmetrically modulates neural binding. Neuropsychologia 2024; 198:108866. PMID: 38518889; DOI: 10.1016/j.neuropsychologia.2024.108866.
Abstract
Previous psychophysical and neurophysiological studies in young healthy adults have provided evidence that audiovisual speech integration occurs with a large degree of temporal tolerance around true simultaneity. To further determine whether audiovisual speech asynchrony modulates auditory cortical processing and neural binding in young healthy adults, N1/P2 auditory evoked responses were compared using an additive model during a syllable categorization task, with or without an audiovisual asynchrony ranging from 240 ms visual lead to 240 ms auditory lead. Consistent with previous psychophysical findings, the observed results converge in favor of an asymmetric temporal integration window. Three main findings were observed: 1) predictive temporal and phonetic cues from pre-phonatory visual movements before the acoustic onset appeared essential for neural binding to occur, 2) audiovisual synchrony, with visual pre-phonatory movements predictive of the onset of the acoustic signal, was a prerequisite for N1 latency facilitation, and 3) P2 amplitude suppression and latency facilitation occurred even when visual pre-phonatory movements were predictive not of the acoustic onset but only of the syllable to come. Taken together, these findings help further clarify how audiovisual speech integration partly operates through two stages of visually-based temporal and phonetic predictions.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.

4. Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. PMID: 38151889; DOI: 10.1111/ejn.16221.
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role in information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research has focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects of human speech. We highlight properties and constraints that both neural and speech dynamics share, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins, and summarise open questions in the field.
Affiliation(s)
- Benedikt Zoefel
- Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
- Anne Kösem
- Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France

5. Shahin AJ, Gonzales MG, Dimitrijevic A. Cross-Modal Tinnitus Remediation: A Tentative Theoretical Framework. Brain Sci 2024; 14:95. PMID: 38275515; PMCID: PMC10813772; DOI: 10.3390/brainsci14010095.
Abstract
Tinnitus is a prevalent hearing-loss deficit manifested as a phantom (internally generated by the brain) sound that is heard as a high-frequency tone in the majority of afflicted persons. Chronic tinnitus is debilitating, leading to distress, sleep deprivation, anxiety, and even suicidal thoughts. It has been theorized that, in the majority of afflicted persons, tinnitus can be attributed to the loss of high-frequency input from the cochlea to the auditory cortex, known as deafferentation. Deafferentation develops with age-related hearing loss, which progressively causes the tonotopic regions coding for the lost high frequencies to synchronize, leading to a phantom high-frequency sound sensation. Approaches to tinnitus remediation that have demonstrated promise include inhibitory drugs, the use of tinnitus-specific frequency notching to increase lateral inhibition to the deafferented neurons, and multisensory approaches (auditory-motor and audiovisual) that work by coupling multisensory stimulation to the deafferented neural populations. The goal of this review is to put forward a theoretical framework of a multisensory approach to remedy tinnitus. Our theoretical framework posits that, due to vision's modulatory (inhibitory, excitatory) influence on the auditory pathway, prolonged engagement in audiovisual activity, especially during daily discourse, as opposed to auditory-only activity/discourse, can progressively reorganize deafferented neural populations, resulting in reduced synchrony of the deafferented neurons and a reduction in tinnitus severity over time.
Affiliation(s)
- Antoine J. Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Health Science Research Institute, University of California, Merced, CA 95343, USA
- Mariel G. Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
- Andrew Dimitrijevic
- Sunnybrook Research Institute, University of Toronto, Toronto, ON M4N 3M5, Canada

6. Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. PMID: 37677057; DOI: 10.1162/jocn_a_02044.
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia

7. Jiang Z, An X, Liu S, Yin E, Yan Y, Ming D. Neural oscillations reflect the individual differences in the temporal perception of audiovisual speech. Cereb Cortex 2023; 33:10575-10583. PMID: 37727958; DOI: 10.1093/cercor/bhad304.
Abstract
Multisensory integration occurs within a limited time interval between multimodal stimuli. Multisensory temporal perception varies widely among individuals and involves perceptual synchrony and temporal sensitivity processes. Previous studies explored the neural mechanisms of individual differences for beep-flash stimuli, whereas no such study existed for speech. In this study, 28 subjects (16 male) performed an audiovisual speech /ba/ simultaneity judgment task while their electroencephalography was recorded. We examined the relationship between prestimulus neural oscillations (i.e. the pre-pronunciation movement-related oscillations) and temporal perception. Perceptual synchrony was quantified using the Point of Subjective Simultaneity and temporal sensitivity using the Temporal Binding Window. Our results revealed dissociated neural mechanisms for individual differences in the Temporal Binding Window and the Point of Subjective Simultaneity. The frontocentral delta power, reflecting top-down attention control, is positively related to the magnitude of individual auditory leading Temporal Binding Windows (LTBWs), whereas the parieto-occipital theta power, indexing bottom-up visual temporal attention specific to speech, is negatively associated with the magnitude of individual visual leading Temporal Binding Windows (RTBWs). In addition, increased left frontal and bilateral temporoparietal occipital alpha power, reflecting general attentional states, is associated with increased Points of Subjective Simultaneity. Strengthening attention abilities might improve the audiovisual temporal perception of speech and further impact speech integration.
Affiliation(s)
- Zeliang Jiang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Xingwei An
- Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Shuang Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Erwei Yin
- Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), 100071 Beijing, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), 300457 Tianjin, China
- Ye Yan
- Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), 100071 Beijing, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), 300457 Tianjin, China
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China

8. Chalas N, Omigie D, Poeppel D, van Wassenhove V. Hierarchically nested networks optimize the analysis of audiovisual speech. iScience 2023; 26:106257. PMID: 36909667; PMCID: PMC9993032; DOI: 10.1016/j.isci.2023.106257.
Abstract
In conversational settings, seeing the speaker's face elicits internal predictions about the upcoming acoustic utterance. Understanding how the listener's cortical dynamics tune to the temporal statistics of audiovisual (AV) speech is thus essential. Using magnetoencephalography, we explored how large-scale frequency-specific dynamics of human brain activity adapt to AV speech delays. First, we show that the amplitude of phase-locked responses parametrically decreases with natural AV speech synchrony, a pattern that is consistent with predictive coding. Second, we show that the temporal statistics of AV speech affect large-scale oscillatory networks at multiple spatial and temporal resolutions. We demonstrate a spatial nestedness of oscillatory networks during the processing of AV speech: these oscillatory hierarchies are such that high-frequency activity (beta, gamma) is contingent on the phase response of low-frequency (delta, theta) networks. Our findings suggest that the endogenous temporal multiplexing of speech processing confers adaptability within the temporal regimes that are essential for speech comprehension.
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, 48149 Münster, Germany
- CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit; CNRS; Université Paris-Saclay, 91191 Gif/Yvette, France
- School of Biology, Faculty of Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Corresponding author
- Diana Omigie
- Department of Psychology, Goldsmiths, University of London, London, UK
- David Poeppel
- Department of Psychology, New York University, New York, NY 10003, USA
- Ernst Struengmann Institute for Neuroscience, 60528 Frankfurt am Main, Germany
- Virginie van Wassenhove
- CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit; CNRS; Université Paris-Saclay, 91191 Gif/Yvette, France
- Corresponding author

9. Jiang Z, An X, Liu S, Wang L, Yin E, Yan Y, Ming D. The effect of prestimulus low-frequency neural oscillations on the temporal perception of audiovisual speech. Front Neurosci 2023; 17:1067632. PMID: 36816126; PMCID: PMC9935937; DOI: 10.3389/fnins.2023.1067632.
Abstract
Objective: Perceptual integration and segregation are modulated by the phase of ongoing neural oscillations whose frequency period is broader than the size of the temporal binding window (TBW). Studies have shown that abstract beep-flash stimuli with a TBW of about 100 ms were modulated by the alpha band phase. We therefore hypothesized that the temporal perception of speech, with a TBW of several hundred milliseconds, might be affected by the delta-theta phase. Methods: We conducted a speech-stimuli-based audiovisual simultaneity judgment (SJ) experiment. Twenty human participants (12 female) took part in this study while 62-channel EEG was recorded. Results: Behavioral results showed that the visual leading TBWs are broader than the auditory leading ones [273.37 ± 24.24 ms vs. 198.05 ± 19.28 ms (mean ± SEM)]. We used Phase Opposition Sum (POS) to quantify the differences in mean phase angles and phase concentrations between synchronous and asynchronous responses. The POS results indicated that the delta-theta phase was significantly different between synchronous and asynchronous responses in the A50V condition (50% synchronous responses at the auditory leading SOA). In the V50A condition (50% synchronous responses at the visual leading SOA), however, we only found a delta band effect. In neither condition did we find a consistency of phases over subjects for either perceptual response by the post hoc Rayleigh test (all ps > 0.05). These Rayleigh test results suggest that the phase might not reflect neuronal excitability, which would require the phases within a perceptual response to concentrate on the same angle across subjects rather than be uniformly distributed. A V-test nevertheless showed that the phase difference between synchronous and asynchronous responses across subjects had a significant phase opposition (all ps < 0.05), which is compatible with the POS result. Conclusion: These results indicate that speech temporal perception depends on the alignment of stimulus onset with an optimal phase of the neural oscillation whose frequency period might be broader than the size of the TBW. The role of the oscillatory phase might be to encode temporal information, which varies across subjects, rather than to index neuronal excitability. Given the enriched temporal structures of spoken language stimuli, the conclusion that phase encodes temporal information is plausible and valuable for future research.
Affiliation(s)
- Zeliang Jiang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Xingwei An
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Shuang Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Lu Wang
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Erwei Yin
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Ye Yan
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Defense Innovation Institute, Academy of Military Sciences (AMS), Beijing, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China

10. Zamuner TS, Rabideau T, McDonald M, Yeung HH. Developmental change in children's speech processing of auditory and visual cues: An eyetracking study. J Child Lang 2023; 50:27-51. PMID: 36503546; DOI: 10.1017/s0305000921000684.
Abstract
This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between apparent successes of visual speech processing in young children in visual-looking tasks and apparent difficulties of speech processing in older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
Affiliation(s)
- Margarethe McDonald
- Department of Linguistics, University of Ottawa, Canada
- School of Psychology, University of Ottawa, Canada
- H Henny Yeung
- Department of Linguistics, Simon Fraser University, Canada
- Integrative Neuroscience and Cognition Centre, UMR 8002, CNRS and University of Paris, France

11. Sato M. The timing of visual speech modulates auditory neural processing. Brain Lang 2022; 235:105196. PMID: 36343508; DOI: 10.1016/j.bandl.2022.105196.
Abstract
In face-to-face communication, visual information from a speaker's face and the time-varying kinematics of articulatory movements have been shown to fine-tune auditory neural processing and improve speech recognition. To further determine whether the timing of visual gestures modulates auditory cortical processing, three sets of syllables differing only in the onset and duration of silent prephonatory movements, before the acoustic speech signal, were contrasted using EEG. Despite similar visual recognition rates, an increase in the amplitude of P2 auditory evoked responses was observed from the longest to the shortest movements. Taken together, these results clarify how audiovisual speech perception partly operates through visually-based predictions and related processing time, with acoustic-phonetic neural processing paralleling the timing of visual prephonatory gestures.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.

12. Begau A, Arnau S, Klatt LI, Wascher E, Getzmann S. Using visual speech at the cocktail-party: CNV evidence for early speech extraction in younger and older adults. Hear Res 2022; 426:108636. DOI: 10.1016/j.heares.2022.108636.

13. Begau A, Klatt LI, Schneider D, Wascher E, Getzmann S. The role of informational content of visual speech in an audiovisual cocktail party: Evidence from cortical oscillations in young and old participants. Eur J Neurosci 2022; 56:5215-5234. PMID: 36017762; DOI: 10.1111/ejn.15811.
Abstract
Age-related differences in the processing of audiovisual speech in a multi-talker environment were investigated by analysing event-related spectral perturbations (ERSPs), focusing on theta, alpha and beta oscillations that are assumed to reflect conflict processing, multisensory integration and attentional mechanisms, respectively. Eighteen older and 21 younger healthy adults completed a two-alternative forced-choice word discrimination task, responding to audiovisual speech stimuli. In a cocktail-party scenario with two competing talkers (located at -15° and 15° azimuth), target words (/yes/ or /no/) appeared at a pre-defined (attended) position, distractor words at the other position. In two audiovisual conditions, acoustic speech was combined either with informative or uninformative visual speech. While a behavioural benefit for informative visual speech occurred for both age groups, differences between audiovisual conditions in the theta and beta band were only present for older adults. A stronger increase in theta perturbations for stimuli containing uninformative visual speech could be associated with early conflict processing, while a stronger suppression in beta perturbations for informative visual speech could be associated with audiovisual integration. Compared to the younger group, the older group showed generally stronger beta perturbations. No condition differences in the alpha band were found. Overall, the findings suggest age-related differences in audiovisual speech integration in a multi-talker environment. While the behavioural benefit of informative visual speech was unaffected by age, older adults had a stronger need for cognitive control when processing conflicting audiovisual speech input. Furthermore, mechanisms of audiovisual integration are differently activated depending on the informational content of the visual information.
Affiliation(s)
- Alexandra Begau
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Laura-Isabelle Klatt
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Daniel Schneider
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Edmund Wascher
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Stephan Getzmann
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany

14. Modulation transfer functions for audiovisual speech. PLoS Comput Biol 2022; 18:e1010273. PMID: 35852989; PMCID: PMC9295967; DOI: 10.1371/journal.pcbi.1010273.
Abstract
Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech is known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation, allowing us to examine statistical envelope-face correlations across a large number of speakers (∼4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.

15. Gonzales MG, Backer KC, Yan Y, Miller LM, Bortfeld H, Shahin AJ. Audition controls the flow of visual time during multisensory perception. iScience 2022; 25:104671. PMID: 35845168; PMCID: PMC9283509; DOI: 10.1016/j.isci.2022.104671.
Abstract
Previous work addressing the influence of audition on visual perception has mainly been assessed using non-speech stimuli. Herein, we introduce the Audiovisual Time-Flow Illusion in spoken language, underscoring the role of audition in multisensory processing. When brief pauses were inserted into or brief portions were removed from an acoustic speech stream, individuals perceived the corresponding visual speech as “pausing” or “skipping”, respectively—even though the visual stimulus was intact. When the stimulus manipulation was reversed—brief pauses were inserted into, or brief portions were removed from the visual speech stream—individuals failed to perceive the illusion in the corresponding intact auditory stream. Our findings demonstrate that in the context of spoken language, people continually realign the pace of their visual perception based on that of the auditory input. In short, the auditory modality sets the pace of the visual modality during audiovisual speech processing.
Highlights:
- We describe the significance of the Audiovisual Time-Flow Illusion
- Temporal perturbations to auditory speech drive perception of visual speech
- However, perturbing visual speech stimuli does not affect auditory perception
- Auditory processing controls the temporal perception of the visual speech stream

16. Chalas N, Karagiorgis A, Bamidis P, Paraskevopoulos E. The impact of musical training in symbolic and non-symbolic audiovisual judgements of magnitude. PLoS One 2022; 17:e0266165. PMID: 35511806; PMCID: PMC9070945; DOI: 10.1371/journal.pone.0266165.
Abstract
Quantity estimation can be represented in either an analog or symbolic manner, and recent evidence now suggests that analog and symbolic representations of quantities interact. Nonetheless, those two representational forms of quantities may be enhanced by convergent multisensory information. Here, we elucidate those interactions using high-density electroencephalography (EEG) and an audiovisual oddball paradigm. Participants were presented with simultaneous audiovisual tokens in which the co-varying pitch of tones was combined with the embedded cardinality of dot patterns. Incongruencies were elicited independently from the symbolic and non-symbolic modality within the audio-visual percept, violating the newly acquired rule that “the higher the pitch of the tone, the larger the cardinality of the figure.” The effect of neural plasticity in symbolic and non-symbolic numerical representations of quantities was investigated through a cross-sectional design, comparing musicians to musically naïve controls. Individuals' cortical activity was reconstructed and statistically modeled for a predefined time-window of the evoked response (130–170 ms). To summarize, we show that symbolic and non-symbolic processing of magnitudes is re-organized in cortical space, with professional musicians showing altered activity in motor and temporal areas. Thus, we argue that the symbolic representation of quantities is altered through musical training.
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal analysis, University of Münster, Münster, Germany
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Alexandros Karagiorgis
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Panagiotis Bamidis
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Evangelos Paraskevopoulos
- School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Department of Psychology, University of Cyprus, Nicosia, Cyprus

17. Sato M. Motor and visual influences on auditory neural processing during speaking and listening. Cortex 2022; 152:21-35. DOI: 10.1016/j.cortex.2022.03.013.

18. Gordon-Salant S, Schwartz MS, Oppler KA, Yeni-Komshian GH. Detection and Recognition of Asynchronous Auditory/Visual Speech: Effects of Age, Hearing Loss, and Talker Accent. Front Psychol 2022; 12:772867. PMID: 35153900; PMCID: PMC8832148; DOI: 10.3389/fpsyg.2021.772867.
Abstract
This investigation examined age-related differences in auditory-visual (AV) integration as reflected on perceptual judgments of temporally misaligned AV English sentences spoken by native English and native Spanish talkers. In the detection task, it was expected that slowed auditory temporal processing of older participants, relative to younger participants, would be manifest as a shift in the range over which participants would judge asynchronous stimuli as synchronous (referred to as the "AV simultaneity window"). The older participants were also expected to exhibit greater declines in speech recognition for asynchronous AV stimuli than younger participants. Talker accent was hypothesized to influence listener performance, with older listeners exhibiting a greater narrowing of the AV simultaneity window and much poorer recognition of asynchronous AV foreign-accented speech compared to younger listeners. Participant groups included younger and older participants with normal hearing and older participants with hearing loss. Stimuli were video recordings of sentences produced by native English and native Spanish talkers. The video recordings were altered in 50 ms steps by delaying either the audio or video onset. Participants performed a detection task in which they judged whether the sentences were synchronous or asynchronous, and performed a recognition task for multiple synchronous and asynchronous conditions. Both the detection and recognition tasks were conducted at the individualized signal-to-noise ratio (SNR) corresponding to approximately 70% correct speech recognition performance for synchronous AV sentences. Older listeners with and without hearing loss generally showed wider AV simultaneity windows than younger listeners, possibly reflecting slowed auditory temporal processing in auditory lead conditions and reduced sensitivity to asynchrony in auditory lag conditions. However, older and younger listeners were affected similarly by misalignment of auditory and visual signal onsets on the speech recognition task. This suggests that older listeners are negatively impacted by temporal misalignments for speech recognition, even when they do not notice that the stimuli are asynchronous. Overall, the findings show that when listener performance is equated for simultaneous AV speech signals, age effects are apparent in detection judgments but not in recognition of asynchronous speech.
Affiliation(s)
- Sandra Gordon-Salant
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, United States

19. Heins N, Pomp J, Kluger DS, Vinbrüx S, Trempler I, Kohler A, Kornysheva K, Zentgraf K, Raab M, Schubotz RI. Surmising synchrony of sound and sight: Factors explaining variance of audiovisual integration in hurdling, tap dancing and drumming. PLoS One 2021; 16:e0253130. PMID: 34293800; PMCID: PMC8298114; DOI: 10.1371/journal.pone.0253130.
Abstract
Auditory and visual percepts are integrated even when they are not perfectly temporally aligned with each other, especially when the visual signal precedes the auditory signal. This window of temporal integration for asynchronous audiovisual stimuli is relatively well examined in the case of speech, while other natural action-induced sounds have been widely neglected. Here, we studied the detection of audiovisual asynchrony in three different whole-body actions with natural action-induced sounds–hurdling, tap dancing and drumming. In Study 1, we examined whether audiovisual asynchrony detection, assessed by a simultaneity judgment task, differs as a function of sound production intentionality. Based on previous findings, we expected that auditory and visual signals should be integrated over a wider temporal window for actions creating sounds intentionally (tap dancing), compared to actions creating sounds incidentally (hurdling). While percentages of perceived synchrony differed in the expected way, we identified two further factors, namely high event density and low rhythmicity, to induce higher synchrony ratings as well. Therefore, we systematically varied event density and rhythmicity in Study 2, this time using drumming stimuli to exert full control over these variables, and the same simultaneity judgment tasks. Results suggest that high event density leads to a bias to integrate rather than segregate auditory and visual signals, even at relatively large asynchronies. Rhythmicity had a similar, albeit weaker effect, when event density was low. Our findings demonstrate that shorter asynchronies and visual-first asynchronies lead to higher synchrony ratings of whole-body action, pointing to clear parallels with audiovisual integration in speech perception. Overconfidence in the naturally expected, that is, synchrony of sound and sight, was stronger for intentional (vs. incidental) sound production and for movements with high (vs. low) rhythmicity, presumably because both encourage predictive processes. In contrast, high event density appears to increase synchronicity judgments simply because it makes the detection of audiovisual asynchrony more difficult. More studies using real-life audiovisual stimuli with varying event densities and rhythmicities are needed to fully uncover the general mechanisms of audiovisual integration.
Affiliation(s)
- Nina Heins
- Department of Psychology, University of Muenster, Muenster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Jennifer Pomp
- Department of Psychology, University of Muenster, Muenster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Daniel S. Kluger
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Institute for Biomagnetism and Biosignal Analysis, University Hospital Muenster, Muenster, Germany
- Stefan Vinbrüx
- Institute of Sport and Exercise Sciences, Human Performance and Training, University of Muenster, Muenster, Germany
- Ima Trempler
- Department of Psychology, University of Muenster, Muenster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Axel Kohler
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Katja Kornysheva
- School of Psychology and Bangor Neuroimaging Unit, Bangor University, Wales, United Kingdom
- Karen Zentgraf
- Department of Movement Science and Training in Sports, Institute of Sport Sciences, Goethe University Frankfurt, Frankfurt, Germany
- Markus Raab
- Institute of Psychology, German Sport University Cologne, Cologne, Germany
- School of Applied Sciences, London South Bank University, London, United Kingdom
- Ricarda I. Schubotz
- Department of Psychology, University of Muenster, Muenster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany

20. O'Sullivan AE, Crosse MJ, Di Liberto GM, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. PMID: 33824190; PMCID: PMC8197638; DOI: 10.1523/jneurosci.0906-20.2021.
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy. SIGNIFICANCE STATEMENT: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
- X, The Moonshot Factory, Mountain View, CA
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627

21. Mégevand P, Mercier MR, Groppe DM, Zion Golumbic E, Mesgarani N, Beauchamp MS, Schroeder CE, Mehta AD. Crossmodal Phase Reset and Evoked Responses Provide Complementary Mechanisms for the Influence of Visual Speech in Auditory Cortex. J Neurosci 2020; 40:8530-8542. PMID: 33023923; PMCID: PMC7605423; DOI: 10.1523/jneurosci.0555-20.2020.
Abstract
Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs. SIGNIFICANCE STATEMENT: Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.
Affiliation(s)
- Pierre Mégevand
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549
- Feinstein Institutes for Medical Research, Manhasset, New York 11030
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1211 Geneva, Switzerland
- Manuel R Mercier
- Department of Neurology, Montefiore Medical Center, Bronx, New York 10467
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Institut de Neurosciences des Systèmes, Aix Marseille University, INSERM, 13005 Marseille, France
- David M Groppe
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549
- Feinstein Institutes for Medical Research, Manhasset, New York 11030
- The Krembil Neuroscience Centre, University Health Network, Toronto, Ontario M5T 1M8, Canada
- Elana Zion Golumbic
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan 5290002, Israel
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York 10027
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas 77030
- Charles E Schroeder
- Nathan S. Kline Institute, Orangeburg, New York 10962
- Department of Psychiatry, Columbia University, New York, New York 10032
- Ashesh D Mehta
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549
- Feinstein Institutes for Medical Research, Manhasset, New York 11030

22. Audio-visual combination of syllables involves time-sensitive dynamics following from fusion failure. Sci Rep 2020; 10:18009. PMID: 33093570; PMCID: PMC7583249; DOI: 10.1038/s41598-020-75201-7.
Abstract
In face-to-face communication, audio-visual (AV) stimuli can be fused, combined or perceived as mismatching. While the left superior temporal sulcus (STS) is presumably the locus of AV integration, the process leading to combination is unknown. Based on previous modelling work, we hypothesize that combination results from a complex dynamic originating in a failure to integrate AV inputs, followed by a reconstruction of the most plausible AV sequence. In two different behavioural tasks and one MEG experiment, we observed that combination is more time-demanding than fusion. Using time- and source-resolved human MEG analyses with linear and dynamic causal models, we show that both fusion and combination involve early detection of AV incongruence in the STS, whereas combination is further associated with enhanced activity of AV asynchrony-sensitive regions (auditory and inferior frontal cortices). Based on neural signal decoding, we finally show that only combination can be decoded from activity in the inferior frontal gyrus (IFG) and that combination is decoded later than fusion in the STS. These results indicate that the AV speech integration outcome primarily depends on whether the STS converges or not onto an existing multimodal syllable representation, and that combination results from subsequent temporal processing, presumably the off-line re-ordering of incongruent AV stimuli.
Collapse
|
23
|
Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. [PMID: 32727820 PMCID: PMC7470920 DOI: 10.1523/jneurosci.0279-20.2020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Revised: 06/01/2020] [Accepted: 06/02/2020] [Indexed: 12/22/2022] Open
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.

SIGNIFICANCE STATEMENT Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
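The deconvolution logic, jittered auditory and visual onsets entering a shared finite-impulse-response (FIR) design matrix whose least-squares solution recovers each modality's response time course, can be sketched as follows; the continuous trace, onset times, and lag window are invented for illustration.

    import numpy as np

    fs, n_lags = 100, 50                  # 10 ms bins; 500 ms response window
    rng = np.random.default_rng(2)
    n_samp = 60 * fs
    y = rng.standard_normal(n_samp)       # stand-in continuous broadband response
    a_on = rng.choice(np.arange(100, n_samp - n_lags), size=80, replace=False)
    v_on = a_on - rng.integers(5, 30, size=80)   # visual onsets lead by a jittered lag

    # FIR design matrix: one column per post-onset lag, per modality.
    X = np.zeros((n_samp, 2 * n_lags))
    for lag in range(n_lags):
        X[a_on + lag, lag] = 1.0
        X[v_on + lag, n_lags + lag] = 1.0

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    auditory_irf, visual_irf = beta[:n_lags], beta[n_lags:]   # estimated time courses
    print(auditory_irf.shape, visual_irf.shape)

Because the asynchrony varies from trial to trial, the two sets of columns are not collinear, which is what lets the unisensory responses be estimated separately.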
Collapse
|
24
|
Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:brainsci10060328. [PMID: 32481538 PMCID: PMC7348766 DOI: 10.3390/brainsci10060328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Revised: 05/15/2020] [Accepted: 05/25/2020] [Indexed: 11/16/2022] Open
Abstract
The McGurk effect, an incongruent pairing of visual /ga/ with acoustic /ba/ that creates the fusion illusion /da/, is the cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed (auditory /ga/ with visual /ba/), yielding the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions, fusion with the visual aspect of the stimulus changing and combination with the auditory aspect of the stimulus changing, as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created, but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
Collapse
Affiliation(s)
- Melissa Randazzo
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA; (R.P.); (A.N.)
- Correspondence: ; Tel.: +1-516-877-4769
| | - Ryan Priefer
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA; (R.P.); (A.N.)
| | - Paul J. Smith
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA; (P.J.S.); (T.A.); (K.F.)
| | - Amanda Nagler
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA; (R.P.); (A.N.)
| | - Trey Avery
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA; (P.J.S.); (T.A.); (K.F.)
| | - Karen Froud
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA; (P.J.S.); (T.A.); (K.F.)
| |
Collapse
|
25
|
Zhou HY, Cheung EFC, Chan RCK. Audiovisual temporal integration: Cognitive processing, neural mechanisms, developmental trajectory and potential interventions. Neuropsychologia 2020; 140:107396. [PMID: 32087206 DOI: 10.1016/j.neuropsychologia.2020.107396] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2019] [Revised: 02/14/2020] [Accepted: 02/15/2020] [Indexed: 12/21/2022]
Abstract
To integrate auditory and visual signals into a unified percept, the paired stimuli must co-occur within a limited time window known as the Temporal Binding Window (TBW). The width of the TBW, a proxy of audiovisual temporal integration ability, has been found to be correlated with higher-order cognitive and social functions. A comprehensive review of studies investigating the audiovisual TBW reveals several findings: (1) a wide range of top-down processes and bottom-up features can modulate the width of the TBW, facilitating adaptation to a changing and multisensory external environment; (2) a large-scale brain network works in coordination to ensure successful detection of audiovisual (a)synchrony; (3) developmentally, the audiovisual TBW follows a U-shaped pattern across the lifespan, with a protracted developmental course into late adolescence and a rebound in size in late life; (4) an enlarged TBW is characteristic of a number of neurodevelopmental disorders; and (5) the width of the TBW is malleable and can be narrowed through perceptual and musical training. Interventions targeting the TBW may be able to improve multisensory function and ameliorate social communicative symptoms in clinical populations.
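A TBW estimate is typically obtained by fitting a symmetric curve to simultaneity judgments across stimulus onset asynchronies (SOAs); here is a minimal Python sketch, with illustrative SOAs, response rates, and a 75%-of-peak width criterion that are assumptions rather than values from the review.

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical proportion of "synchronous" reports (negative SOA = auditory leads).
    soa = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], float)
    p_sync = np.array([0.10, 0.25, 0.55, 0.85, 0.95, 0.90, 0.70, 0.35, 0.15])

    def gauss(x, amp, mu, sigma):
        return amp * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

    (amp, mu, sigma), _ = curve_fit(gauss, soa, p_sync, p0=[1.0, 0.0, 150.0])

    # One common convention: TBW = full width of the curve at 75% of its peak.
    tbw = 2 * sigma * np.sqrt(2 * np.log(1 / 0.75))
    print(f"peak at {mu:.0f} ms; estimated TBW = {tbw:.0f} ms")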
Collapse
Affiliation(s)
- Han-Yu Zhou
- Neuropsychology and Applied Cognitive Neuroscience Laboratory, CAS Key Laboratory of Mental Health, Institute of Psychology, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | | | - Raymond C K Chan
- Neuropsychology and Applied Cognitive Neuroscience Laboratory, CAS Key Laboratory of Mental Health, Institute of Psychology, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China.
| |
Collapse
|
26
|
Hueber T, Tatulli E, Girin L, Schwartz JL. Evaluating the Potential Gain of Auditory and Audiovisual Speech-Predictive Coding Using Deep Learning. Neural Comput 2020; 32:596-625. [PMID: 31951798 DOI: 10.1162/neco_a_01264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Sensory processing is increasingly conceived in a predictive framework in which neurons would constantly process the error signal resulting from the comparison of expected and observed stimuli. Surprisingly, few data exist on the accuracy of the predictions that can be computed in real sensory scenes. Here, we focus on the sensory processing of auditory and audiovisual speech. We propose a set of computational models based on artificial neural networks (mixing deep feedforward and convolutional networks), which are trained to predict future audio observations from present and past audio or audiovisual observations (i.e., including lip movements). These predictions exploit purely local phonetic regularities with no explicit call to higher linguistic levels. Experiments are conducted on the multispeaker LibriSpeech audio speech database (around 100 hours) and on the NTCD-TIMIT audiovisual speech database (around 7 hours). The predictions appear to be efficient in a short temporal range (25-50 ms), capturing 50% to 75% of the variance of the incoming stimulus, which could save up to three-quarters of the processing power. They then quickly decrease and almost vanish after 250 ms. Adding information on the lips slightly improves predictions, with a 5% to 10% increase in explained variance. Interestingly, the visual gain vanishes more slowly, and the gain is maximum for a delay of 75 ms between image and predicted sound.
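A toy version of this prediction-gain comparison, with ridge regression standing in for the paper's deep networks, might look like this; the synthetic envelope and lip signals and the 50 ms horizon are placeholders.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    horizon, past = 5, 20                 # 10 ms frames: predict 50 ms ahead from 200 ms
    rng = np.random.default_rng(3)
    env = np.convolve(rng.standard_normal(6000), np.ones(8) / 8, mode="same")
    lips = env + 0.5 * rng.standard_normal(env.size)    # correlated visual feature

    T = env.size - past - horizon
    Xa = np.stack([env[t:t + past] for t in range(T)])      # audio-only history
    Xv = np.stack([lips[t:t + past] for t in range(T)])     # visual history
    y = env[past + horizon:past + horizon + T]              # future envelope value

    for name, X in [("audio", Xa), ("audio+visual", np.hstack([Xa, Xv]))]:
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
        r2 = Ridge(alpha=1.0).fit(Xtr, ytr).score(Xte, yte)
        print(f"{name}: explained variance R^2 = {r2:.2f}")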
Collapse
Affiliation(s)
- Thomas Hueber
- Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| | - Eric Tatulli
- Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| | - Laurent Girin
- Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France, and Inria Grenoble-Rhône-Alpes, 38330 Montbonnot-Saint Martin, France
| | - Jean-Luc Schwartz
- Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| |
Collapse
|
27
|
The impact of when, what and how predictions on auditory speech perception. Exp Brain Res 2019; 237:3143-3153. [DOI: 10.1007/s00221-019-05661-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 09/24/2019] [Indexed: 11/26/2022]
|
28
|
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261 PMCID: PMC6687434 DOI: 10.7554/elife.48116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2019] [Accepted: 07/17/2019] [Indexed: 12/30/2022] Open
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Collapse
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | - Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
| | | |
Collapse
|
29
|
Bayard C, Machart L, Strauß A, Gerber S, Aubanel V, Schwartz JL. Cued Speech Enhances Speech-in-Noise Perception. JOURNAL OF DEAF STUDIES AND DEAF EDUCATION 2019; 24:223-233. [PMID: 30809665 DOI: 10.1093/deafed/enz003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/26/2018] [Revised: 01/28/2019] [Accepted: 01/31/2019] [Indexed: 06/09/2023]
Abstract
Speech perception in noise remains challenging for Deaf/Hard of Hearing people (D/HH), even fitted with hearing aids or cochlear implants. The perception of sentences in noise by 20 implanted or aided D/HH subjects mastering Cued Speech (CS), a system of hand gestures complementing lip movements, was compared with that of 15 typically hearing (TH) controls in three conditions: audio only, audiovisual, and audiovisual + CS. Similar audiovisual scores were obtained for signal-to-noise ratios (SNRs) 11 dB higher in D/HH participants compared with TH ones. Adding CS information enabled D/HH participants to reach a mean score of 83% in the audiovisual + CS condition at a mean SNR of 0 dB, similar to the usual audio score for TH participants at this SNR. This confirms that the combination of lipreading and the Cued Speech system remains extremely important for persons with hearing loss, particularly in adverse hearing conditions.
Collapse
Affiliation(s)
| | | | - Antje Strauß
- Zukunftskolleg, FB Sprachwissenschaft, University of Konstanz
| | | | | | | |
Collapse
|
30
|
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. [PMID: 31013361 DOI: 10.1111/ejn.14425] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2018] [Revised: 03/25/2019] [Accepted: 04/17/2019] [Indexed: 11/30/2022]
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) congruent AV, facing the attended speaker while ignoring another speaker presented as an audio-only stream, and (b) incongruent AV (eavesdropping), attending to the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we can successfully decode attention to congruent audiovisual speech and can also decode attention when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition, we found alpha power to be a reliable measure of attention to the visual speech. Using parieto-occipital alpha power, we could distinguish whether subjects were attending to or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that with only six near-ear electrodes we can successfully determine the attended speech. This work extends the current framework for decoding attention to speech to more naturalistic scenarios and, in doing so, provides additional neural measures which may be incorporated to improve decoding accuracy.
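The backward-model logic can be sketched in a few lines: lagged EEG is mapped to the speech envelope, and attention is assigned to whichever stream the reconstruction matches better. All signals, the lag range, and the regularization value below are hypothetical.

    import numpy as np
    from sklearn.linear_model import Ridge

    fs, n_lags = 64, 16                       # EEG at 64 Hz; 0-250 ms of lags
    rng = np.random.default_rng(4)
    n_samp, n_chan = fs * 60, 32
    env_att = np.abs(rng.standard_normal(n_samp))     # attended envelope
    env_ign = np.abs(rng.standard_normal(n_samp))     # ignored envelope
    eeg = np.outer(env_att, rng.standard_normal(n_chan)) \
          + 2.0 * rng.standard_normal((n_samp, n_chan))   # EEG tracks the attended stream

    T = n_samp - n_lags
    X = np.stack([eeg[t:t + n_lags].ravel() for t in range(T)])  # lagged EEG features
    dec = Ridge(alpha=10.0).fit(X[:T // 2], env_att[:T][:T // 2])
    recon = dec.predict(X[T // 2:])

    r_att = np.corrcoef(recon, env_att[:T][T // 2:])[0, 1]
    r_ign = np.corrcoef(recon, env_ign[:T][T // 2:])[0, 1]
    print(f"attended r = {r_att:.2f}, ignored r = {r_ign:.2f}")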
Collapse
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland; Department of Biomedical Engineering, University of Rochester, Rochester, New York; Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
| |
Collapse
|
31
|
Simon DM, Wallace MT. Integration and Temporal Processing of Asynchronous Audiovisual Speech. J Cogn Neurosci 2018; 30:319-337. [DOI: 10.1162/jocn_a_01205] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Multisensory integration of visual mouth movements with auditory speech is known to offer substantial perceptual benefits, particularly under challenging (i.e., noisy) acoustic conditions. Previous work characterizing this process has found that ERPs to auditory speech are of shorter latency and smaller magnitude in the presence of visual speech. We sought to determine the dependency of these effects on the temporal relationship between the auditory and visual speech streams using EEG. We found that reductions in ERP latency and suppression of ERP amplitude are maximal when the visual signal precedes the auditory signal by a small interval and that increasing amounts of asynchrony reduce these effects in a continuous manner. Time–frequency analysis revealed that these effects are found primarily in the theta (4–8 Hz) and alpha (8–12 Hz) bands, with a central topography consistent with auditory generators. Theta effects also persisted in the lower portion of the band (3.5–5 Hz), and this late activity was more frontally distributed. Importantly, the magnitude of these late theta oscillations not only differed with the temporal characteristics of the stimuli but also served to predict participants' task performance. Our analysis thus reveals that suppression of single-trial brain responses by visual speech depends strongly on the temporal concordance of the auditory and visual inputs. It further illustrates that processes in the lower theta band, which we suggest as an index of incongruity processing, might serve to reflect the neural correlates of individual differences in multisensory temporal perception.
Collapse
|
32
|
|
33
|
Van Ackeren MJ, Barbero FM, Mattioni S, Bottini R, Collignon O. Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech. eLife 2018; 7:e31640. [PMID: 29338838 PMCID: PMC5790372 DOI: 10.7554/elife.31640] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2017] [Accepted: 01/16/2018] [Indexed: 11/13/2022] Open
Abstract
The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives.
Collapse
Affiliation(s)
| | - Francesca M Barbero
- Institute of Research in Psychology, University of Louvain, Louvain, Belgium
- Institute of Neuroscience, University of Louvain, Louvain, Belgium
| | | | - Roberto Bottini
- Center for Mind/Brain Studies, University of Trento, Trento, Italy
| | - Olivier Collignon
- Center for Mind/Brain Studies, University of Trento, Trento, Italy
- Institute of Research in Psychology, University of Louvain, Louvain, Belgium
- Institute of Neuroscience, University of Louvain, Louvain, Belgium
| |
Collapse
|
34
|
Treille A, Vilain C, Schwartz JL, Hueber T, Sato M. Electrophysiological evidence for Audio-visuo-lingual speech integration. Neuropsychologia 2018; 109:126-133. [DOI: 10.1016/j.neuropsychologia.2017.12.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 11/21/2017] [Accepted: 12/13/2017] [Indexed: 01/25/2023]
|
35
|
Sánchez-García C, Kandel S, Savariaux C, Soto-Faraco S. The Time Course of Audio-Visual Phoneme Identification: a High Temporal Resolution Study. Multisens Res 2018; 31:57-78. [DOI: 10.1163/22134808-00002560] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2016] [Accepted: 02/20/2017] [Indexed: 11/19/2022]
Abstract
Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in audio, visual and audio-visual conditions, and computed the Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength of each modality (i.e., saliency). In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V), when that modality was very salient. In other cases, both modalities contributed to identification, hence resulting in audio-visual advantage or interference with respect to unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise within the course of identification of the same utterance, at different times. The outcome of this study suggests that audio-visual speech integration models should take into account the time-varying nature of visual and auditory saliency.
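The Identification Point computation reduces to a compact rule: the earliest gate from which responses remain correct. A minimal Python sketch with an invented response vector:

    import numpy as np

    gate_ms = np.arange(10, 310, 10)      # gate offsets in 10 ms steps
    # Hypothetical per-gate correctness for one consonant and one participant.
    correct = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1] + [1] * 20, dtype=bool)

    # Identification Point: first gate after which all responses stay correct.
    stay_correct = np.flip(np.logical_and.accumulate(np.flip(correct)))
    ip = gate_ms[np.argmax(stay_correct)] if stay_correct.any() else None
    print(f"Identification Point: {ip} ms")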
Collapse
Affiliation(s)
- Carolina Sánchez-García
- Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain
| | - Sonia Kandel
- Université Grenoble Alpes, GIPSA-lab (CNRS UMR 5216), Grenoble, France
| | | | - Salvador Soto-Faraco
- Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
Collapse
|
36
|
Riecke L, Formisano E, Sorger B, Başkent D, Gaudrain E. Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 2017; 28:161-169.e5. [PMID: 29290557 DOI: 10.1016/j.cub.2017.11.033] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Revised: 10/26/2017] [Accepted: 11/15/2017] [Indexed: 01/02/2023]
Abstract
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and the acoustic speech signal, listening task, and speech intelligibility have been observed repeatedly. However, a methodological bottleneck has so far prevented clarifying whether speech-brain entrainment contributes functionally to (i.e., causes) speech intelligibility or is merely an epiphenomenon of it. To address this long-standing issue, we experimentally manipulated speech-brain entrainment without concomitant acoustic and task-related variations, using a brain stimulation approach that enables modulating listeners' neural activity with transcranial currents carrying speech-envelope information. Results from two experiments involving a cocktail-party-like scenario and a listening situation devoid of aural speech-amplitude envelope input reveal consistent effects on listeners' speech-recognition performance, demonstrating a causal role of speech-brain entrainment in speech intelligibility. Our findings imply that speech-brain entrainment is critical for auditory speech comprehension and suggest that transcranial stimulation with speech-envelope-shaped currents can be utilized to modulate speech comprehension in impaired listening conditions.
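A speech-envelope-shaped stimulation waveform of the kind described here could, in outline, be built as follows; the audio array, 8 Hz cutoff, and the +/- 1 mA scaling are illustrative assumptions, not the study's parameters.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 16000
    rng = np.random.default_rng(5)
    audio = rng.standard_normal(10 * fs)              # stand-in for a speech recording

    envelope = np.abs(hilbert(audio))                 # broadband amplitude envelope
    b, a = butter(2, 8.0, btype="lowpass", fs=fs)     # keep only slow fluctuations
    slow_env = filtfilt(b, a, envelope)

    slow_env -= slow_env.mean()                       # zero mean: no net DC current
    current = 1.0 * slow_env / np.abs(slow_env).max() # scale to +/- 1 mA peak
    print(current.min(), current.max())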
Collapse
Affiliation(s)
- Lars Riecke
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 EV Maastricht, the Netherlands.
| | - Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 EV Maastricht, the Netherlands
| | - Bettina Sorger
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 EV Maastricht, the Netherlands
| | - Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
| | - Etienne Gaudrain
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, 9700 RB Groningen, the Netherlands; CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Inserm UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, 69366 Lyon Cedex 07, France
| |
Collapse
|
37
|
Cope TE, Sohoglu E, Sedley W, Patterson K, Jones PS, Wiggins J, Dawson C, Grube M, Carlyon RP, Griffiths TD, Davis MH, Rowe JB. Evidence for causal top-down frontal contributions to predictive processes in speech perception. Nat Commun 2017; 8:2154. [PMID: 29255275 PMCID: PMC5735133 DOI: 10.1038/s41467-017-01958-7] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2017] [Accepted: 10/27/2017] [Indexed: 11/09/2022] Open
Abstract
Perception relies on the integration of sensory information and prior expectations. Here we show that selective neurodegeneration of human frontal speech regions results in delayed reconciliation of predictions in temporal cortex. These temporal regions were not atrophic, displayed normal evoked magnetic and electrical power, and preserved neural sensitivity to manipulations of sensory detail. Frontal neurodegeneration does not prevent the perceptual effects of contextual information; instead, prior expectations are applied inflexibly. The precision of predictions correlates with beta power, in line with theoretical models of the neural instantiation of predictive coding. Fronto-temporal interactions are enhanced while participants reconcile prior predictions with degraded sensory signals. Excessively precise predictions can explain several challenging phenomena in frontal aphasias, including agrammatism and subjective difficulties with speech perception. This work demonstrates that higher-level frontal mechanisms for cognitive and behavioural flexibility make a causal functional contribution to the hierarchical generative models underlying speech perception.
Collapse
Affiliation(s)
- Thomas E Cope
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK.
| | - E Sohoglu
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
| | - W Sedley
- Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
| | - K Patterson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
| | - P S Jones
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
| | - J Wiggins
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
| | - C Dawson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
| | - M Grube
- Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
| | - R P Carlyon
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
| | - T D Griffiths
- Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
| | - Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
| | - James B Rowe
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
| |
Collapse
|
38
|
De Niear MA, Gupta PB, Baum SH, Wallace MT. Perceptual training enhances temporal acuity for multisensory speech. Neurobiol Learn Mem 2017; 147:9-17. [PMID: 29107704 DOI: 10.1016/j.nlm.2017.10.016] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2017] [Revised: 10/19/2017] [Accepted: 10/27/2017] [Indexed: 11/30/2022]
Abstract
The temporal relationship between auditory and visual cues is a fundamental feature in determining whether these signals will be integrated. The temporal binding window (TBW) describes the epoch of time during which asynchronous auditory and visual stimuli are likely to be perceptually bound. Recently, a number of studies have demonstrated the capacity for perceptual training to enhance temporal acuity for audiovisual stimuli (i.e., narrow the TBW). These studies, however, have only examined multisensory perceptual learning that develops in response to feedback that is provided when making judgments on simple, low-level audiovisual stimuli (i.e., flashes and beeps). Here we sought to determine if perceptual training was capable of altering temporal acuity for audiovisual speech. Furthermore, we also explored whether perceptual training with simple or complex audiovisual stimuli generalized across levels of stimulus complexity. Using a simultaneity judgment (SJ) task, we measured individuals' temporal acuity (as estimated by the TBW) prior to, immediately following, and one week after four consecutive days of perceptual training. We report that temporal acuity for audiovisual speech stimuli is enhanced following perceptual training using speech stimuli. Additionally, we find that changes in temporal acuity following perceptual training do not generalize across the levels of stimulus complexity in this study. Overall, the results suggest that perceptual training is capable of enhancing temporal acuity for audiovisual speech in adults, and that the dynamics of the changes in temporal acuity following perceptual training differ between simple audiovisual stimuli and more complex audiovisual speech stimuli.
Collapse
Affiliation(s)
- Matthew A De Niear
- Medical Scientist Training Program, Vanderbilt University Medical School, Vanderbilt University, Nashville, TN 37235, USA; Vanderbilt Brain Institute, Vanderbilt University Medical School, Vanderbilt University, Nashville, TN 37235, USA.
| | - Pranjal B Gupta
- Undergraduate Neuroscience Program, Vanderbilt University Medical School, Vanderbilt University, Nashville, TN 37235, USA
| | - Sarah H Baum
- Department of Psychology, University of Washington, Seattle, WA 98195, USA
| | - Mark T Wallace
- Vanderbilt Brain Institute, Vanderbilt University Medical School, Vanderbilt University, Nashville, TN 37235, USA; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN 37235, USA; Department of Psychology, Vanderbilt University, Nashville, TN 37235, USA; Department of Psychiatry, Vanderbilt University Medical Center, Nashville, TN 37235, USA
| |
Collapse
|
39
|
Eye Can Hear Clearly Now: Inverse Effectiveness in Natural Audiovisual Speech Processing Relies on Long-Term Crossmodal Temporal Integration. J Neurosci 2016; 36:9888-95. [PMID: 27656026 DOI: 10.1523/jneurosci.1396-16.2016] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2016] [Accepted: 08/03/2016] [Indexed: 11/21/2022] Open
Abstract
Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise.

SIGNIFICANCE STATEMENT The behavioral benefit of seeing a speaker's face during conversation is especially pronounced in challenging listening environments. However, the neural mechanisms underlying this phenomenon, known as inverse effectiveness, have not yet been established. Here, we examine this in the human brain using natural speech-in-noise stimuli that were designed specifically to maximize the behavioral benefit of audiovisual (AV) speech. We find that this benefit arises from our ability to integrate multimodal information over longer periods of time. Our data also suggest that the addition of visual speech restores early tracking of the acoustic speech signal during excessive background noise. These findings support and extend current mechanistic perspectives on AV speech perception.
Collapse
|
40
|
Gordon-Salant S, Yeni-Komshian GH, Fitzgibbons PJ, Willison HM, Freund MS. Recognition of asynchronous auditory-visual speech by younger and older listeners: A preliminary study. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:151. [PMID: 28764460 PMCID: PMC5507703 DOI: 10.1121/1.4992026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2016] [Revised: 03/20/2017] [Accepted: 06/23/2017] [Indexed: 05/15/2023]
Abstract
This study examined the effects of age and hearing loss on recognition of speech presented when the auditory and visual speech information was misaligned in time (i.e., asynchronous). Prior research suggests that older listeners are less sensitive than younger listeners in detecting the presence of asynchronous speech for auditory-lead conditions, but recognition of speech in auditory-lead conditions has not yet been examined. Recognition performance was assessed for sentences and words presented in the auditory-visual modalities with varying degrees of auditory lead and lag. Detection of auditory-visual asynchrony for sentences was assessed to verify that listeners detected these asynchronies. The listeners were younger and older normal-hearing adults and older hearing-impaired adults. Older listeners (regardless of hearing status) exhibited a significant decline in performance in auditory-lead conditions relative to visual lead, unlike younger listeners whose recognition performance was relatively stable across asynchronies. Recognition performance was not correlated with asynchrony detection. However, one of the two cognitive measures assessed, processing speed, was identified in multiple regression analyses as contributing significantly to the variance in auditory-visual speech recognition scores. The findings indicate that, particularly in auditory-lead conditions, listener age has an impact on the ability to recognize asynchronous auditory-visual speech signals.
Collapse
Affiliation(s)
- Sandra Gordon-Salant
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
| | - Grace H Yeni-Komshian
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
| | - Peter J Fitzgibbons
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
| | - Hannah M Willison
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
| | - Maya S Freund
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
| |
Collapse
|
41
|
Giordano BL, Ince RAA, Gross J, Schyns PG, Panzeri S, Kayser C. Contributions of local speech encoding and functional connectivity to audio-visual speech perception. eLife 2017; 6. [PMID: 28590903 PMCID: PMC5462535 DOI: 10.7554/elife.24763] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Accepted: 05/07/2017] [Indexed: 11/13/2022] Open
Abstract
Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.

When listening to someone in a noisy environment, such as a cocktail party, we can understand the speaker more easily if we can also see his or her face. Movements of the lips and tongue convey additional information that helps the listener’s brain separate out syllables, words and sentences. However, exactly where in the brain this effect occurs and how it works remain unclear. To find out, Giordano et al. scanned the brains of healthy volunteers as they watched clips of people speaking. The clarity of the speech varied between clips. Furthermore, in some of the clips the lip movements of the speaker corresponded to the speech in question, whereas in others the lip movements were nonsense babble. As expected, the volunteers performed better on a word recognition task when the speech was clear and when the lip movements agreed with the spoken dialogue. Watching the video clips stimulated rhythmic activity in multiple regions of the volunteers’ brains, including areas that process sound and areas that plan movements. Speech is itself rhythmic, and the volunteers’ brain activity synchronized with the rhythms of the speech they were listening to. Seeing the speaker’s face increased this degree of synchrony. However, it also made it easier for sound-processing regions within the listeners’ brains to transfer information to one another. Notably, only the latter effect predicted improved performance on the word recognition task. This suggests that seeing a person’s face makes it easier to understand his or her speech by boosting communication between brain regions, rather than through effects on individual areas. Further work is required to determine where and how the brain encodes lip movements and speech sounds. The next challenge will be to identify where these two sets of information interact, and how the brain merges them together to generate the impression of specific words.
Collapse
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université - Centre National de la Recherche Scientifique, Marseille, France.,Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Stefano Panzeri
- Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, Rovereto, Italy
| | - Christoph Kayser
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| |
Collapse
|
42
|
Being First Matters: Topographical Representational Similarity Analysis of ERP Signals Reveals Separate Networks for Audiovisual Temporal Binding Depending on the Leading Sense. J Neurosci 2017; 37:5274-5287. [PMID: 28450537 PMCID: PMC5456109 DOI: 10.1523/jneurosci.2926-16.2017] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2016] [Revised: 02/20/2017] [Accepted: 02/25/2017] [Indexed: 11/30/2022] Open
Abstract
In multisensory integration, processing in one sensory modality is enhanced by complementary information from other modalities. Intersensory timing is crucial in this process because only inputs reaching the brain within a restricted temporal window are perceptually bound. Previous research in the audiovisual field has investigated various features of the temporal binding window, revealing asymmetries in its size and plasticity depending on the leading input: auditory–visual (AV) or visual–auditory (VA). Here, we tested whether separate neuronal mechanisms underlie this AV–VA dichotomy in humans. We recorded high-density EEG while participants performed an audiovisual simultaneity judgment task including various AV–VA asynchronies and unisensory control conditions (visual-only, auditory-only) and tested whether AV and VA processing generate different patterns of brain activity. After isolating the multisensory components of AV–VA event-related potentials (ERPs) from the sum of their unisensory constituents, we ran a time-resolved topographical representational similarity analysis (tRSA) comparing the AV and VA ERP maps. Spatial cross-correlation matrices were built from real data to index the similarity between the AV and VA maps at each time point (500 ms window after stimulus) and then correlated with two alternative similarity model matrices: AVmaps = VAmaps versus AVmaps ≠ VAmaps. The tRSA results favored the AVmaps ≠ VAmaps model across all time points, suggesting that audiovisual temporal binding (indexed by synchrony perception) engages different neural pathways depending on the leading sense. The existence of such a dual route supports recent theoretical accounts proposing that multiple binding mechanisms are implemented in the brain to accommodate different information parsing strategies in auditory and visual sensory systems.

SIGNIFICANCE STATEMENT Intersensory timing is a crucial aspect of multisensory integration, determining whether and how inputs in one modality enhance stimulus processing in another modality. Our research demonstrates that evaluating the synchrony of auditory-leading (AV) versus visual-leading (VA) audiovisual stimulus pairs is characterized by two distinct patterns of brain activity. This suggests that audiovisual integration is not a unitary process and that different binding mechanisms are recruited in the brain based on the leading sense. These mechanisms may be relevant for supporting different classes of multisensory operations, for example, auditory enhancement of visual attention (AV) and visual enhancement of auditory speech (VA).
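A miniature version of the topographical analysis: correlate AV and VA scalp maps at every pair of time points and ask whether matched time points stand out, a crude stand-in for the paper's model-matrix correlation. Electrode counts and data are synthetic.

    import numpy as np

    rng = np.random.default_rng(6)
    n_elec, n_times = 64, 100
    av = rng.standard_normal((n_elec, n_times))   # AV multisensory ERP topographies
    va = rng.standard_normal((n_elec, n_times))   # VA multisensory ERP topographies

    # Spatial cross-correlation matrix: AV map at t1 versus VA map at t2.
    sim = np.array([[np.corrcoef(av[:, t1], va[:, t2])[0, 1]
                     for t2 in range(n_times)] for t1 in range(n_times)])

    # If AVmaps = VAmaps, the diagonal (matched time points) should dominate.
    diag = np.diag(sim).mean()
    off = sim[~np.eye(n_times, dtype=bool)].mean()
    print(f"matched-time similarity {diag:.3f} vs mismatched {off:.3f}")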
Collapse
|
43
|
Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech. LANGUAGE, COGNITION AND NEUROSCIENCE 2017; 32:1102-1118. [PMID: 28966930 PMCID: PMC5617130 DOI: 10.1080/23273798.2017.1283428] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/09/2016] [Accepted: 01/07/2017] [Indexed: 06/07/2023]
Abstract
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, with visual speech leading (experiment 1) or lagging (experiment 2) the acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels) and when the visual speech was intact rather than blurred (experiment 1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
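Noise vocoding, the degradation used here, can be sketched compactly: split the signal into log-spaced bands, extract each band's envelope, and reimpose those envelopes on band-limited noise. The band edges, filter order, and stand-in signal below are assumptions.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def vocode(audio, fs, n_channels=8, lo=100.0, hi=6000.0):
        edges = np.geomspace(lo, hi, n_channels + 1)      # log-spaced band edges
        noise = np.random.default_rng(7).standard_normal(audio.size)
        out = np.zeros_like(audio)
        for f1, f2 in zip(edges[:-1], edges[1:]):
            b, a = butter(3, [f1, f2], btype="bandpass", fs=fs)
            env = np.abs(hilbert(filtfilt(b, a, audio)))  # band envelope
            out += env * filtfilt(b, a, noise)            # envelope-modulated noise
        return out / np.abs(out).max()

    fs = 16000
    speech = np.random.default_rng(8).standard_normal(2 * fs)  # stand-in utterance
    vocoded = vocode(speech, fs, n_channels=4)
    print(vocoded.shape)

Fewer channels leave less spectral detail, which is why 4-channel speech sounds less speech-like than 8- or 16-channel speech.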
Collapse
Affiliation(s)
- Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, CA, 95618
| | - Stanley Shen
- Center for Mind and Brain, University of California, Davis, CA, 95618
| | - Jess R Kerlin
- Center for Mind and Brain, University of California, Davis, CA, 95618
| |
Collapse
|
44
|
O'Sullivan AE, Crosse MJ, Di Liberto GM, Lalor EC. Visual Cortical Entrainment to Motion and Categorical Speech Features during Silent Lipreading. Front Hum Neurosci 2017; 10:679. [PMID: 28123363 PMCID: PMC5225113 DOI: 10.3389/fnhum.2016.00679] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Accepted: 12/20/2016] [Indexed: 11/13/2022] Open
Abstract
Speech is a multisensory percept, comprising an auditory and visual component. While the content and processing pathways of audio speech have been well characterized, the visual component is less well understood. In this work, we expand current methodologies using system identification to introduce a framework that facilitates the study of visual speech in its natural, continuous form. Specifically, we use models based on the unheard acoustic envelope (E), the motion signal (M) and categorical visual speech features (V) to predict EEG activity during silent lipreading. Our results show that each of these models performs similarly at predicting EEG in visual regions and that respective combinations of the individual models (EV, MV, EM and EMV) provide an improved prediction of the neural activity over their constituent models. In comparing these different combinations, we find that the model incorporating all three types of features (EMV) outperforms the individual models, as well as both the EV and MV models, while it performs similarly to the EM model. Importantly, EM does not outperform EV and MV, which, considering the higher dimensionality of the V model, suggests that more data is needed to clarify this finding. Nevertheless, the performance of EMV, and comparisons of the subject performances for the three individual models, provides further evidence to suggest that visual regions are involved in both low-level processing of stimulus dynamics and categorical speech perception. This framework may prove useful for investigating modality-specific processing of visual speech under naturalistic conditions.
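A scaled-down version of the model comparison, with ridge regression over lagged features standing in for the paper's system-identification framework and only the E and M features included, might look like this; all signals are synthetic stand-ins.

    import numpy as np
    from sklearn.linear_model import Ridge

    fs, n_lags = 64, 24                    # features lagged over 0-375 ms
    rng = np.random.default_rng(9)
    n_samp = fs * 120
    E = np.abs(rng.standard_normal(n_samp))       # unheard acoustic envelope
    M = np.abs(rng.standard_normal(n_samp))       # lip motion signal
    eeg = np.roll(E, 10) + 0.5 * np.roll(M, 12) \
          + rng.standard_normal(n_samp)           # one synthetic EEG channel

    def lagged(x):
        T = x.shape[0] - n_lags
        return np.stack([x[t:t + n_lags].ravel() for t in range(T)])

    y = eeg[n_lags:]
    for name, X in [("E", lagged(E)), ("M", lagged(M)),
                    ("EM", lagged(np.column_stack([E, M])))]:
        half = len(y) // 2
        model = Ridge(alpha=1.0).fit(X[:half], y[:half])
        r = np.corrcoef(model.predict(X[half:]), y[half:])[0, 1]
        print(f"model {name}: prediction r = {r:.3f}")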
Collapse
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
| | - Michael J Crosse
- Department of Pediatrics and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Giovanni M Di Liberto
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
| | - Edmund C Lalor
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland; Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, NY, USA
| |
Collapse
|
45
|
Lüttke CS, Ekman M, van Gerven MAJ, de Lange FP. McGurk illusion recalibrates subsequent auditory perception. Sci Rep 2016; 6:32891. [PMID: 27611960 PMCID: PMC5017187 DOI: 10.1038/srep32891] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2016] [Accepted: 08/08/2016] [Indexed: 11/09/2022] Open
Abstract
Visual information can alter auditory perception. This is clearly illustrated by the well-known McGurk illusion, where an auditory /aba/ and a visual /aga/ are merged into the percept ‘ada’. It is less clear, however, whether such a change in perception may recalibrate subsequent perception. Here we asked whether the altered auditory perception due to the McGurk illusion affects subsequent auditory perception, i.e. whether this process of fusion may cause a recalibration of the auditory boundaries between phonemes. Participants categorized auditory and audiovisual speech stimuli as /aba/, /ada/ or /aga/ while activity patterns in their auditory cortices were recorded using fMRI. Interestingly, following a McGurk illusion, an auditory /aba/ was more often misperceived as ‘ada’. Furthermore, we observed a neural counterpart of this recalibration in the early auditory cortex. When the auditory input /aba/ was perceived as ‘ada’, activity patterns bore stronger resemblance to activity patterns elicited by /ada/ sounds than when they were correctly perceived as /aba/. Our results suggest that upon experiencing the McGurk illusion, the brain shifts the neural representation of an /aba/ sound towards /ada/, culminating in a recalibration in perception of subsequent auditory input.
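The pattern comparison could be approximated with a simple template-correlation analysis; the voxel counts, noise levels, and template mixing below are invented to mimic the reported shift, not derived from the data.

    import numpy as np

    rng = np.random.default_rng(10)
    n_vox = 200
    aba_template = rng.standard_normal(n_vox)     # mean pattern for clear /aba/
    ada_template = rng.standard_normal(n_vox)     # mean pattern for clear /ada/

    # Hypothetical /aba/ trials: correctly heard vs. misperceived as 'ada'.
    heard_aba = aba_template + 0.8 * rng.standard_normal((30, n_vox))
    heard_ada = 0.5 * aba_template + 0.5 * ada_template \
                + 0.8 * rng.standard_normal((30, n_vox))

    def mean_r(trials, template):
        return np.mean([np.corrcoef(t, template)[0, 1] for t in trials])

    for label, trials in [("perceived 'aba'", heard_aba),
                          ("perceived 'ada'", heard_ada)]:
        print(label, "-> r(aba) =", round(mean_r(trials, aba_template), 2),
              " r(ada) =", round(mean_r(trials, ada_template), 2))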
Collapse
Affiliation(s)
- Claudia S Lüttke
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, the Netherlands
| | - Matthias Ekman
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, the Netherlands
| | - Marcel A J van Gerven
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, the Netherlands
| | - Floris P de Lange
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, the Netherlands
| |
Collapse
|
46
|
Atypical audiovisual word processing in school-age children with a history of specific language impairment: an event-related potential study. J Neurodev Disord 2016; 8:33. [PMID: 27597881 PMCID: PMC5011345 DOI: 10.1186/s11689-016-9168-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/24/2016] [Accepted: 08/17/2016] [Indexed: 11/12/2022] Open
Abstract
Background: Visual speech cues influence different aspects of language acquisition. However, whether developmental language disorders may be associated with atypical processing of visual speech is unknown. In this study, we used behavioral and ERP measures to determine whether children with a history of specific language impairment (H-SLI) differ from their age-matched typically developing (TD) peers in the ability to match auditory words with corresponding silent visual articulations.
Methods: Nineteen 7–13-year-old H-SLI children and 19 age-matched TD children participated in the study. Children first heard a word and then saw a speaker silently articulating a word. In half of the trials, the articulated word matched the auditory word (congruent trials), while in the other half, it did not (incongruent trials). Children specified whether the auditory and the articulated words matched. We examined ERPs elicited by the onset of visual stimuli (visual P1, N1, and P2) as well as ERPs elicited by the articulatory movements themselves, namely, the N400 to incongruent articulations and the late positive complex (LPC) to congruent articulations. We also examined whether ERP measures of visual speech processing could predict (1) children’s linguistic skills and (2) the use of visual speech cues when listening to speech-in-noise (SIN).
Results: H-SLI children were less accurate in matching auditory words with visual articulations. They had a significantly reduced P1 to the talker’s face and a smaller N400 to incongruent articulations. In contrast, congruent articulations elicited LPCs of similar amplitude in both groups of children. The P1 and N400 amplitudes were significantly correlated with accuracy enhancement on the SIN task when seeing the talker’s face.
Conclusions: H-SLI children have poorly defined correspondences between speech sounds and the visually observed articulatory movements that produce them.
|
47
|
Baart M. Quantifying lip-read-induced suppression and facilitation of the auditory N1 and P2 reveals peak enhancements and delays. Psychophysiology 2016; 53:1295-306. [DOI: 10.1111/psyp.12683] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2015] [Accepted: 05/09/2016] [Indexed: 11/29/2022]
Affiliation(s)
- Martijn Baart, BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, the Netherlands
|
48
|
Kaganovich N, Schumaker J, Rowland C. Matching heard and seen speech: An ERP study of audiovisual word recognition. Brain Lang 2016; 157-158:14-24. [PMID: 27155219 PMCID: PMC4915735 DOI: 10.1016/j.bandl.2016.04.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/27/2015] [Revised: 03/23/2016] [Accepted: 04/10/2016] [Indexed: 06/05/2023]
Abstract
Seeing articulatory gestures while listening to speech-in-noise (SIN) significantly improves speech understanding. However, the degree of this improvement varies greatly among individuals. We examined the relationship between two distinct stages of visual articulatory processing and SIN accuracy by combining a cross-modal repetition priming task with ERP recordings. Participants first heard a word referring to a common object (e.g., pumpkin) and then decided whether the subsequently presented silent visual articulation matched the word they had just heard. Incongruent articulations elicited a significantly enhanced N400, indicative of mismatch detection at the pre-lexical level. Congruent articulations elicited a significantly larger LPC, indexing articulatory word recognition. Only the N400 difference between incongruent and congruent trials was significantly correlated with individuals' SIN accuracy improvement in the presence of the talker's face.
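The brain-behavior link reported here is a cross-subject correlation, which the following sketch reproduces with simulated values (sample size, effect sizes, and units are placeholders, not the study's): each participant contributes one N400 congruency effect and one audiovisual SIN gain, and the two are related with a Pearson correlation.

```python
# Hedged sketch: correlate each participant's N400 congruency effect
# (incongruent minus congruent mean amplitude, in microvolts) with their
# SIN gain (audiovisual minus auditory-only accuracy, in percent).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects = 24                               # assumed sample size

n400_effect = rng.normal(-2.0, 1.0, n_subjects)
# Simulated so that larger (more negative) N400 effects go with larger gains.
sin_gain = 10.0 - 2.0 * n400_effect + rng.normal(0, 2.0, n_subjects)

r, p = stats.pearsonr(n400_effect, sin_gain)
print(f"N400 effect vs. SIN gain: r = {r:.2f}, p = {p:.4g}")
```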
Affiliation(s)
- Natalya Kaganovich, Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States; Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, United States
- Jennifer Schumaker, Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
- Courtney Rowland, Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
|
49
|
Park H, Kayser C, Thut G, Gross J. Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility. eLife 2016; 5. [PMID: 27146891 PMCID: PMC4900800 DOI: 10.7554/elife.14521] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2016] [Accepted: 05/03/2016] [Indexed: 12/02/2022] Open
Abstract
During continuous speech, lip movements provide visual temporal signals that facilitate speech processing. Here, using MEG, we directly investigated how these visual signals interact with rhythmic brain activity in participants listening to and seeing the speaker. First, we investigated coherence between oscillatory brain activity and the speaker's lip movements and demonstrated significant entrainment in visual cortex. We then used partial coherence to remove contributions of the coherent auditory speech signal from the lip-brain coherence. Comparing this synchronization between different attention conditions revealed that attending to visual speech enhances the coherence between activity in visual cortex and the speaker's lips. Further, we identified a significant partial coherence between left motor cortex and lip movements, and this partial coherence directly predicted comprehension accuracy. Our results emphasize the importance of visually entrained and attention-modulated rhythmic brain activity for the enhancement of audiovisual speech processing.
People are able to communicate effectively with each other even in very noisy places where it is difficult to actually hear what others are saying. In a face-to-face conversation, people detect and respond to many physical cues, including body posture, facial expressions, head and eye movements, and gestures, alongside the sound cues. Lip movements are particularly important and contain enough information to allow trained observers to understand speech even if they cannot hear the speech itself. It is known that brain waves in listeners are synchronized with the rhythms in speech, especially the syllables. This is thought to establish a channel for communication, similar to tuning a radio to a certain frequency to listen to a certain radio station. Park et al. studied whether listeners' brain waves also align to the speaker's lip movements during continuous speech and whether this is important for understanding speech. The experiments reveal that a part of the brain that processes visual information, called the visual cortex, produces brain waves that are synchronized to the rhythm of syllables in continuous speech. This synchronization was more precise in a complex situation where lip movements would be more important for understanding speech. Park et al. also found that the area of the observer's brain that controls the lips (the motor cortex) also produced brain waves that were synchronized to lip movements. Volunteers whose motor cortex was more synchronized to the lip movements understood speech better. This supports the idea that brain areas used for producing speech are also important for understanding speech. Future challenges include understanding how the synchronization of brain waves with the rhythms of speech helps us to understand speech, and how the brain waves produced by the visual and motor areas interact.
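Partial coherence, the key quantity in this analysis, can be written directly in terms of auto- and cross-spectra: the contribution of a third signal z is regressed out of the cross-spectrum before coherence is formed. The sketch below does this with synthetic signals (a shared 4 Hz 'syllable' rhythm; sampling rate, durations, and noise levels are invented), not with the paper's MEG pipeline.

```python
# Hedged sketch: coherence between a "brain" signal x and a "lip" signal y,
# before and after spectrally removing the auditory envelope z.
import numpy as np
from scipy.signal import csd, welch

rng = np.random.default_rng(3)
fs, n = 250, 250 * 120                       # 250 Hz, 2 minutes of data
t = np.arange(n) / fs

rhythm = np.sin(2 * np.pi * 4 * t)           # shared 4 Hz syllable rhythm
z = rhythm + 0.5 * rng.standard_normal(n)    # auditory speech envelope
y = rhythm + 0.5 * rng.standard_normal(n)    # lip aperture
x = 0.7 * rhythm + rng.standard_normal(n)    # brain signal

nperseg = 2 * fs
f, Sxy = csd(x, y, fs=fs, nperseg=nperseg)
_, Sxz = csd(x, z, fs=fs, nperseg=nperseg)
_, Szy = csd(z, y, fs=fs, nperseg=nperseg)
_, Sxx = welch(x, fs=fs, nperseg=nperseg)
_, Syy = welch(y, fs=fs, nperseg=nperseg)
_, Szz = welch(z, fs=fs, nperseg=nperseg)

coh = np.abs(Sxy) ** 2 / (Sxx * Syy)         # ordinary lip-brain coherence

# Partial coherence: remove the part of the x-y coupling explained by z.
Sxy_z = Sxy - Sxz * Szy / Szz
Sxx_z = Sxx - np.abs(Sxz) ** 2 / Szz
Syy_z = Syy - np.abs(Szy) ** 2 / Szz
pcoh = np.abs(Sxy_z) ** 2 / (Sxx_z * Syy_z)

i = np.argmin(np.abs(f - 4))                 # inspect the 4 Hz syllable rate
print(f"coherence at 4 Hz: {coh[i]:.2f}, partial coherence: {pcoh[i]:.2f}")
```

In this synthetic case the lip-brain coupling is carried entirely by the shared rhythm, so partialling out the envelope collapses the 4 Hz coherence; in the study, it is the coupling that survives this step that indexes a genuinely visual contribution.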
Affiliation(s)
- Hyojin Park, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Christoph Kayser, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Gregor Thut, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
|
50
|
Yovel G, O’Toole AJ. Recognizing People in Motion. Trends Cogn Sci 2016; 20:383-395. [DOI: 10.1016/j.tics.2016.02.005] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2016] [Revised: 02/18/2016] [Accepted: 02/18/2016] [Indexed: 11/15/2022]
|