1
Dorsi J, Lacey S, Sathian K. Multisensory and lexical information in speech perception. Front Hum Neurosci 2024; 17:1331129. PMID: 38259332; PMCID: PMC10800662; DOI: 10.3389/fnhum.2023.1331129.
Abstract
Both multisensory and lexical information are known to influence the perception of speech. However, an open question remains: is either source more fundamental to perceiving speech? In this perspective, we review the literature and argue that multisensory information plays a more fundamental role in speech perception than lexical information. Three sets of findings support this conclusion: first, reaction times and electroencephalographic signal latencies indicate that the effects of multisensory information on speech processing seem to occur earlier than the effects of lexical information. Second, non-auditory sensory input influences the perception of features that differentiate phonetic categories; thus, multisensory information determines what lexical information is ultimately processed. Finally, there is evidence that multisensory information helps form some lexical information as part of a phenomenon known as sound symbolism. These findings support a framework of speech perception that, while acknowledging the influential roles of both multisensory and lexical information, holds that multisensory information is more fundamental to the process.
Affiliation(s)
- Josh Dorsi
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Simon Lacey
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Department of Neural and Behavioral Sciences, Penn State College of Medicine, Hershey, PA, United States
- Department of Psychology, Penn State Colleges of Medicine and Liberal Arts, Hershey, PA, United States
- K. Sathian
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Department of Neural and Behavioral Sciences, Penn State College of Medicine, Hershey, PA, United States
- Department of Psychology, Penn State Colleges of Medicine and Liberal Arts, Hershey, PA, United States
2
Ahn E, Majumdar A, Lee T, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS. bioRxiv [Preprint] 2023:2023.11.27.568892. PMID: 38077093; PMCID: PMC10705272; DOI: 10.1101/2023.11.27.568892.
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept known as the McGurk effect. This illusion has been widely used to study audiovisual speech integration, illustrating that auditory and visual cues are combined in the brain to generate a single coherent percept. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect reflect largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily impair processing while subjects were presented with either incongruent (McGurk) or congruent audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS significantly reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation did not affect the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
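As an illustration of the dissociation logic described in this abstract, the sketch below (not the authors' code; subject counts, condition names, and the simulated data are hypothetical placeholders) shows how one might test, within subjects, whether stimulation reduces the McGurk fusion rate while leaving the congruent audiovisual benefit intact.

```python
# Hedged sketch (not the authors' analysis): paired comparisons testing whether
# pSTS stimulation changes McGurk fusion but spares the congruent-AV benefit.
# Condition names and the data layout are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_subjects = 20

# Per-subject outcome measures (simulated placeholders).
fusion_sham = rng.uniform(0.4, 0.8, n_subjects)                   # McGurk fusion rate, control TMS
fusion_psts = fusion_sham - rng.uniform(0.05, 0.2, n_subjects)    # fusion rate, pSTS TMS
av_benefit_sham = rng.normal(0.15, 0.05, n_subjects)              # AV-minus-A accuracy gain, control TMS
av_benefit_psts = av_benefit_sham + rng.normal(0, 0.02, n_subjects)  # same gain, pSTS TMS

# Does pSTS TMS reduce the illusion?
t_fusion, p_fusion = ttest_rel(fusion_sham, fusion_psts)
# Does pSTS TMS reduce the congruent-AV benefit?
t_benefit, p_benefit = ttest_rel(av_benefit_sham, av_benefit_psts)

print(f"McGurk fusion: t={t_fusion:.2f}, p={p_fusion:.3f}")
print(f"Congruent AV benefit: t={t_benefit:.2f}, p={p_benefit:.3f}")
```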
Affiliation(s)
- EunSeon Ahn
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Areti Majumdar
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Taraz Lee
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
3
Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. PMID: 37757989; DOI: 10.1016/j.neuroimage.2023.120391.
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
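A minimal sketch of the encoding-model logic summarized above, assuming a generic time-lagged ridge regression rather than the authors' actual pipeline; all array shapes, feature definitions, and hyperparameters are illustrative.

```python
# Hedged sketch: a time-lagged ridge "encoding" model predicting EEG from stimulus
# features, asking whether categorical viseme regressors explain variance beyond
# motion and lip-movement features. Simulated data; not the published pipeline.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n_samples, n_channels = 5000, 32                  # e.g. EEG resampled to 64 Hz
motion = rng.standard_normal((n_samples, 2))      # optical flow, lip aperture (placeholders)
visemes = rng.integers(0, 2, (n_samples, 12)).astype(float)  # one-hot viseme features
eeg = rng.standard_normal((n_samples, n_channels))

def lag_features(X, lags):
    """Stack time-lagged copies of X (columns = feature x lag)."""
    return np.hstack([np.roll(X, lag, axis=0) for lag in lags])

lags = range(0, 16)  # roughly 0-250 ms at 64 Hz

def cv_prediction_corr(X, y, n_splits=5):
    """Mean cross-validated correlation between predicted and actual EEG."""
    scores = []
    for train, test in KFold(n_splits).split(X):
        model = Ridge(alpha=1e3).fit(X[train], y[train])
        pred = model.predict(X[test])
        r = [np.corrcoef(pred[:, ch], y[test][:, ch])[0, 1] for ch in range(y.shape[1])]
        scores.append(np.mean(r))
    return np.mean(scores)

X_low = lag_features(motion, lags)
X_full = lag_features(np.hstack([motion, visemes]), lags)

print("low-level features only:", cv_prediction_corr(X_low, eeg))
print("plus viseme features   :", cv_prediction_corr(X_full, eeg))
```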
Affiliation(s)
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
- Aisling O'Sullivan
- School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
4
Yamoah EN, Pavlinkova G, Fritzsch B. The Development of Speaking and Singing in Infants May Play a Role in Genomics and Dementia in Humans. Brain Sci 2023; 13:1190. PMID: 37626546; PMCID: PMC10452560; DOI: 10.3390/brainsci13081190.
Abstract
The development of the central auditory system, including the auditory cortex and other areas involved in processing sound, is shaped by genetic and environmental factors, enabling infants to learn how to speak. Before explaining hearing in humans, a short overview of auditory dysfunction is provided. Environmental factors such as exposure to sound and language can affect the development and function of auditory processing, including speech perception, singing, and language processing. Infants can hear before birth, and sound exposure sculpts the structure and functions of their developing auditory system. Exposing infants to singing and speaking can support their auditory and language development. In aging humans, the hippocampus and auditory nuclear centers are affected by neurodegenerative diseases such as Alzheimer's, resulting in memory and auditory processing difficulties. As the disease progresses, overt damage to auditory nuclear centers occurs, leading to problems in processing auditory information. In conclusion, the combination of memory and auditory processing difficulties substantially impairs people's ability to communicate and engage with society.
Affiliation(s)
- Ebenezer N. Yamoah
- Department of Physiology and Cell Biology, School of Medicine, University of Nevada, Reno, NV 89557, USA
- Bernd Fritzsch
- Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE 68198, USA
5
Giovanelli E, Gianfreda G, Gessa E, Valzolgher C, Lamano L, Lucioli T, Tomasuolo E, Rinaldi P, Pavani F. The effect of face masks on sign language comprehension: performance and metacognitive dimensions. Conscious Cogn 2023; 109:103490. PMID: 36842317; DOI: 10.1016/j.concog.2023.103490.
Abstract
In spoken languages, face masks represent an obstacle to speech understanding and influence metacognitive judgments, reducing confidence and increasing effort while listening. To date, all studies on face masks and communication involved spoken languages and hearing participants, leaving us with no insight into how masked communication affects non-spoken languages. Here, we examined the effects of face masks on sign language comprehension and metacognition. In an online experiment, deaf participants (N = 60) watched three parts of a story signed without a mask, with a transparent mask, or with an opaque mask, and answered questions about story content, as well as their perceived effort, feeling of understanding, and confidence in their answers. Results showed that feeling of understanding and perceived effort worsened as the visual condition changed from no mask to transparent or opaque masks, while comprehension of the story was not significantly different across visual conditions. We propose that metacognitive effects could be due to the reduction of pragmatic, linguistic and para-linguistic cues from the lower face, hidden by the mask. This reduction could affect the perception of lower-face linguistic components, attitude attribution, and the classification of emotions and prosody in a conversation, driving the observed effects on metacognitive judgments while leaving sign language comprehension substantially unchanged, albeit at a higher effort. These results represent a novel step towards better understanding what drives metacognitive effects of face masks while communicating face to face and highlight the importance of including the metacognitive dimension in human communication research.
Affiliation(s)
- Elena Giovanelli
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy.
- Gabriele Gianfreda
- Institute of Cognitive Sciences and Technologies (ISTC-CNR), Rome, Italy
- Elena Gessa
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Chiara Valzolgher
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Luca Lamano
- Institute of Cognitive Sciences and Technologies (ISTC-CNR), Rome, Italy
- Tommaso Lucioli
- Institute of Cognitive Sciences and Technologies (ISTC-CNR), Rome, Italy
- Elena Tomasuolo
- Institute of Cognitive Sciences and Technologies (ISTC-CNR), Rome, Italy
- Pasquale Rinaldi
- Institute of Cognitive Sciences and Technologies (ISTC-CNR), Rome, Italy
- Francesco Pavani
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Centro Interuniversitario di Ricerca "Cognizione, Linguaggio e Sordità" - CIRCLeS, Trento, Italy
6
Suess N, Hauswald A, Reisinger P, Rösch S, Keitel A, Weisz N. Cortical tracking of formant modulations derived from silently presented lip movements and its decline with age. Cereb Cortex 2022; 32:4818-4833. PMID: 35062025; PMCID: PMC9627034; DOI: 10.1093/cercor/bhab518.
Abstract
The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers' lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuophonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation processes of these more fine-grained acoustic details and assessed how they change as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while the participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging affects especially the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.
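The notion of "tracking" an unheard formant contour can be illustrated with a simple spectral coherence measure; the sketch below is an assumed, simplified stand-in for the MEG analysis, with simulated signals and placeholder variable names.

```python
# Hedged illustration (assumed pipeline, not the authors' code): tracking of an
# unheard stimulus feature indexed by coherence between a cortical time series
# and that feature, e.g. the formant contour accompanying the silent lip movements.
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(2)
fs = 150.0                      # sampling rate in Hz
t = np.arange(0, 60, 1 / fs)    # one minute of data

formant_contour = rng.standard_normal(t.size)                          # unheard formant modulation
visual_cortex = 0.3 * formant_contour + rng.standard_normal(t.size)    # simulated "tracking" signal
control_signal = rng.standard_normal(t.size)                           # unrelated control signal

f, coh_tracked = coherence(visual_cortex, formant_contour, fs=fs, nperseg=512)
f, coh_control = coherence(control_signal, formant_contour, fs=fs, nperseg=512)

band = (f >= 1) & (f <= 7)  # low-frequency range where speech tracking is typically assessed
print("coherence (tracking signal):", coh_tracked[band].mean())
print("coherence (control signal) :", coh_control[band].mean())
```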
Affiliation(s)
- Nina Suess
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Anne Hauswald
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Patrick Reisinger
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Sebastian Rösch
- Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University Salzburg, University Hospital Salzburg, Salzburg 5020, Austria
- Anne Keitel
- School of Social Sciences, University of Dundee, Dundee DD1 4HN, UK
- Nathan Weisz
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Department of Psychology, Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg 5020, Austria
7
Fritzsch B, Elliott KL, Yamoah EN. Neurosensory development of the four brainstem-projecting sensory systems and their integration in the telencephalon. Front Neural Circuits 2022; 16:913480. PMID: 36213204; PMCID: PMC9539932; DOI: 10.3389/fncir.2022.913480.
Abstract
Somatosensory, taste, vestibular, and auditory information is first processed in the brainstem. From the brainstem, the respective information is relayed to specific regions within the cortex, where these inputs are further processed and integrated with other sensory systems to provide a comprehensive sensory experience. We provide the organization, genetics, and various neuronal connections of four sensory systems: trigeminal, taste, vestibular, and auditory systems. The development of trigeminal fibers is comparable to many sensory systems, for they project mostly contralaterally from the brainstem or spinal cord to the telencephalon. Taste bud information is primarily projected ipsilaterally through the thalamus to reach the insula. The vestibular fibers develop bilateral connections that eventually reach multiple areas of the cortex to provide a complex map. The auditory fibers project in a tonotopic contour to the auditory cortex. The spatial and tonotopic organization of trigeminal and auditory neuron projections are distinct from the taste and vestibular systems. The individual sensory projections within the cortex provide multi-sensory integration in the telencephalon that depends on context-dependent tertiary connections to integrate other cortical sensory systems across the four modalities.
Affiliation(s)
- Bernd Fritzsch
- Department of Biology, The University of Iowa, Iowa City, IA, United States
- Department of Otolaryngology, The University of Iowa, Iowa City, IA, United States
- Karen L. Elliott
- Department of Biology, The University of Iowa, Iowa City, IA, United States
- Ebenezer N. Yamoah
- Department of Physiology and Cell Biology, School of Medicine, University of Nevada, Reno, Reno, NV, United States
8
Haider CL, Suess N, Hauswald A, Park H, Weisz N. Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker. Neuroimage 2022; 252:119044. PMID: 35240298; DOI: 10.1016/j.neuroimage.2022.119044.
Abstract
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. However, it remains unclear which levels of speech processing are affected by occluding the mouth area, and under which circumstances. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal-hearing participants via magnetoencephalography (MEG). We added a distractor speaker in half of the trials to create an ecologically valid, difficult listening situation. A decoding model was trained on the clear AV speech and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e., pitch and formant frequencies), while reconstruction of higher-level features of speech segmentation (phoneme and word onsets) was especially impaired by masks in difficult listening situations. As we used surgical face masks, which have only mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
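The decoding (stimulus-reconstruction) approach mentioned above can be sketched roughly as follows; this is an assumed, generic implementation with simulated data, not the published MEG pipeline.

```python
# Hedged sketch: a backward "stimulus reconstruction" model trained on clear
# audiovisual data and used to reconstruct the speech envelope per condition;
# the reconstruction correlation is then compared across mask conditions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_train, n_test, n_sensors, n_lags = 4000, 1000, 50, 20

def lagged(X, n_lags):
    """Concatenate time-lagged copies of the sensor data."""
    return np.hstack([np.roll(X, k, axis=0) for k in range(n_lags)])

envelope_train = rng.standard_normal(n_train)
meg_train = rng.standard_normal((n_train, n_sensors))

decoder = Ridge(alpha=1e2).fit(lagged(meg_train, n_lags), envelope_train)

for condition in ("no mask", "mask"):
    envelope_test = rng.standard_normal(n_test)          # placeholder test data
    meg_test = rng.standard_normal((n_test, n_sensors))
    recon = decoder.predict(lagged(meg_test, n_lags))
    r = np.corrcoef(recon, envelope_test)[0, 1]
    print(f"{condition}: reconstruction r = {r:.3f}")
```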
Affiliation(s)
- Chandra Leon Haider
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria.
- Nina Suess
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Anne Hauswald
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Hyojin Park
- School of Psychology & Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, UK
- Nathan Weisz
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
9
Rennig J, Beauchamp MS. Intelligibility of audiovisual sentences drives multivoxel response patterns in human superior temporal cortex. Neuroimage 2022; 247:118796. PMID: 34906712; PMCID: PMC8819942; DOI: 10.1016/j.neuroimage.2021.118796.
Abstract
Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech, and neural responses in pSTG/S may underlie the perceptual benefit of visual speech for the comprehension of noisy auditory speech. We examined this possibility through the lens of multivoxel pattern responses in pSTG/S. BOLD fMRI data was collected from 22 participants presented with speech consisting of English sentences presented in five different formats: visual-only; auditory with and without added auditory noise; and audiovisual with and without auditory noise. Participants reported the intelligibility of each sentence with a button press and trials were sorted post-hoc into those that were more or less intelligible. Response patterns were measured in regions of the pSTG/S identified with an independent localizer. Noisy audiovisual sentences with very similar physical properties evoked very different response patterns depending on their intelligibility. When a noisy audiovisual sentence was reported as intelligible, the pattern was nearly identical to that elicited by clear audiovisual sentences. In contrast, an unintelligible noisy audiovisual sentence evoked a pattern like that of visual-only sentences. This effect was less pronounced for noisy auditory-only sentences, which evoked similar response patterns regardless of intelligibility. The successful integration of visual and auditory speech produces a characteristic neural signature in pSTG/S, highlighting the importance of this region in generating the perceptual benefit of visual speech.
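The multivoxel pattern comparison described here amounts to correlating voxel-wise response patterns against condition templates; the sketch below illustrates that logic on simulated data and is not the authors' analysis code.

```python
# Hedged sketch: compare a condition's multivoxel pattern against "clear AV" and
# "visual-only" template patterns with Pearson correlation. Data are simulated.
import numpy as np

rng = np.random.default_rng(4)
n_voxels = 300

clear_av = rng.standard_normal(n_voxels)                  # template: clear AV sentences
visual_only = rng.standard_normal(n_voxels)               # template: visual-only sentences
intelligible_noisy = clear_av + 0.5 * rng.standard_normal(n_voxels)
unintelligible_noisy = visual_only + 0.5 * rng.standard_normal(n_voxels)

def pattern_similarity(a, b):
    """Correlation between two voxel-wise response patterns."""
    return np.corrcoef(a, b)[0, 1]

for name, pattern in [("intelligible noisy AV", intelligible_noisy),
                      ("unintelligible noisy AV", unintelligible_noisy)]:
    print(name,
          "| vs clear AV:", round(pattern_similarity(pattern, clear_av), 2),
          "| vs visual-only:", round(pattern_similarity(pattern, visual_only), 2))
```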
Affiliation(s)
- Johannes Rennig
- Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Building, A607, 3700 Hamilton Walk, Philadelphia, PA 19104-6016, United States.
10
Kaganovich N, Christ S. Event-related potentials evidence for long-term audiovisual representations of phonemes in adults. Eur J Neurosci 2021; 54:7860-7875. PMID: 34750895; DOI: 10.1111/ejn.15519.
Abstract
The presence of long-term auditory representations for phonemes has been well-established. However, since speech perception is typically audiovisual, we hypothesized that long-term phoneme representations may also contain information on speakers' mouth shape during articulation. We used an audiovisual oddball paradigm in which, on each trial, participants saw a face and heard one of two vowels. One vowel occurred frequently (standard), while another occurred rarely (deviant). In one condition (neutral), the face had a closed, non-articulating mouth. In the other condition (audiovisual violation), the mouth shape matched the frequent vowel. Although in both conditions stimuli were audiovisual, we hypothesized that identical auditory changes would be perceived differently by participants. Namely, in the neutral condition, deviants violated only the audiovisual pattern specific to each block. By contrast, in the audiovisual violation condition, deviants additionally violated long-term representations for how a speaker's mouth looks during articulation. We compared the amplitude of mismatch negativity (MMN) and P3 components elicited by deviants in the two conditions. The MMN extended posteriorly over temporal and occipital sites even though deviants contained no visual changes, suggesting that deviants were perceived as interruptions in audiovisual, rather than auditory only, sequences. As predicted, deviants elicited larger MMN and P3 in the audiovisual violation compared to the neutral condition. The results suggest that long-term representations of phonemes are indeed audiovisual.
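A rough sketch of the underlying ERP computation, with simulated data: the MMN is taken as the deviant-minus-standard difference wave and its mean amplitude compared across conditions. Window limits, amplitudes, and condition labels are illustrative assumptions, not the study's parameters.

```python
# Hedged sketch (simulated data): compute the MMN difference wave and compare its
# mean amplitude in a typical 100-250 ms window between the two conditions.
import numpy as np

rng = np.random.default_rng(5)
fs = 500                                    # sampling rate in Hz
times = np.arange(-0.1, 0.5, 1 / fs)        # epoch from -100 to 500 ms

def simulate_erp(mmn_amplitude, n_trials=200):
    """Average of simulated single trials containing an MMN-like deflection."""
    mmn = mmn_amplitude * np.exp(-((times - 0.175) ** 2) / (2 * 0.03 ** 2))
    trials = mmn + rng.standard_normal((n_trials, times.size))
    return trials.mean(axis=0)

window = (times >= 0.1) & (times <= 0.25)
for condition, amp in [("neutral", -1.0), ("audiovisual violation", -2.0)]:
    standard = simulate_erp(0.0)
    deviant = simulate_erp(amp)
    mmn_wave = deviant - standard
    print(f"{condition}: mean MMN amplitude = {mmn_wave[window].mean():.2f} µV")
```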
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, USA
- Sharon Christ
- Department of Human Development and Family Studies, Purdue University, West Lafayette, Indiana, USA
- Department of Statistics, Purdue University, West Lafayette, Indiana, USA
11
Karthik G, Plass J, Beltz AM, Liu Z, Grabowecky M, Suzuki S, Stacey WC, Wasade VS, Towle VL, Tao JX, Wu S, Issa NP, Brang D. Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex. Eur J Neurosci 2021; 54:7301-7317. PMID: 34587350; DOI: 10.1111/ejn.15482.
Abstract
Speech perception is a central component of social communication. Although principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues. Visual speech modulates activity in cortical areas subserving auditory speech perception including the superior temporal gyrus (STG). However, it is unknown whether visual modulation of auditory processing is a unitary phenomenon or, rather, consists of multiple functionally distinct processes. To explore this question, we examined neural responses to audiovisual speech measured from intracranially implanted electrodes in 21 patients with epilepsy. We found that visual speech modulated auditory processes in the STG in multiple ways, eliciting temporally and spatially distinct patterns of activity that differed across frequency bands. In the theta band, visual speech suppressed the auditory response from before auditory speech onset to after auditory speech onset (-93 to 500 ms) most strongly in the posterior STG. In the beta band, suppression was seen in the anterior STG from -311 to -195 ms before auditory speech onset and in the middle STG from -195 to 235 ms after speech onset. In high gamma, visual speech enhanced the auditory response from -45 to 24 ms only in the posterior STG. We interpret the visual-induced changes prior to speech onset as reflecting crossmodal prediction of speech signals. In contrast, modulations after sound onset may reflect a decrease in sustained feedforward auditory activity. These results are consistent with models that posit multiple distinct mechanisms supporting audiovisual speech perception.
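Band-specific effects like these are typically quantified from band-limited power; the sketch below shows one common way to compute it (band-pass filter plus Hilbert envelope) on simulated data, and is not the study's intracranial pipeline. Band edges follow common conventions and are assumptions here.

```python
# Hedged sketch: extract band-limited power with a band-pass filter and the
# Hilbert envelope, the kind of measure contrasted across conditions and time
# windows in each frequency band. Data are simulated.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

rng = np.random.default_rng(6)
fs = 1000.0
signal = rng.standard_normal(int(5 * fs))   # 5 s of one simulated STG channel

bands = {"theta": (4, 8), "beta": (13, 30), "high gamma": (70, 150)}

def band_power(x, low, high, fs):
    """Instantaneous power (squared Hilbert envelope) of a band-pass filtered signal."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, x)
    return np.abs(hilbert(filtered)) ** 2

for name, (low, high) in bands.items():
    power = band_power(signal, low, high, fs)
    print(f"{name:10s} mean power: {power.mean():.3f}")
```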
Affiliation(s)
- G Karthik
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- Adriene M Beltz
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
- Zhongming Liu
- Department of Biomedical Engineering and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, USA
- Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, Illinois, USA
- Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, Illinois, USA
- William C Stacey
- Department of Neurology and Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Vibhangini S Wasade
- Department of Neurology, Henry Ford Hospital, Detroit, Michigan, USA
- Department of Neurology, Wayne State University School of Medicine, Detroit, Michigan, USA
- Vernon L Towle
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- James X Tao
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- Shasha Wu
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- Naoum P Issa
- Department of Neurology, The University of Chicago, Chicago, Illinois, USA
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA
12
O'Sullivan AE, Crosse MJ, Liberto GMD, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. PMID: 33824190; PMCID: PMC8197638; DOI: 10.1523/jneurosci.0906-20.2021.
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.SIGNIFICANCE STATEMENT During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
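A minimal sketch of how canonical correlation analysis can index stimulus encoding strength, assuming simulated data and a toy "A+V" construction; it illustrates the comparison logic rather than reproducing the authors' CCA pipeline.

```python
# Hedged sketch: CCA between stimulus features and EEG, where a stronger canonical
# correlation for audiovisual responses than for a summed audio + visual response
# is taken as evidence of integration. All data here are simulated placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(7)
n_samples, n_features, n_channels = 3000, 20, 32

stimulus = rng.standard_normal((n_samples, n_features))   # spectrogram/phonetic features
eeg_av = stimulus @ rng.standard_normal((n_features, n_channels)) \
         + rng.standard_normal((n_samples, n_channels))
eeg_a_plus_v = eeg_av + 2.0 * rng.standard_normal((n_samples, n_channels))

def first_canonical_corr(X, Y):
    """Correlation of the first pair of canonical variates."""
    cca = CCA(n_components=1).fit(X, Y)
    x_scores, y_scores = cca.transform(X, Y)
    return np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1]

print("AV response :", first_canonical_corr(stimulus, eeg_av))
print("A+V response:", first_canonical_corr(stimulus, eeg_a_plus_v))
```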
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
- X, The Moonshot Factory, Mountain View, CA
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627
13
Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife 2020; 9:e56972. PMID: 32831168; PMCID: PMC7470824; DOI: 10.7554/elife.56972.
Abstract
Visual speech carried by lip movements is an integral part of communication. Yet, it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from those representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than often assumed.
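The multivariate classification step can be illustrated with a generic cross-validated decoder; the sketch below uses simulated sensor patterns and word labels and is not the published analysis.

```python
# Hedged sketch: cross-validated multivariate classification of word identity from
# MEG sensor patterns, the kind of analysis used to ask where word identity is
# represented. Shapes, labels, and the classifier choice are illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n_trials, n_sensors, n_words = 240, 100, 4

word_labels = rng.integers(0, n_words, n_trials)
# Sensor pattern per trial: word-specific prototype plus noise.
prototypes = rng.standard_normal((n_words, n_sensors))
meg_patterns = prototypes[word_labels] + rng.standard_normal((n_trials, n_sensors))

clf = LinearDiscriminantAnalysis()
accuracy = cross_val_score(clf, meg_patterns, word_labels, cv=5).mean()
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / n_words:.2f})")
```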
Affiliation(s)
- Anne Keitel
- Psychology, University of Dundee, Dundee, United Kingdom
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignalanalysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld, Germany
14
Plass J, Brang D, Suzuki S, Grabowecky M. Vision perceptually restores auditory spectral dynamics in speech. Proc Natl Acad Sci U S A 2020; 117:16920-16927. PMID: 32632010; PMCID: PMC7382243; DOI: 10.1073/pnas.2002887117.
Abstract
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
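The visual-to-acoustic correspondence at the heart of this account can be illustrated by regressing formant trajectories onto mouth-shape features; the sketch below does this on simulated data under assumed feature definitions, and is not the study's measurement pipeline.

```python
# Hedged sketch: predict formant (oral resonance) trajectories from visible
# mouth-shape features with a simple regression and score the prediction by the
# correlation between predicted and actual values. Data are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n_frames = 2000

# Mouth-shape features per video frame (e.g. lip aperture, width, area).
mouth_shape = rng.standard_normal((n_frames, 3))
# Formant frequencies (F1, F2) assumed here to covary with oral deformation.
formants = mouth_shape @ rng.standard_normal((3, 2)) + 0.5 * rng.standard_normal((n_frames, 2))

X_train, X_test, y_train, y_test = train_test_split(mouth_shape, formants, random_state=0)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

for i, name in enumerate(["F1", "F2"]):
    r = np.corrcoef(pred[:, i], y_test[:, i])[0, 1]
    print(f"{name}: predicted-vs-actual r = {r:.2f}")
```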
Affiliation(s)
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Department of Psychology, Northwestern University, Evanston, IL 60208
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
- Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611