1
Ahn E, Majumdar A, Lee T, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS. bioRxiv 2023:2023.11.27.568892. [PMID: 38077093] [PMCID: PMC10705272] [DOI: 10.1101/2023.11.27.568892]
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept known as the McGurk effect. This illusion has been widely used to study audiovisual speech integration, illustrating that auditory and visual cues are combined in the brain to generate a single coherent percept. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect reflect largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily impair processing while subjects were presented with either incongruent (McGurk) or congruent audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS significantly reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation did not affect the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
Affiliation(s)
- EunSeon Ahn
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Areti Majumdar
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Taraz Lee
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
2
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023;35:1741-1759. [PMID: 37677057] [DOI: 10.1162/jocn_a_02044]
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort, such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
3
Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023;13:e2869. [PMID: 36579557] [PMCID: PMC9927859] [DOI: 10.1002/brb3.2869]
Abstract
INTRODUCTION: Few of us are skilled lipreaders; most struggle with the task. The neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the blood oxygenation level dependent (BOLD) signal time courses. RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal, and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading did, probably paralleling the limited comprehension obtained via lipreading. Importantly, a higher lipreading test score and subjective estimate of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses also hold for naturalistic connected speech. Our results may reflect efficient use of brain areas dealing with phonological processing in skilled lipreaders.
Affiliation(s)
- Satu Saalasti
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany
- Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Aalto Studios - MAGICS, Aalto University, Espoo, Finland
4
Fisher VL, Dean CL, Nave CS, Parkins EV, Kerkhoff WG, Kwakye LD. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Front Hum Neurosci 2023;16:1027335. [PMID: 36684833] [PMCID: PMC9846366] [DOI: 10.3389/fnhum.2022.1027335]
Abstract
We receive information about the world around us from multiple senses which combine in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied by a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as schizophrenia, autism, and ADHD.
Affiliation(s)
- Victoria L. Fisher
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Yale University School of Medicine and the Connecticut Mental Health Center, New Haven, CT, United States
- Cassandra L. Dean
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Roche/Genentech Neurodevelopment & Psychiatry Teams Product Development, Neuroscience, South San Francisco, CA, United States
- Claire S. Nave
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Emma V. Parkins
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Neuroscience Graduate Program, University of Cincinnati, Cincinnati, OH, United States
- Willa G. Kerkhoff
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, United States
- Leslie D. Kwakye
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
5
Sheffield SW, Wheeler HJ, Brungart DS, Bernstein JGW. The Effect of Sound Localization on Auditory-Only and Audiovisual Speech Recognition in a Simulated Multitalker Environment. Trends Hear 2023;27:23312165231186040. [PMID: 37415497] [PMCID: PMC10331332] [DOI: 10.1177/23312165231186040]
Abstract
Information regarding sound-source spatial location provides several speech-perception benefits, including auditory spatial cues for perceptual talker separation and localization cues to face the talker to obtain visual speech information. These benefits have typically been examined separately. A real-time processing algorithm for sound-localization degradation (LocDeg) was used to investigate how spatial-hearing benefits interact in a multitalker environment. Normal-hearing adults performed auditory-only and auditory-visual sentence recognition with target speech and maskers presented from loudspeakers at -90°, -36°, 36°, or 90° azimuths. For auditory-visual conditions, one target and three masking talker videos (always spatially separated) were rendered virtually in rectangular windows at these locations on a head-mounted display. Auditory-only conditions presented blank windows at these locations. Auditory target speech (always spatially aligned with the target video) was presented in co-located speech-shaped noise (experiment 1) or with three co-located or spatially separated auditory interfering talkers corresponding to the masker videos (experiment 2). In the co-located conditions, the LocDeg algorithm did not affect auditory-only performance but reduced target orientation accuracy, reducing auditory-visual benefit. In the multitalker environment, two spatial-hearing benefits were observed: perceptually separating competing speech based on auditory spatial differences and orienting to the target talker to obtain visual speech cues. These two benefits were additive, and both were diminished by the LocDeg algorithm. Although visual cues always improved performance when the target was accurately localized, there was no strong evidence that they provided additional assistance in perceptually separating co-located competing speech. These results highlight the importance of sound localization in everyday communication.
Affiliation(s)
- Sterling W. Sheffield
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, USA
- Harley J. Wheeler
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN, USA
- Douglas S. Brungart
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
- Joshua G. W. Bernstein
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
6
Begau A, Arnau S, Klatt LI, Wascher E, Getzmann S. Using visual speech at the cocktail-party: CNV evidence for early speech extraction in younger and older adults. Hear Res 2022;426:108636. [DOI: 10.1016/j.heares.2022.108636]
7
Beadle J, Kim J, Davis C. Effects of Age and Uncertainty on the Visual Speech Benefit in Noise. J Speech Lang Hear Res 2021;64:5041-5060. [PMID: 34762813] [DOI: 10.1044/2021_jslhr-20-00495]
Abstract
PURPOSE: Listeners understand significantly more speech in noise when the talker's face can be seen (visual speech) than in an auditory-only baseline (a visual speech benefit). This study investigated whether the visual speech benefit is reduced when the correspondence between auditory and visual speech is uncertain, and whether any reduction is affected by listener age (older vs. younger) and by how severely the auditory signal is masked. METHOD: Older and younger adults completed a speech recognition in noise task that included an auditory-only condition and four auditory-visual (AV) conditions in which one, two, four, or six silent talking-face videos were presented. One face always matched the auditory signal; the other face(s) did not. Auditory speech was presented in noise at -6 and -1 dB signal-to-noise ratio (SNR). RESULTS: When the SNR was -6 dB, for both age groups, the standard-sized visual speech benefit reduced as more talking faces were presented. When the SNR was -1 dB, younger adults received the standard-sized visual speech benefit even when two talking faces were presented, whereas older adults did not. CONCLUSIONS: The size of the visual speech benefit obtained by older adults was always smaller when AV correspondence was uncertain; this was not the case for younger adults. Difficulty establishing AV correspondence may be a factor that limits older adults' speech recognition in noisy AV environments. Supplemental Material: https://doi.org/10.23641/asha.16879549.
Affiliation(s)
- Julie Beadle
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- The HEARing Cooperative Research Centre, Carlton, Victoria, Australia
- Jeesun Kim
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- Chris Davis
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- The HEARing Cooperative Research Centre, Carlton, Victoria, Australia
8
Trotter AS, Banks B, Adank P. The Relevance of the Availability of Visual Speech Cues During Adaptation to Noise-Vocoded Speech. J Speech Lang Hear Res 2021;64:2513-2528. [PMID: 34161748] [DOI: 10.1044/2021_jslhr-20-00575]
Abstract
PURPOSE: This study first aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared to viewing the whole face, affected adaptation to distorted noise-vocoded sentences. Second, it aimed to replicate results on processing of distorted speech from lab-based experiments in an online setup. METHOD: We monitored recognition accuracy online while participants listened to noise-vocoded sentences. We first established whether participants were able to perceive and adapt to audiovisual four-band noise-vocoded sentences when the entire moving face was visible (AV Full). Four further groups were then tested: one in which participants viewed the moving lower part of the speaker's face (AV Mouth), one in which they saw only the moving upper part of the face (AV Eyes), one in which they could see neither the moving lower nor the upper face (AV Blocked), and one in which they saw an image of a still face (AV Still). RESULTS: Participants repeated around 40% of the key words correctly and adapted during the experiment, but only when the moving mouth was visible. In contrast, performance was at floor level, and no adaptation took place, in conditions in which the moving mouth was occluded. CONCLUSIONS: The results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not the eyes/upper-face region, when listening and adapting to distorted sentences online. Second, the results demonstrate that it is feasible to run speech perception and adaptation studies online, although not all findings reported for lab studies replicate. Supplemental Material: https://doi.org/10.23641/asha.14810523.
Affiliation(s)
- Antony S Trotter
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Briony Banks
- Department of Psychology, Lancaster University, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
9
Audio-visual integration in noise: Influence of auditory and visual stimulus degradation on eye movements and perception of the McGurk effect. Atten Percept Psychophys 2020;82:3544-3557. [PMID: 32533526] [PMCID: PMC7788022] [DOI: 10.3758/s13414-020-02042-x]
Abstract
Seeing a talker’s face can aid audiovisual (AV) integration when speech is presented in noise. However, few studies have simultaneously manipulated auditory and visual degradation. We aimed to establish how degrading the auditory and visual signal affected AV integration. Where people look on the face in this context is also of interest; Buchan, Paré and Munhall (Brain Research, 1242, 162–171, 2008) found fixations on the mouth increased in the presence of auditory noise whilst Wilson, Alsius, Paré and Munhall (Journal of Speech, Language, and Hearing Research, 59(4), 601–615, 2016) found mouth fixations decreased with decreasing visual resolution. In Condition 1, participants listened to clear speech, and in Condition 2, participants listened to vocoded speech designed to simulate the information provided by a cochlear implant. Speech was presented in three levels of auditory noise and three levels of visual blurring. Adding noise to the auditory signal increased McGurk responses, while blurring the visual signal decreased McGurk responses. Participants fixated the mouth more on trials when the McGurk effect was perceived. Adding auditory noise led to people fixating the mouth more, while visual degradation led to people fixating the mouth less. Combined, the results suggest that modality preference and where people look during AV integration of incongruent syllables varies according to the quality of information available.
10
A value-driven McGurk effect: Value-associated faces enhance the influence of visual information on audiovisual speech perception and its eye movement pattern. Atten Percept Psychophys 2020;82:1928-1941. [PMID: 31898072] [DOI: 10.3758/s13414-019-01918-x]
Abstract
This study investigates whether and how value-associated faces affect audiovisual speech perception and its eye movement pattern. Participants were asked to learn to associate particular faces with or without monetary reward in the training phase, and, in the subsequent test phase, to identify syllables that the talkers had said in video clips in which the talkers' faces had or had not been associated with reward. The syllables were either congruent or incongruent with the talkers' mouth movements. Crucially, in some cases, the incongruent syllables could elicit the McGurk effect. Results showed that the McGurk effect occurred more often for reward-associated faces than for non-reward-associated faces. Moreover, the signal detection analysis revealed that participants had lower criterion and higher discriminability for reward-associated faces than for non-reward-associated faces. Surprisingly, eye movement data showed that participants spent more time looking at and fixated more often on the extraoral (nose/cheek) area for reward-associated faces than for non-reward-associated faces, while the opposite pattern was observed on the oral (mouth) area. The correlation analysis demonstrated that, over participants, the more they looked at the extraoral area in the training phase because of reward, the larger the increase of McGurk proportion (and the less they looked at the oral area) in the test phase. These findings not only demonstrate that value-associated faces enhance the influence of visual information on audiovisual speech perception but also highlight the importance of the extraoral facial area in the value-driven McGurk effect.
11
Abstract
Gaze (where one looks, how long, and when) plays an essential part in human social behavior. While many aspects of social gaze have been reviewed, there is no comprehensive review or theoretical framework that describes how gaze to faces supports face-to-face interaction. In this review, I address the following questions: (1) When does gaze need to be allocated to a particular region of a face in order to provide the relevant information for successful interaction; (2) How do humans look at other people, and faces in particular, regardless of whether gaze needs to be directed at a particular region to acquire the relevant visual information; (3) How does gaze support the regulation of interaction? The work reviewed spans psychophysical research, observational research, and eye-tracking research in both lab-based and interactive contexts. Based on the literature overview, I sketch a framework for future research based on dynamic systems theory. The framework holds that gaze should be investigated in relation to sub-states of the interaction, encompassing sub-states of the interactors, the content of the interaction as well as the interactive context. The relevant sub-states for understanding gaze in interaction vary over different timescales from microgenesis to ontogenesis and phylogenesis. The framework has important implications for vision science, psychopathology, developmental science, and social robotics.
Collapse
Affiliation(s)
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands
- Developmental Psychology, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands
12
Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech. J Neurosci 2019;40:1053-1065. [PMID: 31889007] [DOI: 10.1523/jneurosci.1101-19.2019]
Abstract
Lip-reading is crucial for understanding speech in challenging conditions, but how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ∼70 ms in the left hemisphere, compared with ∼20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing. SIGNIFICANCE STATEMENT: Lip-reading consists of decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
13
Fixating the eyes of a speaker provides sufficient visual information to modulate early auditory processing. Biol Psychol 2019;146:107724. [PMID: 31323242] [DOI: 10.1016/j.biopsycho.2019.107724]
Abstract
In face-to-face conversations, when listeners process and combine information obtained from hearing and seeing a speaker, they mostly look at the eyes rather than at the more informative mouth region. Measuring event-related potentials, we tested whether fixating the speaker's eyes is sufficient for gathering enough visual speech information to modulate early auditory processing, or whether covert attention to the speaker's mouth is needed. Results showed that when listeners fixated the eye region of the speaker, the amplitudes of the auditory evoked N1 and P2 were reduced when listeners heard and saw the speaker than when they only heard her. These cross-modal interactions also occurred when, in addition, attention was restricted to the speaker's eye region. Fixating the speaker's eyes thus provides listeners with sufficient visual information to facilitate early auditory processing. The spread of covert attention to the mouth area is not needed to observe audiovisual interactions.
|
14
|
Rennig J, Beauchamp MS. Free viewing of talking faces reveals mouth and eye preferring regions of the human superior temporal sulcus. Neuroimage 2018; 183:25-36. [PMID: 30092347 PMCID: PMC6214361 DOI: 10.1016/j.neuroimage.2018.08.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2018] [Revised: 07/31/2018] [Accepted: 08/05/2018] [Indexed: 01/22/2023] Open
Abstract
During face-to-face communication, the mouth of the talker is informative about speech content, while the eyes of the talker convey other information, such as gaze location. Viewers most often fixate either the mouth or the eyes of the talker's face, presumably allowing them to sample these different sources of information. To study the neural correlates of this process, healthy humans freely viewed talking faces while brain activity was measured with BOLD fMRI and eye movements were recorded with a video-based eye tracker. Post hoc trial sorting was used to divide the data into trials in which participants fixated the mouth of the talker and trials in which they fixated the eyes. Although the audiovisual stimulus was identical, the two trials types evoked differing responses in subregions of the posterior superior temporal sulcus (pSTS). The anterior pSTS preferred trials in which participants fixated the mouth of the talker while the posterior pSTS preferred fixations on the eye of the talker. A second fMRI experiment demonstrated that anterior pSTS responded more strongly to auditory and audiovisual speech than posterior pSTS eye-preferring regions. These results provide evidence for functional specialization within the pSTS under more realistic viewing and stimulus conditions than in previous neuroimaging studies.
Affiliation(s)
- Johannes Rennig
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
- Michael S Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
|
15
|
Masapollo M, Polka L, Ménard L, Franklin L, Tiede M, Morgan J. Asymmetries in unimodal visual vowel perception: The roles of oral-facial kinematics, orientation, and configuration. J Exp Psychol Hum Percept Perform 2018; 44:1103-1118. [PMID: 29517257 PMCID: PMC6037555 DOI: 10.1037/xhp0000518] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Masapollo, Polka, and Ménard (2017) recently reported a robust directional asymmetry in unimodal visual vowel perception: Adult perceivers discriminate a change from an English /u/ viseme to a French /u/ viseme significantly better than a change in the reverse direction. This asymmetry replicates a frequent pattern found in unimodal auditory vowel perception that points to a universal bias favoring more extreme vocalic articulations, which lead to acoustic signals with increased formant convergence. In the present article, the authors report 5 experiments designed to investigate whether this asymmetry in the visual realm reflects a speech-specific or general processing bias. They successfully replicated the directional effect using Masapollo et al.'s dynamically articulating faces but failed to replicate the effect when the faces were shown under static conditions. Asymmetries also emerged during discrimination of canonically oriented point-light stimuli that retained the kinematics and configuration of the articulating mouth. In contrast, no asymmetries emerged during discrimination of rotated point-light stimuli or Lissajous patterns that retained the kinematics, but not the canonical orientation or spatial configuration, of the labial gestures. These findings suggest that the perceptual processes underlying asymmetries in unimodal visual vowel discrimination are sensitive to speech-specific motion and configural properties and raise foundational questions concerning the role of specialized and general processes in vowel perception.
Affiliation(s)
- Matthew Masapollo
- Brown University
- McGill University
- Centre for Research on Brain, Language, and Music
- Linda Polka
- McGill University
- Centre for Research on Brain, Language, and Music
- Lucie Ménard
- Centre for Research on Brain, Language, and Music
- University of Quebec at Montreal
|
16
|
Proverbio AM, Raso G, Zani A. Electrophysiological Indexes of Incongruent Audiovisual Phonemic Processing: Unraveling the McGurk Effect. Neuroscience 2018; 385:215-226. [PMID: 29932985 DOI: 10.1016/j.neuroscience.2018.06.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2017] [Revised: 06/11/2018] [Accepted: 06/12/2018] [Indexed: 11/15/2022]
Abstract
In this study the timing of electromagnetic signals recorded during incongruent and congruent audiovisual (AV) stimulation in 14 Italian healthy volunteers was examined. In a previous study (Proverbio et al., 2016) we investigated the McGurk effect in the Italian language and identified which visual and auditory inputs produced the most compelling illusory effects (e.g., bilabial phonemes presented acoustically and paired with non-labials, especially alveolar-nasal and velar-occlusive phonemes). Here, EEG was recorded from 128 scalp sites while participants observed a female and a male actor uttering 288 syllables selected on the basis of the previous investigation (lasting approximately 600 ms) and responded to rare targets (/re/, /ri/, /ro/, /ru/). In half of the cases the AV information was incongruent, except for targets that were always congruent. A pMMN (phonological Mismatch Negativity) to incongruent AV stimuli was identified 500 ms after voice onset time. This automatic response indexed the detection of an incongruity between the labial and phonetic information. SwLORETA (Low-Resolution Electromagnetic Tomography) analysis applied to the difference voltage (incongruent minus congruent) in the same time window revealed that the strongest sources of this activity were the right superior temporal (STG) and superior frontal gyri, which supports their involvement in AV integration.
Affiliation(s)
- Alice Mado Proverbio
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
- Giulia Raso
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
|
17
|
Worster E, Pimperton H, Ralph-Lewis A, Monroy L, Hulme C, MacSweeney M. Eye Movements During Visual Speech Perception in Deaf and Hearing Children. LANGUAGE LEARNING 2018; 68:159-179. [PMID: 29937576 PMCID: PMC6001475 DOI: 10.1111/lang.12264] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/19/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 06/08/2023]
Abstract
For children who are born deaf, lipreading (speechreading) is an important source of access to spoken language. We used eye tracking to investigate the strategies used by deaf (n = 33) and hearing 5-8-year-olds (n = 59) during a sentence speechreading task. The proportion of time spent looking at the mouth during speech correlated positively with speechreading accuracy. In addition, all children showed a tendency to watch the mouth during speech and watch the eyes when the model was not speaking. The extent to which the children used this communicative pattern, which we refer to as social-tuning, positively predicted their speechreading performance, with the deaf children showing a stronger relationship than the hearing children. These data suggest that better speechreading skills are seen in those children, both deaf and hearing, who are able to guide their visual attention to the appropriate part of the image and in those who have a good understanding of conversational turn-taking.
Affiliation(s)
- Amelia Ralph-Lewis
- Deafness, Cognition, and Language Research Centre, University College London
- Laura Monroy
- Institute of Cognitive Neuroscience, University College London
- Mairéad MacSweeney
- Institute of Cognitive Neuroscience, University College London
- Deafness, Cognition, and Language Research Centre, University College London
|
18
|
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597 DOI: 10.1163/22134808-00002565] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Accepted: 03/09/2017] [Indexed: 11/19/2022]
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This calls into question the suitability of this illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing from data generated with McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6, Canada
|
19
|
Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss. Ear Hear 2017; 39:783-794. [PMID: 29252979 DOI: 10.1097/aud.0000000000000534] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. DESIGN Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. RESULTS Behavioral task performance was higher for children with NH than for either group of children with hearing loss. 
There were no differences in performance between children with UHL and children with MBHL. Eye-tracker analysis revealed that children with NH looked more at the screens overall than did children with MBHL or UHL, though individual differences were greater in the groups with hearing loss. Listeners in all groups spent a small proportion of time looking at relevant screens as talkers spoke. Although looking was distributed across all screens, there was a bias toward the right side of the display. There was no relationship between overall looking behavior and performance on the task. CONCLUSIONS The present study examined the processing of audiovisual speech in the context of a naturalistic task. Results demonstrated that children distributed their looking to a variety of sources during the task, but that children with NH were more likely to look at screens than were those with MBHL/UHL. However, all groups looked at the relevant talkers as they were speaking only a small proportion of the time. Despite variability in looking behavior, listeners were able to follow the audiovisual instructions and children with NH demonstrated better performance than children with MBHL/UHL. These results suggest that performance on some challenging multi-talker audiovisual tasks is not dependent on visual fixation to relevant talkers for children with NH or with MBHL/UHL.
|
20
|
Irwin J, Brancazio L, Volpe N. The development of gaze to a speaking face. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:3145. [PMID: 28599552 PMCID: PMC5422207 DOI: 10.1121/1.4982727] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2016] [Revised: 03/16/2017] [Accepted: 04/17/2017] [Indexed: 05/11/2023]
Abstract
When a speaker talks, the visible consequences of what they are saying can be seen. Listeners are influenced by this visible speech both in a noisy listening environment and even when auditory speech can easily be heard. While visible influence on heard speech has been reported to increase from early to late childhood, little is known about the mechanism that underlies this developmental trend. One possible account of developmental differences is that looking behavior to the face of a speaker changes with age. To examine this possibility, gaze to a speaking face was examined in children from 5 to 10 years of age and in adults. Participants viewed a speaker's face in a range of conditions that elicit looking: a visual-only (speech reading) condition, an auditory-noise (speech in noise) condition, and an audiovisual mismatch (McGurk) condition. Results indicate an increase in gaze to the face, and specifically to the mouth, of a speaker between the ages of 5 and 10 for all conditions. This change in looking behavior may help account for previous findings in the literature showing that visual influence on heard speech increases with development.
Affiliation(s)
- Julia Irwin
- Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
- Lawrence Brancazio
- Southern Connecticut State University, 501 Crescent Street, New Haven, Connecticut 06515, USA
- Nicole Volpe
- Southern Connecticut State University, 501 Crescent Street, New Haven, Connecticut 06515, USA
|
21
|
Irwin J, DiBlasi L. Audiovisual speech perception: A new approach and implications for clinical populations. LANGUAGE AND LINGUISTICS COMPASS 2017; 11:77-91. [PMID: 29520300 PMCID: PMC5839512 DOI: 10.1111/lnc3.12237] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Accepted: 01/25/2017] [Indexed: 06/01/2023]
Abstract
This selected overview of audiovisual (AV) speech perception examines the influence of visible articulatory information on what is heard. AV speech perception is thought to be a cross-cultural phenomenon that emerges early in typical language development; variables that influence it include properties of the visual and the auditory signal, attentional demands, and individual differences. A brief review of the existing neurobiological evidence on how visual information influences heard speech indicates potential loci, timing, and facilitatory effects of AV over auditory-only speech. The current literature on AV speech in certain clinical populations (individuals with an autism spectrum disorder, developmental language disorder, or hearing loss) reveals differences in processing that may inform interventions. Finally, a new method of assessing AV speech that does not require obvious cross-category mismatch or auditory noise is presented as a novel approach for investigators.
Affiliation(s)
- Julia Irwin
- LEARN Center, Haskins Laboratories Inc., USA
|
22
|
De Pascalis L, Kkeli N, Chakrabarti B, Dalton L, Vaillancourt K, Rayson H, Bicknell S, Goodacre T, Cooper P, Stein A, Murray L. Maternal gaze to the infant face: Effects of infant age and facial configuration during mother-infant engagement in the first nine weeks. Infant Behav Dev 2017; 46:91-99. [DOI: 10.1016/j.infbeh.2016.12.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2016] [Revised: 12/18/2016] [Accepted: 12/19/2016] [Indexed: 01/21/2023]
|
23
|
Visual speech influences speech perception immediately but not automatically. Atten Percept Psychophys 2016; 79:660-678. [DOI: 10.3758/s13414-016-1249-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
24
|
Abstract
Talkers automatically imitate aspects of perceived speech, a phenomenon known as phonetic convergence. Talkers have previously been found to converge to auditory and visual speech information. Furthermore, talkers converge more to the speech of a conversational partner who is seen and heard, relative to one who is just heard (Dias & Rosenblum Perception, 40, 1457-1466, 2011). A question raised by this finding is what visual information facilitates the enhancement effect. In the following experiments, we investigated the possible contributions of visible speech articulation to visual enhancement of phonetic convergence within the noninteractive context of a shadowing task. In Experiment 1, we examined the influence of the visibility of a talker on phonetic convergence when shadowing auditory speech either in the clear or in low-level auditory noise. The results suggest that visual speech can compensate for convergence that is reduced by auditory noise masking. Experiment 2 further established the visibility of articulatory mouth movements as being important to the visual enhancement of phonetic convergence. Furthermore, the word frequency and phonological neighborhood density characteristics of the words shadowed were found to significantly predict phonetic convergence in both experiments. Consistent with previous findings (e.g., Goldinger Psychological Review, 105, 251-279, 1998), phonetic convergence was greater when shadowing low-frequency words. Convergence was also found to be greater for low-density words, contrasting with previous predictions of the effect of phonological neighborhood density on auditory phonetic convergence (e.g., Pardo, Jordan, Mallari, Scanlon, & Lewandowski Journal of Memory and Language, 69, 183-195, 2013). Implications of the results for a gestural account of phonetic convergence are discussed.
|
25
|
Wilson AH, Alsius A, Paré M, Munhall KG. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2016; 59:601-15. [PMID: 27537379 PMCID: PMC5280058 DOI: 10.1044/2016_jslhr-s-15-0092] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/04/2015] [Revised: 09/16/2015] [Accepted: 10/07/2015] [Indexed: 06/06/2023]
Abstract
PURPOSE The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. METHOD We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent conditions (Experiment 1; N = 66). In Experiment 2 (N = 20), participants performed a visual-only speech perception task and in Experiment 3 (N = 20) an audiovisual task while having their gaze behavior monitored using eye-tracking equipment. RESULTS In the visual-only condition, increasing image resolution led to monotonic increases in performance, and proficient speechreaders were more affected by the removal of high spatial information than were poor speechreaders. The McGurk effect also increased with increasing visual resolution, although it was less affected by the removal of high-frequency information. Observers tended to fixate on the mouth more in visual-only perception, but gaze toward the mouth did not correlate with accuracy of silent speechreading or the magnitude of the McGurk effect. CONCLUSIONS The results suggest that individual differences in silent speechreading and the McGurk effect are not related. This conclusion is supported by differential influences of high-resolution visual information on the 2 tasks and differences in the pattern of gaze.
Affiliation(s)
- Amanda H. Wilson
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Agnès Alsius
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Martin Paré
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Kevin G. Munhall
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
|
26
|
High visual resolution matters in audiovisual speech perception, but only for some. Atten Percept Psychophys 2016; 78:1472-87. [DOI: 10.3758/s13414-016-1109-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
|
27
|
Barenholtz E, Mavica L, Lewkowicz DJ. Language familiarity modulates relative attention to the eyes and mouth of a talker. Cognition 2015; 147:100-5. [PMID: 26649759 DOI: 10.1016/j.cognition.2015.11.013] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2015] [Revised: 10/08/2015] [Accepted: 11/23/2015] [Indexed: 10/22/2022]
Abstract
We investigated whether the audiovisual speech cues available in a talker's mouth elicit greater attention when adults have to process speech in an unfamiliar language vs. a familiar language. Participants performed a speech-encoding task while watching and listening to videos of a talker in a familiar language (English) or an unfamiliar language (Spanish or Icelandic). Attention to the mouth increased in monolingual subjects in the unfamiliar-language condition but not in bilingual subjects when the task required speech processing. In the absence of an explicit speech-processing task, subjects attended equally to the eyes and mouth in response to both familiar and unfamiliar languages. Overall, these results demonstrate that language familiarity modulates selective attention to the redundant audiovisual speech cues in a talker's mouth in adults. When our findings are considered together with similar findings from infants, they suggest that this attentional strategy emerges very early in life.
Affiliation(s)
- Elan Barenholtz
- Department of Psychology/Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, United States
- Lauren Mavica
- Department of Psychology, Florida Atlantic University, Boca Raton, FL, United States
- David J Lewkowicz
- Department of Communication Sciences and Disorders, Northeastern University, Boston, MA, United States
|
28
|
A link between individual differences in multisensory speech perception and eye movements. Atten Percept Psychophys 2015; 77:1333-41. [PMID: 25810157 DOI: 10.3758/s13414-014-0821-1] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The McGurk effect is an illusion in which visual speech information dramatically alters the perception of auditory speech. However, there is a high degree of individual variability in how frequently the illusion is perceived: some individuals almost always perceive the McGurk effect, while others rarely do. Another axis of individual variability is the pattern of eye movements observers make while viewing a talking face: some individuals often fixate the mouth of the talker, while others rarely do. Since the talker's mouth carries the visual speech information necessary to induce the McGurk effect, we hypothesized that individuals who frequently perceive the McGurk effect should spend more time fixating the talker's mouth. We used infrared eye tracking to study eye movements as 40 participants viewed audiovisual speech. Frequent perceivers of the McGurk effect were more likely to fixate the mouth of the talker, and there was a significant correlation between McGurk frequency and mouth looking time. The noisy encoding of disparity model of McGurk perception showed that individuals who frequently fixated the mouth had lower sensory noise and higher disparity thresholds than those who rarely fixated the mouth. Differences in eye movements when viewing the talker's face may be an important contributor to interindividual differences in multisensory speech perception.
|
29
|
Ujiie Y, Asai T, Wakabayashi A. The relationship between level of autistic traits and local bias in the context of the McGurk effect. Front Psychol 2015; 6:891. [PMID: 26175705 PMCID: PMC4484977 DOI: 10.3389/fpsyg.2015.00891] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2015] [Accepted: 06/15/2015] [Indexed: 11/13/2022] Open
Abstract
The McGurk effect is a well-known illustration that demonstrates the influence of visual information on hearing in the context of speech perception. Some studies have reported that individuals with autism spectrum disorder (ASD) display abnormal processing of audio-visual speech integration, while other studies showed contradictory results. Based on the dimensional model of ASD, we administered two analog studies to examine the link between level of autistic traits, as assessed by the Autism Spectrum Quotient (AQ), and the McGurk effect among a sample of university students. In the first experiment, we found that autistic traits correlated negatively with fused (McGurk) responses. Then, we manipulated presentation types of visual stimuli to examine whether the local bias toward visual speech cues modulated individual differences in the McGurk effect. The presentation included four types of visual images, comprising no image, mouth only, mouth and eyes, and full face. The results revealed that global facial information facilitates the influence of visual speech cues on McGurk stimuli. Moreover, individual differences between groups with low and high levels of autistic traits appeared when the full-face visual speech cue with an incongruent voice condition was presented. These results suggest that individual differences in the McGurk effect might be due to a weak ability to process global facial information in individuals with high levels of autistic traits.
Affiliation(s)
- Yuta Ujiie
- Information Processing and Computer Sciences, Graduate School of Advanced Integration Science, Chiba University, Chiba, Japan; Japan Society for the Promotion of Science, Tokyo, Japan
- Tomohisa Asai
- NTT Communication Science Laboratories, NTT Corporation, Kanagawa, Japan
|
30
|
Grossman RB, Steinhart E, Mitchell T, McIlvane W. "Look who's talking!" Gaze Patterns for Implicit and Explicit Audio-Visual Speech Synchrony Detection in Children With High-Functioning Autism. Autism Res 2015; 8:307-16. [PMID: 25620208 DOI: 10.1002/aur.1447] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2013] [Accepted: 11/25/2014] [Indexed: 11/11/2022]
Abstract
Conversation requires integration of information from faces and voices to fully understand the speaker's message. To detect auditory-visual asynchrony of speech, listeners must integrate visual movements of the face, particularly the mouth, with auditory speech information. Individuals with autism spectrum disorder may be less successful at such multisensory integration, despite their demonstrated preference for looking at the mouth region of a speaker. We showed participants (individuals with and without high-functioning autism (HFA) aged 8-19) a split-screen video of two identical individuals speaking side by side. Only one of the speakers was in synchrony with the corresponding audio track and synchrony switched between the two speakers every few seconds. Participants were asked to watch the video without further instructions (implicit condition) or to specifically watch the in-synch speaker (explicit condition). We recorded which part of the screen and face their eyes targeted. Both groups looked at the in-synch video significantly more with explicit instructions. However, participants with HFA looked at the in-synch video less than typically developing (TD) peers and did not increase their gaze time as much as TD participants in the explicit task. Importantly, the HFA group looked significantly less at the mouth than their TD peers, and significantly more at non-face regions of the image. There were no between-group differences for eye-directed gaze. Overall, individuals with HFA spend less time looking at the crucially important mouth region of the face during auditory-visual speech integration, which is maladaptive gaze behavior for this type of task.
Affiliation(s)
- Ruth B Grossman
- Emerson College, Department of Communication Sciences and Disorders, 120 Boylston Street, Boston, Massachusetts; University of Massachusetts Medical School Shriver Center, 200 Trapelo Rd, Waltham, Massachusetts
- Erin Steinhart
- University of Massachusetts Medical School Shriver Center, 200 Trapelo Rd, Waltham, Massachusetts
- Teresa Mitchell
- University of Massachusetts Medical School Shriver Center, 200 Trapelo Rd, Waltham, Massachusetts
- William McIlvane
- University of Massachusetts Medical School Shriver Center, 200 Trapelo Rd, Waltham, Massachusetts

31
Woynaroski TG, Kwakye LD, Foss-Feig JH, Stevenson RA, Stone WL, Wallace MT. Multisensory speech perception in children with autism spectrum disorders. J Autism Dev Disord 2014; 43:2891-902. [PMID: 23624833] [DOI: 10.1007/s10803-013-1836-5]
Abstract
This study examined unisensory and multisensory speech perception in 8- to 17-year-old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant-vowel syllables were presented in visual-only, auditory-only, matched audiovisual, and mismatched audiovisual ("McGurk") conditions. Participants with ASD displayed deficits in visual-only and matched audiovisual speech perception. Additionally, children with ASD reported a visual influence on heard speech in response to mismatched audiovisual syllables over a wider window of time relative to controls. Correlational analyses revealed associations between multisensory speech perception, communicative characteristics, and responses to sensory stimuli in ASD. Results suggest that atypical speech perception is linked to broader behavioral characteristics of ASD.
Affiliation(s)
- Tiffany G Woynaroski
- Department of Hearing and Speech Sciences, Vanderbilt University, 1211 Medical Center Drive, Nashville, TN, 37232, USA

32
Kushnerenko E, Tomalski P, Ballieux H, Ribeiro H, Potton A, Axelsson EL, Murphy E, Moore DG. Brain responses to audiovisual speech mismatch in infants are associated with individual differences in looking behaviour. Eur J Neurosci 2013; 38:3363-9. [PMID: 23889202] [DOI: 10.1111/ejn.12317]
Abstract
Research on audiovisual speech integration has reported high levels of individual variability, especially among young infants. In the present study we tested the hypothesis that this variability results from individual differences in the maturation of audiovisual speech processing during infancy. A developmental shift in selective attention to audiovisual speech has been demonstrated between 6 and 9 months, with an increase in the time spent looking at articulating mouths as compared to eyes (Lewkowicz & Hansen-Tift (2012) Proc. Natl Acad. Sci. USA, 109, 1431-1436; Tomalski et al. (2012) Eur. J. Dev. Psychol., 1-14). In the present study we tested whether these changes in behavioural maturational level are associated with differences in brain responses to audiovisual speech across this age range. We measured high-density event-related potentials (ERPs) in response to videos of audiovisually matching and mismatched syllables /ba/ and /ga/, and subsequently examined visual scanning of the same stimuli with eye-tracking. There were no clear age-specific changes in ERPs, but the amplitude of the audiovisual mismatch response (AVMMR) to the combination of visual /ba/ and auditory /ga/ was strongly negatively associated with looking time to the mouth in the same condition. These results have significant implications for our understanding of individual differences in neural signatures of audiovisual speech processing in infants, suggesting that these signatures are not strictly related to chronological age but are instead associated with the maturation of looking behaviour, and develop at individual rates in the second half of the first year of life.
Affiliation(s)
- Elena Kushnerenko
- Institute for Research in Child Development, School of Psychology, University of East London, Water Lane, London, E15 4LZ, UK

33
van Wassenhove V. Speech through ears and eyes: interfacing the senses with the supramodal brain. Front Psychol 2013; 4:388. [PMID: 23874309] [PMCID: PMC3709159] [DOI: 10.3389/fpsyg.2013.00388]
Abstract
The comprehension of auditory-visual (AV) speech integration has greatly benefited from recent advances in neurosciences and multisensory research. AV speech integration raises numerous questions relevant to the computational rules needed for binding information (within and across sensory modalities), the representational format in which speech information is encoded in the brain (e.g., auditory vs. articulatory), or how AV speech ultimately interfaces with the linguistic system. The following non-exhaustive review provides a set of empirical findings and theoretical questions that have fed the original proposal for predictive coding in AV speech processing. More recently, predictive coding has pervaded many fields of inquiry and positively reinforced the need to refine the notion of internal models in the brain together with their implications for the interpretation of neural activity recorded with various neuroimaging techniques. However, it is argued here that the strength of predictive coding frameworks resides in the specificity of the generative internal models, not in their generality; specifically, internal models come with a set of rules applied to particular representational formats, themselves depending on the levels and the network structure at which predictive operations occur. As such, predictive coding in AV speech ought to specify the level(s) and the kinds of internal predictions that are necessary to account for the perceptual benefits or illusions observed in the field. Among those specifications, the actual content of a prediction comes first and foremost, followed by the representational granularity of that prediction in time. This review specifically presents a focused discussion on these issues.
Affiliation(s)
- Virginie van Wassenhove
- Cognitive Neuroimaging Unit, Brain Dynamics, INSERM, U992 Gif/Yvette, France; NeuroSpin Center, CEA, DSV/I2BM Gif/Yvette, France; Cognitive Neuroimaging Unit, University Paris-Sud Gif/Yvette, France

34
Morris NL, Chaparro A, Downs D, Wood JM. Effects of simulated cataracts on speech intelligibility. Vision Res 2012; 66:49-54. [DOI: 10.1016/j.visres.2012.06.003]

35
Abstract
Visual information augments our understanding of auditory speech. New evidence shows that infants' gaze fixations to the mouth and eye region shift predictably with changes in age and language familiarity.
Affiliation(s)
- K G Munhall
- Departments of Psychology and Otolaryngology, Queen's University, 62 Arch Street, Kingston, Ontario K7L3N6, Canada

36
Guiraud JA, Tomalski P, Kushnerenko E, Ribeiro H, Davies K, Charman T, Elsabbagh M, Johnson MH. Atypical audiovisual speech integration in infants at risk for autism. PLoS One 2012; 7:e36428. [PMID: 22615768] [PMCID: PMC3352915] [DOI: 10.1371/journal.pone.0036428]
Abstract
The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/-audio /ba/ and the congruent visual /ba/-audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/-audio /ga/ display compared with the congruent visual /ga/-audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated-measures ANOVA, displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated-measures ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated-measures ANOVA, displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.
Affiliation(s)
- Jeanne A Guiraud
- Centre for Brain and Cognitive Development, Department of Psychological Science, Birkbeck, University of London, London, United Kingdom

37
Stekelenburg JJ, Vroomen J. Electrophysiological evidence for a multisensory speech-specific mode of perception. Neuropsychologia 2012; 50:1425-31. [PMID: 22410413] [DOI: 10.1016/j.neuropsychologia.2012.02.027]
Abstract
We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave speech (SWS) replicas of natural speech were presented to listeners who were either in 'speech mode' or 'non-speech mode'. At the behavioral level, incongruent lipread information led to an illusory change of the sound only for listeners in speech mode. The neural correlates of this illusory change were examined in an audiovisual mismatch negativity (MMN) paradigm with SWS sounds. In an oddball sequence, 'standards' consisted of SWS /onso/ coupled with lipread /onso/, and 'deviants' consisted of SWS /onso/ coupled with lipread /omso/. The AV deviant induced a McGurk-MMN for listeners in speech mode, but not for listeners in non-speech mode. These results demonstrate that the illusory change in the sound produced by incongruent lipread information evokes an MMN, which presumably arises at a pre-attentive sensory processing stage.
38
Bishop CW, Miller LM. Speech cues contribute to audiovisual spatial integration. PLoS One 2011; 6:e24016. [PMID: 21909378] [PMCID: PMC3166076] [DOI: 10.1371/journal.pone.0024016]
Abstract
Speech is the most important form of human communication, but ambient sounds and competing talkers often degrade its acoustics. Fortunately, the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech cues interact with audiovisual spatial integration mechanisms. Here, we combine two well-established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.
Affiliation(s)
- Christopher W Bishop
- Center for Mind and Brain, University of California Davis, Davis, California, United States of America

39
Searching for audiovisual correspondence in multiple speaker scenarios. Exp Brain Res 2011; 213:175-83. [DOI: 10.1007/s00221-011-2624-0]

40
Dickinson CM, Taylor J. The effect of simulated visual impairment on speech-reading ability. Ophthalmic Physiol Opt 2011; 31:249-57. [DOI: 10.1111/j.1475-1313.2010.00810.x]

41
Buchan JN, Munhall KG. The Influence of Selective Attention to Auditory and Visual Speech on the Integration of Audiovisual Speech Information. Perception 2011; 40:1164-82. [DOI: 10.1068/p6939]
Abstract
Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.
Affiliation(s)
- Kevin G Munhall
- Department of Otolaryngology, Queen's University, Kingston, Ontario, Canada

42
Abstract
The importance of visual cues in speech perception is illustrated by the McGurk effect, whereby a speaker's facial movements affect speech perception. The goal of the present study was to evaluate whether the McGurk effect is also observed for sung syllables. Participants heard and saw sung instances of the syllables /ba/ and /ga/ and then judged the syllable they perceived. Audio-visual stimuli were congruent or incongruent (e.g., auditory /ba/ presented with visual /ga/). The stimuli were presented as spoken, sung in an ascending and descending triad (C E G G E C), and sung in an ascending and descending triad that returned to a semitone above the tonic (C E G G E C#). Results revealed no differences in the proportion of fusion responses between spoken and sung conditions, confirming that cross-modal phonemic information is integrated similarly in speech and song.
43
Multistage audiovisual integration of speech: dissociating identification and detection. Exp Brain Res 2010; 208:447-57. [DOI: 10.1007/s00221-010-2495-9]

44
Perceptual congruency of audio-visual speech affects ventriloquism with bilateral visual stimuli. Psychon Bull Rev 2010; 18:123-8. [DOI: 10.3758/s13423-010-0027-z]

45
Munhall KG, ten Hove MW, Brammer M, Paré M. Audiovisual integration of speech in a bistable illusion. Curr Biol 2009; 19:735-9. [PMID: 19345097] [DOI: 10.1016/j.cub.2009.03.019]
Abstract
Visible speech enhances the intelligibility of auditory speech when listening conditions are poor [1], and can modify the perception of otherwise perfectly audible utterances [2]. This audiovisual perception is our most natural form of communication and one of our most common multisensory phenomena. However, where and in what form the visual and auditory representations interact is still not completely understood. Although there are longstanding proposals that multisensory integration occurs relatively late in the speech-processing sequence [3], considerable neurophysiological evidence suggests that audiovisual interactions can occur in the brain stem and primary sensory cortices [4, 5]. A difficulty testing such hypotheses is that when the degree of integration is manipulated experimentally, the visual and/or auditory stimulus conditions are drastically modified [6, 7]; thus, the perceptual processing within a modality and the corresponding processing loads are affected [8]. Here, we used a bistable speech stimulus to examine the conditions under which there is a visual influence on auditory perception in speech. The results indicate that visual influences on auditory speech processing, at least for the McGurk illusion, necessitate the conscious perception of the visual speech gestures, thus supporting the hypothesis that multisensory speech integration is not completed in early processing stages.
Affiliation(s)
- K G Munhall
- Department of Psychology, Queen's University, Kingston, ON K7L 3N6, Canada

46
Buchan JN, Paré M, Munhall KG. The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception. Brain Res 2008; 1242:162-71. [PMID: 18621032] [DOI: 10.1016/j.brainres.2008.06.083]
Abstract
During face-to-face conversation the face provides auditory and visual linguistic information, and also conveys information about the identity of the speaker. This study investigated behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and varying the intelligibility of speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech by adding noise had a noticeable effect on the location and duration of fixations. When noise was present subjects adopted a vantage point that was more centralized on the face by reducing the frequency of the fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker every trial as opposed to the same talker every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and perception of dynamic faces.
Affiliation(s)
- Julie N Buchan
- Department of Psychology, Queen's University, Humphrey Hall, 62 Arch Street, Kingston, Ontario, Canada

47
Everdell IT, Marsh HO, Yurick MD, Munhall KG, Paré M. Gaze behaviour in audiovisual speech perception: asymmetrical distribution of face-directed fixations. Perception 2008; 36:1535-45. [PMID: 18265836] [DOI: 10.1068/p5852]
Abstract
Speech perception under natural conditions entails integration of auditory and visual information. Understanding how visual and auditory speech information are integrated requires detailed descriptions of the nature and processing of visual speech information. To understand better the process of gathering visual information, we studied the distribution of face-directed fixations of humans performing an audiovisual speech perception task to characterise the degree of asymmetrical viewing and its relationship to speech intelligibility. Participants showed stronger gaze fixation asymmetries while viewing dynamic faces, compared to static faces or face-like objects, especially when gaze was directed to the talkers' eyes. Although speech perception accuracy was significantly enhanced by the viewing of congruent, dynamic faces, we found no correlation between task performance and gaze fixation asymmetry. Most participants preferentially fixated the right side of the faces and their preferences persisted while viewing horizontally mirrored stimuli, different talkers, or static faces. These results suggest that the asymmetrical distributions of gaze fixations reflect the participants' viewing preferences, rather than being a product of asymmetrical faces, but that this behavioural bias does not predict correct audiovisual speech perception.
Affiliation(s)
- Ian T Everdell
- Biological Communication Centre, Queen's University, Kingston, ON K7L 3N6, Canada

48
Wilson A, Wilson A, Ten Hove MW, Paré M, Munhall KG. Loss of Central Vision and Audiovisual Speech Perception. Visual Impairment Research 2008; 10:23-34. [PMID: 19440249] [PMCID: PMC2680551] [DOI: 10.1080/13882350802053731]
Abstract
Communication impairments pose a major threat to an individual's quality of life. However, the impact of visual impairments on communication is not well understood, despite the important role that vision plays in the perception of speech. Here we present 2 experiments examining the impact of discrete central scotomas on speech perception. In the first experiment, 4 patients with central vision loss due to unilateral macular holes identified utterances with conflicting auditory-visual information, while simultaneously having their eye movements recorded. Each eye was tested individually. Three participants showed similar speech perception with both the impaired eye and the unaffected eye. For 1 participant, speech perception was disrupted by the scotoma because the participant did not shift gaze to avoid obscuring the talker's mouth with the scotoma. In the second experiment, 12 undergraduate students with gaze-contingent artificial scotomas (10 visual degrees in diameter) identified sentences in background noise. These larger scotomas disrupted speech perception, but some participants overcame this by adopting a gaze strategy whereby they shifted gaze to prevent obscuring important regions of the face such as the mouth. Participants who did not spontaneously adopt an adaptive gaze strategy did not learn to do so over the course of 5 days; however, participants who began with adaptive gaze strategies became more consistent in their gaze location. These findings confirm that peripheral vision is sufficient for perception of most visual information in speech, and suggest that training in gaze strategy may be worthwhile for individuals with communication deficits due to visual impairments.
Affiliation(s)
- Amanda Wilson
- Department of Psychology and Queen's Biological Communication Centre, Queen's University, Kingston, Ontario, Canada

49
Ross LA, Saint-Amour D, Leavitt VM, Molholm S, Javitt DC, Foxe JJ. Impaired multisensory processing in schizophrenia: deficits in the visual enhancement of speech comprehension under noisy environmental conditions. Schizophr Res 2007; 97:173-83. [PMID: 17928202] [DOI: 10.1016/j.schres.2007.08.008]
Abstract
BACKGROUND: Viewing a speaker's articulatory movements substantially improves a listener's ability to understand spoken words, especially under noisy environmental conditions. In this study we investigated the ability of patients with schizophrenia to integrate visual and auditory speech. Our objective was to determine to what extent they experience benefit from visual articulation and to detail under what listening conditions they might show the greatest impairments.
METHODS: We assessed the ability to recognize auditory and audiovisual speech in different levels of noise in 18 patients with schizophrenia and compared their performance with that of 18 healthy volunteers. We used a large set of monosyllabic words as our stimuli in order to more closely approximate performance in everyday situations.
RESULTS: Patients with schizophrenia showed deficits in their ability to derive benefit from visual articulatory motion. This impairment was most pronounced at signal-to-noise levels where multisensory gain is known to be maximal in healthy control subjects. A surprising finding was that despite known early auditory sensory processing deficits and reports of impairments in speech processing in schizophrenia, patients' performance in unisensory auditory speech perception remained fully intact.
CONCLUSIONS: Thus, the results showed a specific deficit in multisensory speech processing in the absence of any measurable deficit in unisensory speech processing, and suggest that sensory integration dysfunction may be an important and, to date, rather overlooked aspect of schizophrenia.
Affiliation(s)
- Lars A Ross
- Program in Cognitive Neuroscience, Department of Psychology, The City College of City University of New York, 138th St. and Convent Avenue, New York, New York 10031, USA

50
Irwin JR, Whalen DH, Fowler CA. A sex difference in visual influence on heard speech. Percept Psychophys 2006; 68:582-92. [PMID: 16933423] [DOI: 10.3758/bf03208760]
Abstract
Reports of sex differences in language processing are inconsistent and are thought to vary by task type and difficulty. In two experiments, we investigated a sex difference in visual influence on heard speech (the McGurk effect). First, incongruent consonant-vowel stimuli were presented where the visual portion of the signal was brief (100 msec) or full (temporally equivalent to the auditory signal). Second, to determine whether men and women differed in their ability to extract visual speech information from these brief stimuli, the same stimuli were presented to new participants with an additional visual-only (lipread) condition. In both experiments, women showed a significantly greater visual influence on heard speech than did men for the brief visual stimuli. No sex differences were found for the full stimuli or in the ability to lipread. These findings indicate that the more challenging brief visual stimuli elicit sex differences in the processing of audiovisual speech.
Affiliation(s)
- Julia R Irwin
- Haskins Laboratories, New Haven, Connecticut 06511, USA