1. Magnotti JF, Lado A, Beauchamp MS. The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers. Front Neurosci 2024;18:1421713. PMID: 38988770; PMCID: PMC11233445; DOI: 10.3389/fnins.2024.1421713.
Abstract
In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the strength of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.
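The abstract describes the structure of the NED model (a disparity value for each stimulus, plus a disparity threshold and sensory-noise value for each participant) without spelling out the computation. A minimal sketch of that published formulation follows; the variable names and numeric values are illustrative assumptions, not fitted parameters from this study.

```python
import numpy as np
from scipy.stats import norm

def ned_fusion_probability(stimulus_disparity, threshold, sensory_noise):
    """Probability of an illusory (fusion) McGurk response under the noisy
    encoding of disparity (NED) model: the encoded audiovisual disparity is a
    Gaussian sample centered on the true stimulus disparity, and fusion occurs
    when the sample falls below the participant's disparity threshold."""
    return norm.cdf(threshold, loc=stimulus_disparity, scale=sensory_noise)

# Illustrative values only (not fitted parameters from the study):
stimulus_disparity = np.array([0.4, 1.2, 2.5])   # one value per McGurk stimulus
threshold = 1.5                                   # participant susceptibility
sensory_noise = 0.8                               # participant encoding noise

print(ned_fusion_probability(stimulus_disparity, threshold, sensory_noise))
# Larger disparities and lower thresholds yield fewer fusion responses.
```

Fitting the model amounts to finding the stimulus disparities and participant thresholds and noise levels that best reproduce each participant-by-stimulus fusion rate.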
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Anastasia Lado
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
2. Butcher N, Bennetts RJ, Sexton L, Barbanta A, Lander K. Eye movement differences when recognising and learning moving and static faces. Q J Exp Psychol (Hove) 2024:17470218241252145. PMID: 38644390; DOI: 10.1177/17470218241252145.
Abstract
Seeing a face in motion can help subsequent face recognition. Several explanations have been proposed for this "motion advantage," but other factors that might play a role have received less attention. For example, facial movement might enhance recognition by attracting attention to the internal facial features, thereby facilitating identification. However, there is no direct evidence that motion increases attention to regions of the face that facilitate identification (i.e., internal features) compared with static faces. We tested this hypothesis by recording participants' eye movements while they completed famous face recognition (Experiment 1, N = 32) and face-learning (Experiment 2, N = 60; Experiment 3, N = 68) tasks, with presentation style manipulated (moving or static). Across all three experiments, a motion advantage was found, and participants directed a higher proportion of fixations to the internal features (i.e., eyes, nose, and mouth) of moving versus static faces. Conversely, the proportion of fixations to the internal non-feature area (i.e., cheeks, forehead, chin) and external area (Experiment 3) was significantly reduced for moving compared with static faces (all ps < .05). Results suggest that during both familiar and unfamiliar face recognition, facial motion is associated with increased attention to internal facial features, but only during familiar face recognition is the magnitude of the motion advantage functionally related to the proportion of fixations directed to the internal features.
Affiliation(s)
- Natalie Butcher
- Department of Psychology, Teesside University, Middlesbrough, UK
- Laura Sexton
- Department of Psychology, Teesside University, Middlesbrough, UK
- School of Psychology, Faculty of Health Sciences and Wellbeing, University of Sunderland, Sunderland, UK
- Karen Lander
- Division of Psychology, Communication and Human Neuroscience, University of Manchester, Manchester, UK
3. Zheng H, Du Z, Wang S. Dynamic driving risk in highway tunnel groups based on pupillary oscillations. Accid Anal Prev 2024;195:107414. PMID: 38043212; DOI: 10.1016/j.aap.2023.107414.
Abstract
This study aims to understand the dynamic changes in driving risks in highway tunnel groups. Real-world driving experiments were conducted, collecting pupil area data to measure pupil size oscillations using the Percentage of Pupil Area Variable (PPAV) metric. The analysis focused on investigating relative pupil size fluctuations to explore trends in driving risk fluctuations within tunnel groups. The objective was to identify accident-prone areas and key factors influencing driving risks, providing insights for safety improvements. The findings revealed an overall "whipping effect" phenomenon in driving risk changes within tunnel groups. Differences were observed between interior tunnel areas and open sections, including adjacent, approach, and departure zones. Higher driving risks were associated with locations closer to the tail end of the tunnel group and shorter exit departure sections. Targeted safety improvement designs should consider fluctuation patterns in different directions, with attention to tunnels at the tail end. In open sections, increased travel distance and lengths of upstream and downstream tunnels raised driving risks, while longer open zones reduced driving risks. Driving direction and sequence had minimal impact on risks. By integrating driver vision, tunnel characteristics, and the environment, this study identified high-risk areas and critical factors, providing guidance for monitoring and mitigating driving risks in tunnel groups. The findings have practical implications for the operation and safety management of tunnel groups.
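The abstract names the PPAV metric without defining it. Purely for illustration, the sketch below computes a windowed pupil-area fluctuation expressed as a percentage of the window mean; the exact formula, window length, and sampling rate used in the study are assumptions, not the authors' definition.

```python
import numpy as np

def ppav(pupil_area, fs=60, window_s=1.0):
    """Illustrative percentage-of-pupil-area-variation metric: for each
    non-overlapping window, the peak-to-trough range of pupil area divided
    by the window mean, in percent. The exact PPAV formula used in the study
    is not given in the abstract; this is an assumed stand-in."""
    win = int(fs * window_s)
    values = []
    for start in range(0, len(pupil_area) - win + 1, win):
        segment = pupil_area[start:start + win]
        values.append(100.0 * (segment.max() - segment.min()) / segment.mean())
    return np.array(values)

# Synthetic pupil-area trace (mm^2): 60 Hz for 10 s, slow oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / 60)
area = 12 + 1.5 * np.sin(2 * np.pi * 0.3 * t) + rng.normal(0, 0.2, t.size)
print(ppav(area).round(1))  # higher values = larger relative pupil oscillation
```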
Affiliation(s)
- Haoran Zheng
- School of Transportation and Logistics Engineering, Wuhan University of Technology, 1178# Heping Road, Wuhan, Hubei, 430063, China; Department of the Built Environment, Eindhoven University of Technology, Groene Loper 3, 5612 AE, Eindhoven, Noord-Brabant, The Netherlands.
| | - Zhigang Du
- School of Transportation and Logistics Engineering, Wuhan University of Technology, 1178# Heping Road, Wuhan, Hubei, 430063, China.
| | - Shoushuo Wang
- School of Port and Shipping Management, Guangzhou Maritime University, 101#, Hongshan 3rd Road, Guangzhou, Guangdong, 510725, China.
| |
4. Ahn E, Majumdar A, Lee T, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS. bioRxiv 2023:2023.11.27.568892. PMID: 38077093; PMCID: PMC10705272; DOI: 10.1101/2023.11.27.568892.
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept known as the McGurk effect. This illusion has been widely used to study audiovisual speech integration, illustrating that auditory and visual cues are combined in the brain to generate a single coherent percept. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect reflect largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily impair processing while subjects were presented with either incongruent (McGurk) or congruent audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS significantly reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation did not affect the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
Affiliation(s)
- EunSeon Ahn
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Areti Majumdar
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Taraz Lee
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
5. Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023;35:1741-1759. PMID: 37677057; DOI: 10.1162/jocn_a_02044.
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
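The bracketed comparison [AV > (A + V)] is an additive-model test on cortical tracking accuracies. The sketch below shows how such a benefit score could be computed and tested per participant, using made-up accuracy values; the actual encoding-model pipeline (EEG preprocessing, temporal response function fitting) is not reproduced here.

```python
import numpy as np
from scipy import stats

def av_speech_benefit(r_av, r_ao, r_vo):
    """Additive-model test of the audiovisual speech benefit: benefit =
    AV - (A + V), computed per participant from cortical tracking accuracies
    (e.g., forward-model prediction correlations) and tested against zero
    with a one-sample t-test. Inputs are per-participant accuracy arrays."""
    benefit = np.asarray(r_av) - (np.asarray(r_ao) + np.asarray(r_vo))
    t, p = stats.ttest_1samp(benefit, 0.0)
    return benefit.mean(), t, p

# Simulated per-participant tracking accuracies (not data from the study):
rng = np.random.default_rng(1)
n = 24
r_ao = rng.normal(0.06, 0.02, n)                 # auditory-only tracking
r_vo = rng.normal(0.01, 0.02, n)                 # visual-only tracking
r_av = r_ao + r_vo + rng.normal(0.02, 0.02, n)   # AV exceeds the additive sum
print(av_speech_benefit(r_av, r_ao, r_vo))
```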
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
6. Yoho SE, Barrett TS, Borrie SA. The Influence of Sensorineural Hearing Loss on the Relationship Between the Perception of Speech in Noise and Dysarthric Speech. J Speech Lang Hear Res 2023;66:4025-4036. PMID: 37652059; PMCID: PMC10713019; DOI: 10.1044/2023_jslhr-23-00115.
Abstract
PURPOSE The ability to understand speech under adverse listening conditions is highly variable across listeners. Despite this, studies have found that listeners with normal hearing display consistency in their ability to perceive speech across different types of degraded speech, suggesting that, for at least these listeners, global skills may be involved in navigating the ambiguity in speech signals. However, there are substantial differences in the perceptual challenges faced by listeners with normal and impaired hearing. This study examines whether listeners with sensorineural hearing loss demonstrate the same type of consistency as normal-hearing listeners when processing neurotypical (i.e., control) speech that has been degraded by external noise and speech that is neurologically degraded such as dysarthria. METHOD Listeners with normal hearing (n = 31) and listeners with sensorineural hearing loss (n = 36) completed an intelligibility task with neurotypical speech in noise and with dysarthric speech in quiet. RESULTS Findings were consistent with previous work demonstrating a relationship between the ability to perceive neurotypical speech in noise and dysarthric speech for listeners with normal hearing, albeit at a higher intelligibility level than previously observed. This relationship was also observed for listeners with hearing loss, although listeners with more severe hearing losses performed better with dysarthric speech than with neurotypical speech in noise. CONCLUSIONS This study demonstrated a high level of consistency in intelligibility performance for listeners across two different types of degraded speech, even when those listeners were further challenged by the presence of sensorineural hearing loss. Clinical implications for both listeners with hearing loss and their communication partners with dysarthria are discussed.
Affiliation(s)
- Sarah E. Yoho
- Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Department of Speech and Hearing Science, The Ohio State University, Columbus
- Stephanie A. Borrie
- Department of Communicative Disorders and Deaf Education, Utah State University, Logan
7. Fisher VL, Dean CL, Nave CS, Parkins EV, Kerkhoff WG, Kwakye LD. Increases in sensory noise predict attentional disruptions to audiovisual speech perception. Front Hum Neurosci 2023;16:1027335. PMID: 36684833; PMCID: PMC9846366; DOI: 10.3389/fnhum.2022.1027335.
Abstract
We receive information about the world around us from multiple senses which combine in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk Illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied with a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as Schizophrenia, Autism, and ADHD.
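The analysis linking sensory noise to attentional disruption is, at its core, a regression of the illusion's load-related decline on the load-related change in unisensory variability. A hypothetical sketch with simulated numbers follows; the variable names and values are assumptions for illustration, not data or code from the study.

```python
import numpy as np
from scipy import stats

# Does an increase in visual sensory noise under a secondary task predict the
# drop in McGurk (fusion) responses? All values below are simulated.
rng = np.random.default_rng(2)
n = 40
# Sensory noise indexed as trial-to-trial variability (SD) of unisensory visual
# identification, expressed as the change from no-load to load conditions.
visual_noise_change = rng.normal(0.05, 0.03, n)
# Attentional disruption: McGurk proportion (no load) minus McGurk proportion
# (load), simulated so larger noise increases go with larger disruptions.
mcgurk_disruption = 0.8 * visual_noise_change + rng.normal(0, 0.03, n)

res = stats.linregress(visual_noise_change, mcgurk_disruption)
print(f"slope={res.slope:.2f}, r={res.rvalue:.2f}, p={res.pvalue:.4f}")
```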
Affiliation(s)
- Victoria L. Fisher
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Yale University School of Medicine and the Connecticut Mental Health Center, New Haven, CT, United States
- Cassandra L. Dean
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Roche/Genentech Neurodevelopment & Psychiatry Teams Product Development, Neuroscience, South San Francisco, CA, United States
- Claire S. Nave
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Emma V. Parkins
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Neuroscience Graduate Program, University of Cincinnati, Cincinnati, OH, United States
- Willa G. Kerkhoff
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, United States
- Leslie D. Kwakye
- Department of Neuroscience, Oberlin College, Oberlin, OH, United States
8. Wilbiks JMP, Brown VA, Strand JF. Speech and non-speech measures of audiovisual integration are not correlated. Atten Percept Psychophys 2022;84:1809-1819. PMID: 35610409; PMCID: PMC10699539; DOI: 10.3758/s13414-022-02517-z.
Abstract
Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as "measures of audiovisual integration," the tasks themselves differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the tasks themselves (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the relationships among four commonly-used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit), and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks that are commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process or different constructs entirely.
Affiliation(s)
- Violet A Brown
- Department of Psychological & Brain Sciences, Washington University in St. Louis, Saint Louis, MO, USA
- Julia F Strand
- Department of Psychology, Carleton College, Northfield, MN, USA
9. Jessica Tan SH, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: The Relationship between Cortical Tracking of Continuous Auditory-Visual Speech and Gaze Behaviour in Infants, Children and Adults. Neuroimage 2022;256:119217. PMID: 35436614; DOI: 10.1016/j.neuroimage.2022.119217.
Abstract
An auditory-visual speech benefit, the benefit that visual speech cues bring to auditory speech perception, is experienced from early on in infancy and continues to be experienced to an increasing degree with age. While there is both behavioural and neurophysiological evidence for children and adults, only behavioural evidence exists for infants, as no neurophysiological study has provided a comprehensive examination of the auditory-visual speech benefit in infants. It is also surprising that most studies on auditory-visual speech benefit do not concurrently report looking behaviour, especially since the auditory-visual speech benefit rests on the assumption that listeners attend to a speaker's talking face and that there are meaningful individual differences in looking behaviour. To address these gaps, we simultaneously recorded electroencephalographic (EEG) and eye-tracking data from 5-month-olds, 4-year-olds and adults as they were presented with a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes. Cortical tracking analyses that involved forward encoding models of the speech envelope revealed that there was an auditory-visual speech benefit [i.e., AV > (A+V)], evident in 5-month-olds and adults but not 4-year-olds. Examination of cortical tracking accuracy in relation to looking behaviour showed that infants' relative attention to the speaker's mouth (vs. eyes) was positively correlated with cortical tracking accuracy of VO speech, whereas adults' attention to the display overall was negatively correlated with cortical tracking accuracy of VO speech. This study provides the first neurophysiological evidence of auditory-visual speech benefit in infants, and our results suggest ways in which current models of speech processing can be fine-tuned.
Affiliation(s)
- S H Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
- Michael J Crosse
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
10. Åsberg Johnels J, Hadjikhani N, Sundqvist M, Galazka MA. Face Processing in School Children with Dyslexia: Neuropsychological and Eye-tracking Findings. Dev Neuropsychol 2022;47:78-92. PMID: 35148650; DOI: 10.1080/87565641.2022.2034828.
Abstract
Dyslexia is a neurodevelopmental difficulty affecting reading, but recent data in adults suggest that difficulties also extend to face processing. Here, we tested face processing in school children with and without dyslexia, using eye-tracking and neuropsychological tests. Children with dyslexia did not differ significantly from controls in face gaze patterns, face memory, or face identification accuracy. However, they were slower and more heterogeneous, with larger within-group variance than controls. Increased gaze patterns toward the eyes were associated with better face memory in controls. We discuss the possible role of experiential factors in prior research linking dyslexia and face processing differences.
Affiliation(s)
- Jakob Åsberg Johnels
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Section of Speech and Language Pathology, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Nouchine Hadjikhani
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Harvard Medical School/MGH/MIT, Athinoula A. Martinos Center for Biomedical Imaging, Boston, Massachusetts, USA
- Maria Sundqvist
- Department of Education and Special Education, University of Gothenburg, Gothenburg, Sweden
- Martyna A Galazka
- Gillberg Neuropsychiatry Center, Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
11. Event-related potentials reveal early visual-tactile integration in the deaf. Psihologija 2022. DOI: 10.2298/psi210407003l.
Abstract
This study examined visual-tactile perceptual integration in deaf and normal-hearing individuals. Participants were presented with photos of faces or pictures of an oval in either a visual mode or a visual-tactile mode in a recognition learning task. Event-related potentials (ERPs) were recorded when participants recognized real faces and pictures of ovals in the learning stage. Results from the parietal-occipital region showed that photos of faces accompanied by vibration elicited more positive-going ERP responses than photos of faces without vibration, as indicated in the P1 and N170 components, in both deaf and hearing individuals. However, pictures of ovals accompanied by vibration produced more positive-going ERP responses than pictures of ovals without vibration in N170 only in deaf individuals. A reversed pattern was found in the temporal region, indicating that real faces with vibration elicited less positive ERPs than photos of faces without vibration in both N170 and N300 for deaf individuals, but this pattern did not appear in N170 and N300 for normal-hearing individuals. The results suggest that multisensory integration across the visual and tactile modalities involves more fundamental perceptual regions than auditory regions. Moreover, auditory deprivation played an essential role at the perceptual encoding stage of multisensory integration.
12. Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. Psychol Res 2021;86:1930-1943. PMID: 34854983; PMCID: PMC9363401; DOI: 10.1007/s00426-021-01618-y.
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Affiliation(s)
- Basil Wahn
- Department of Psychology, Leibniz Universität Hannover, Hannover, Germany
- Laura Schmitz
- Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
- Alan Kingstone
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
13. Banks B, Gowen E, Munro KJ, Adank P. Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech. J Speech Lang Hear Res 2021;64:3432-3445. PMID: 34463528; DOI: 10.1044/2021_jslhr-21-00106.
Abstract
Purpose Visual cues from a speaker's face may benefit perceptual adaptation to degraded speech, but current evidence is limited. We aimed to replicate results from previous studies to establish the extent to which visual speech cues can lead to greater adaptation over time, extending existing results to a real-time adaptation paradigm (i.e., without a separate training period). A second aim was to investigate whether eye gaze patterns toward the speaker's mouth were related to better perception, hypothesizing that listeners who looked more at the speaker's mouth would show greater adaptation. Method A group of listeners (n = 30) was presented with 90 noise-vocoded sentences in audiovisual format, whereas a control group (n = 29) was presented with the audio signal only. Recognition accuracy was measured throughout and eye tracking was used to measure fixations toward the speaker's eyes and mouth in the audiovisual group. Results Previous studies were partially replicated: The audiovisual group had better recognition throughout and adapted slightly more rapidly, but both groups showed an equal amount of improvement overall. Longer fixations on the speaker's mouth in the audiovisual group were related to better overall accuracy. An exploratory analysis further demonstrated that the duration of fixations to the speaker's mouth decreased over time. Conclusions The results suggest that visual cues may not benefit adaptation to degraded speech as much as previously thought. Longer fixations on a speaker's mouth may play a role in successfully decoding visual speech cues; however, this will need to be confirmed in future research to fully understand how patterns of eye gaze are related to audiovisual speech recognition. All materials, data, and code are available at https://osf.io/2wqkf/.
Affiliation(s)
- Briony Banks
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Emma Gowen
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Kevin J Munro
- Manchester Centre for Audiology and Deafness, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
14. Ujiie Y, Takahashi K. Own-race faces promote integrated audiovisual speech information. Q J Exp Psychol (Hove) 2021;75:924-935. PMID: 34427494; DOI: 10.1177/17470218211044480.
Abstract
The other-race effect indicates a perceptual advantage when processing own-race faces. This effect has been demonstrated in individuals' recognition of facial identity and emotional expressions. However, it remains unclear whether the other-race effect also exists in multisensory domains. We conducted two experiments to provide evidence for the other-race effect in facial speech recognition, using the McGurk effect. Experiment 1 tested this issue among East Asian adults, examining the magnitude of the McGurk effect for stimuli produced by speakers of two different races (own-race vs. other-race). We found that own-race faces induced a stronger McGurk effect than other-race faces. Experiment 2 indicated that the other-race effect was not simply due to different levels of attention being paid to the mouths of own- and other-race speakers. Our findings demonstrated that own-race faces enhance the weight of visual input during audiovisual speech perception, and they provide evidence of the own-race effect in the audiovisual interaction for speech perception in adults.
Affiliation(s)
- Yuta Ujiie
- Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Ibaraki, Japan
- Graduate School of Psychology, Chukyo University, Nagoya-shi, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
- Research and Development Initiative, Chuo University, Tokyo, Japan
- Kohske Takahashi
- School of Psychology, Chukyo University, Nagoya-shi, Japan
- College of Comprehensive Psychology, Ritsumeikan University, Ibaraki, Japan
15. Diaz MT, Yalcinbas E. The neural bases of multimodal sensory integration in older adults. Int J Behav Dev 2021;45:409-417. PMID: 34650316; DOI: 10.1177/0165025420979362.
Abstract
Although hearing often declines with age, prior research has shown that older adults may benefit from multisensory input to a greater extent when compared to younger adults, a concept known as inverse effectiveness. While there is behavioral evidence in support of this phenomenon, less is known about its neural basis. The present fMRI study examined how older and younger adults processed multimodal auditory-visual (AV) phonemic stimuli which were either congruent or incongruent across modalities. Incongruent AV pairs were designed to elicit the McGurk effect. Behaviorally, reaction times were significantly faster during congruent trials compared to incongruent trials for both age groups, and overall older adults responded more slowly. The interaction was not significant suggesting that older adults processed the AV stimuli similarly to younger adults. Although there were minimal behavioral differences, age-related differences in functional activation were identified: Younger adults elicited greater activation than older adults in primary sensory regions including superior temporal gyrus, the calcarine fissure, and left post-central gyrus. In contrast, older adults elicited greater activation than younger adults in dorsal frontal regions including middle and superior frontal gyri, as well as dorsal parietal regions. These data suggest that while there is age-related stability in behavioral sensitivity to multimodal stimuli, the neural bases for this effect differed between older and younger adults. Our results demonstrated that older adults underrecruited primary sensory cortices and had increased recruitment of regions involved in executive function, attention, and monitoring processes, which may reflect an attempt to compensate.
Affiliation(s)
- Michele T Diaz
- Department of Psychology, The Pennsylvania State University
- Ege Yalcinbas
- Neurosciences Department, University of California, San Diego
16. Feng S, Lu H, Wang Q, Li T, Fang J, Chen L, Yi L. Face-viewing patterns predict audiovisual speech integration in autistic children. Autism Res 2021;14:2592-2602. PMID: 34415113; DOI: 10.1002/aur.2598.
Abstract
Autistic children show audiovisual speech integration deficits, though the underlying mechanisms remain unclear. The present study examined how audiovisual speech integration deficits in autistic children could be affected by their looking patterns. We measured audiovisual speech integration in 26 autistic children and 26 typically developing (TD) children (4- to 7-year-old) employing the McGurk task (a videotaped speaker uttering phonemes with her eyes open or closed) and tracked their eye movements. We found that, compared with TD children, autistic children showed weaker audiovisual speech integration (i.e., the McGurk effect) in the open-eyes condition and similar audiovisual speech integration in the closed-eyes condition. Autistic children viewed the speaker's mouth less in non-McGurk trials than in McGurk trials in both conditions. Importantly, autistic children's weaker audiovisual speech integration could be predicted by their reduced mouth-looking time. The present study indicated that atypical face-viewing patterns could serve as one of the cognitive mechanisms of audiovisual speech integration deficits in autistic children. LAY SUMMARY: McGurk effect occurs when the visual part of a phoneme (e.g., "ga") and the auditory part of another phoneme (e.g., "ba") uttered by a speaker were integrated into a fused perception (e.g., "da"). The present study examined how McGurk effect in autistic children could be affected by their looking patterns for the speaker's face. We found that less looking time for the speaker's mouth in autistic children could predict weaker McGurk effect. As McGurk effect manifests audiovisual speech integration, our findings imply that we could improve audiovisual speech integration in autistic children by directing them to look at the speaker's mouth in future intervention.
Affiliation(s)
- Shuyuan Feng
- Institute for Applied Linguistics, School of Foreign Languages, Central South University, Changsha, Hunan, China
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Haoyang Lu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Qiandong Wang
- Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education, Faculty of Psychology, Beijing Normal University, Beijing, China
- Tianbi Li
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Jing Fang
- Qingdao Autism Research Institute, Qingdao, China
- Lihan Chen
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Li Yi
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- IDG/McGovern Institute for Brain Research at PKU, Peking University, Beijing, China
17. The other-race effect on the McGurk effect in infancy. Atten Percept Psychophys 2021;83:2924-2936. PMID: 34386882; PMCID: PMC8460584; DOI: 10.3758/s13414-021-02342-w.
Abstract
This study investigated the difference in the McGurk effect between own-race-face and other-race-face stimuli among Japanese infants from 5 to 9 months of age. The McGurk effect results from infants using information from a speaker’s face in audiovisual speech integration. We hypothesized that the McGurk effect varies with the speaker’s race because of the other-race effect, which indicates an advantage for own-race faces in our face processing system. Experiment 1 demonstrated the other-race effect on audiovisual speech integration such that the infants ages 5–6 months and 8–9 months are likely to perceive the McGurk effect when observing an own-race-face speaker, but not when observing an other-race-face speaker. Experiment 2 found the other-race effect on audiovisual speech integration regardless of irrelevant speech identity cues. Experiment 3 confirmed the infants’ ability to differentiate two auditory syllables. These results showed that infants are likely to integrate voice with an own-race-face, but not with an other-race-face. This implies the role of experiences with own-race-faces in the development of audiovisual speech integration. Our findings also contribute to the discussion of whether perceptual narrowing is a modality-general, pan-sensory process.
18. Han NX, Chakravarthula PN, Eckstein MP. Peripheral facial features guiding eye movements and reducing fixational variability. J Vis 2021;21:7. PMID: 34347018; PMCID: PMC8340657; DOI: 10.1167/jov.21.8.7.
Abstract
Face processing is a fast and efficient process due to its evolutionary and social importance. A majority of people direct their first eye movement to a featureless point just below the eyes that maximizes accuracy in recognizing a person's identity and gender. Yet, the exact properties or features of the face that guide the first eye movements and reduce fixational variability are unknown. Here, we manipulated the presence of the facial features and the spatial configuration of features to investigate their effect on the location and variability of first and second fixations to peripherally presented faces. Our results showed that observers can utilize the face outline, individual facial features, and feature spatial configuration to guide the first eye movements to their preferred point of fixation. The eyes have a preferential role in guiding the first eye movements and reducing fixation variability. Eliminating the eyes or altering their position had the greatest influence on the location and variability of fixations and resulted in the largest detriment to face identification performance. The other internal features (nose and mouth) also contribute to reducing fixation variability. A subsequent experiment measuring detection of single features showed that the eyes have the highest detectability (relative to other features) in the visual periphery providing a strong sensory signal to guide the oculomotor system. Together, the results suggest a flexible multiple-cue approach that might be a robust solution to cope with how the varying eccentricities in the real world influence the ability to resolve individual feature properties and the preferential role of the eyes.
Affiliation(s)
- Nicole X Han
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
- Puneeth N Chakravarthula
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
- Miguel P Eckstein
- Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, USA
19. Audio-visual integration in noise: Influence of auditory and visual stimulus degradation on eye movements and perception of the McGurk effect. Atten Percept Psychophys 2020;82:3544-3557. PMID: 32533526; PMCID: PMC7788022; DOI: 10.3758/s13414-020-02042-x.
Abstract
Seeing a talker’s face can aid audiovisual (AV) integration when speech is presented in noise. However, few studies have simultaneously manipulated auditory and visual degradation. We aimed to establish how degrading the auditory and visual signal affected AV integration. Where people look on the face in this context is also of interest; Buchan, Paré and Munhall (Brain Research, 1242, 162–171, 2008) found fixations on the mouth increased in the presence of auditory noise whilst Wilson, Alsius, Paré and Munhall (Journal of Speech, Language, and Hearing Research, 59(4), 601–615, 2016) found mouth fixations decreased with decreasing visual resolution. In Condition 1, participants listened to clear speech, and in Condition 2, participants listened to vocoded speech designed to simulate the information provided by a cochlear implant. Speech was presented in three levels of auditory noise and three levels of visual blurring. Adding noise to the auditory signal increased McGurk responses, while blurring the visual signal decreased McGurk responses. Participants fixated the mouth more on trials when the McGurk effect was perceived. Adding auditory noise led to people fixating the mouth more, while visual degradation led to people fixating the mouth less. Combined, the results suggest that modality preference and where people look during AV integration of incongruent syllables varies according to the quality of information available.
20. A value-driven McGurk effect: Value-associated faces enhance the influence of visual information on audiovisual speech perception and its eye movement pattern. Atten Percept Psychophys 2020;82:1928-1941. PMID: 31898072; DOI: 10.3758/s13414-019-01918-x.
Abstract
This study investigates whether and how value-associated faces affect audiovisual speech perception and its eye movement pattern. Participants were asked to learn to associate particular faces with or without monetary reward in the training phase, and, in the subsequent test phase, to identify syllables that the talkers had said in video clips in which the talkers' faces had or had not been associated with reward. The syllables were either congruent or incongruent with the talkers' mouth movements. Crucially, in some cases, the incongruent syllables could elicit the McGurk effect. Results showed that the McGurk effect occurred more often for reward-associated faces than for non-reward-associated faces. Moreover, the signal detection analysis revealed that participants had lower criterion and higher discriminability for reward-associated faces than for non-reward-associated faces. Surprisingly, eye movement data showed that participants spent more time looking at and fixated more often on the extraoral (nose/cheek) area for reward-associated faces than for non-reward-associated faces, while the opposite pattern was observed on the oral (mouth) area. The correlation analysis demonstrated that, over participants, the more they looked at the extraoral area in the training phase because of reward, the larger the increase of McGurk proportion (and the less they looked at the oral area) in the test phase. These findings not only demonstrate that value-associated faces enhance the influence of visual information on audiovisual speech perception but also highlight the importance of the extraoral facial area in the value-driven McGurk effect.
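The criterion and discriminability values referred to here are standard equal-variance signal-detection indices computed from hit and false-alarm rates. A small sketch with hypothetical rates (not the study's data) illustrates the computation.

```python
from scipy.stats import norm

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Equal-variance signal-detection indices: d' = z(H) - z(FA),
    criterion c = -0.5 * (z(H) + z(FA)). Lower c indicates a more liberal
    response bias; higher d' indicates better discriminability."""
    z_h, z_fa = norm.ppf(hit_rate), norm.ppf(false_alarm_rate)
    return z_h - z_fa, -0.5 * (z_h + z_fa)

# Hypothetical rates for illustration only (not the study's data):
print(dprime_and_criterion(0.85, 0.20))   # reward-associated faces: higher d', lower c
print(dprime_and_criterion(0.78, 0.25))   # non-reward-associated faces
```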
21. Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020;133:371-383. PMID: 33221701; DOI: 10.1016/j.cortex.2020.10.002.
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
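The "noisy sensory encoding followed by optimal cue combination" step described here reduces, for a single pair of cues assumed to share a common cause, to reliability-weighted averaging. The sketch below illustrates only that fusion step with made-up one-dimensional values; the causal-inference stage of the CIMS model, which weighs common-cause against separate-cause interpretations, is omitted.

```python
import numpy as np

def optimal_cue_combination(mu_a, sigma_a, mu_v, sigma_v):
    """Reliability-weighted fusion of a noisy auditory and visual estimate:
    each cue is weighted by its inverse variance, and the fused estimate has
    lower variance than either cue alone. The full CIMS model additionally
    infers whether the two cues share a common cause; that step is omitted."""
    w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)
    mu_fused = w_a * mu_a + (1 - w_a) * mu_v
    sigma_fused = np.sqrt(1 / (1 / sigma_a**2 + 1 / sigma_v**2))
    return mu_fused, sigma_fused

# Illustrative 1-D 'speech feature' coordinates for auditory "ba" and visual "ga":
print(optimal_cue_combination(mu_a=0.0, sigma_a=1.0, mu_v=3.0, sigma_v=1.5))
# With conflicting cues, the fused estimate lies between them (a "da"-like percept).
```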
22. Jiang J, von Kriegstein K, Jiang J. Brain mechanisms of eye contact during verbal communication predict autistic traits in neurotypical individuals. Sci Rep 2020;10:14602. PMID: 32884087; PMCID: PMC7471895; DOI: 10.1038/s41598-020-71547-0.
Abstract
Atypical eye contact in communication is a common characteristic in autism spectrum disorders. Autistic traits vary along a continuum extending into the neurotypical population. The relation between autistic traits and brain mechanisms underlying spontaneous eye contact during verbal communication remains unexplored. Here, we used simultaneous functional magnetic resonance imaging and eye tracking to investigate this relation in neurotypical people within a naturalistic verbal context. Using multiple regression analyses, we found that brain response in the posterior superior temporal sulcus (pSTS) and its connectivity with the fusiform face area (FFA) during eye contact with a speaker predicted the level of autistic traits measured by Autism-spectrum Quotient (AQ). Further analyses for different AQ subclusters revealed that these two predictors were negatively associated with attention to detail. The relation between FFA–pSTS connectivity and the attention to detail ability was mediated by individuals’ looking preferences for speaker’s eyes. This study identified the role of an individual eye contact pattern in the relation between brain mechanisms underlying natural eye contact during verbal communication and autistic traits in neurotypical people. The findings may help to increase our understanding of the mechanisms of atypical eye contact behavior during natural communication.
Affiliation(s)
- Jing Jiang
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, 94305, USA
- Max Planck Institute for Human Cognitive and Brain Sciences, 04103, Leipzig, Germany
- Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117, Berlin, Germany
- Institute of Psychology, Humboldt-Universität zu Berlin, 12489, Berlin, Germany
- Katharina von Kriegstein
- Max Planck Institute for Human Cognitive and Brain Sciences, 04103, Leipzig, Germany
- Institute of Psychology, Humboldt-Universität zu Berlin, 12489, Berlin, Germany
- Faculty of Psychology, Technische Universität Dresden, 01187, Dresden, Germany
- Jiefeng Jiang
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, 52242, USA
23. Ujiie Y, Kanazawa S, Yamaguchi MK. The Other-Race-Effect on Audiovisual Speech Integration in Infants: A NIRS Study. Front Psychol 2020;11:971. PMID: 32499746; PMCID: PMC7243679; DOI: 10.3389/fpsyg.2020.00971.
Abstract
Previous studies have revealed perceptual narrowing for the own-race face in face discrimination, but this phenomenon is poorly understood in face and voice integration. We focused on infants' brain responses to the McGurk effect to examine whether the other-race effect occurs in the activation patterns. In Experiment 1, we conducted fNIRS measurements to test for a neural signature of the McGurk effect in Japanese 8- to 9-month-old infants and to examine the difference between activation patterns in response to own-race-face and other-race-face stimuli. We used two race-face conditions, own-race-face (East Asian) and other-race-face (Caucasian), each of which contained audiovisual-matched and McGurk-type stimuli. While the infants (N = 34) were observing each speech stimulus for each race, we measured cerebral hemoglobin concentrations in bilateral temporal brain regions. The results showed that in the own-race-face condition, audiovisual-matched stimuli induced activation of the left temporal region, and the McGurk stimuli induced activation of the bilateral temporal regions. No significant activations were found in the other-race-face condition. These results indicate that the McGurk effect occurred only in the own-race-face condition. In Experiment 2, we used a familiarization/novelty preference procedure to confirm that the infants (N = 28) could perceive the McGurk effect in the own-race-face condition but not in the other-race-face condition. The behavioral data supported the fNIRS results, implying the presence of narrowing for the own-race face in the McGurk effect. These results suggest that narrowing of the McGurk effect may be involved in the development of relatively high-order processing, such as face-to-face communication with people surrounding the infant. We discuss the hypothesis that perceptual narrowing is a modality-general, pan-sensory process.
Collapse
Affiliation(s)
- Yuta Ujiie
- Graduate School of Psychology, Chukyo University, Aichi, Japan
- Research and Development Initiative, Chuo University, Tokyo, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
| | - So Kanazawa
- Department of Psychology, Japan Women’s University, Kawasaki, Japan
| | | |
Collapse
|
24
|
Wegner-Clemens K, Rennig J, Beauchamp MS. A relationship between Autism-Spectrum Quotient and face viewing behavior in 98 participants. PLoS One 2020; 15:e0230866. [PMID: 32352984 PMCID: PMC7192493 DOI: 10.1371/journal.pone.0230866] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2019] [Accepted: 03/10/2020] [Indexed: 01/18/2023] Open
Abstract
Faces are one of the most important stimuli that we encounter, but humans vary dramatically in their behavior when viewing a face: some individuals preferentially fixate the eyes, others fixate the mouth, and still others show an intermediate pattern. The determinants of these large individual differences are unknown. However, individuals with Autism Spectrum Disorder (ASD) spend less time fixating the eyes of a viewed face than controls, suggesting the hypothesis that autistic traits in healthy adults might explain individual differences in face viewing behavior. Autistic traits were measured in 98 healthy adults recruited from an academic setting using the Autism-Spectrum Quotient, a validated 50-statement questionnaire. Fixations were measured using a video-based eye tracker while participants viewed two different types of audiovisual movies: short videos of a talker speaking single syllables and longer videos of talkers speaking sentences in a social context. For both types of movies, there was a positive correlation between Autism-Spectrum Quotient score and percent of time fixating the lower half of the face that explained from 4% to 10% of the variance in individual face viewing behavior. This effect suggests that in healthy adults, autistic traits are one of many factors that contribute to individual differences in face viewing behavior.
Collapse
Affiliation(s)
- Kira Wegner-Clemens
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, Texas, United States of America
| | - Johannes Rennig
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, Texas, United States of America
| | - Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, Texas, United States of America
| |
Collapse
|
25
|
Šabić E, Henning D, Myüz H, Morrow A, Hout MC, MacDonald JA. Examining the Role of Eye Movements During Conversational Listening in Noise. Front Psychol 2020; 11:200. [PMID: 32116975 PMCID: PMC7033431 DOI: 10.3389/fpsyg.2020.00200] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2019] [Accepted: 01/28/2020] [Indexed: 12/02/2022] Open
Abstract
Speech comprehension is often thought of as an entirely auditory process, but both normal-hearing and hearing-impaired individuals sometimes use visual attention to disambiguate speech, particularly when it is difficult to hear. Many studies have investigated how visual attention (or the lack thereof) impacts the perception of simple speech sounds such as isolated consonants, but there is a gap in the literature concerning visual attention during natural speech comprehension. This issue needs to be addressed, as individuals process sounds and words in everyday speech differently than when they are separated into individual elements with no competing sound sources or noise. Moreover, further research is needed to explore patterns of eye movements during speech comprehension – especially in the presence of noise – as such an investigation would allow us to better understand how people strategically use visual information while processing speech. To this end, we conducted an experiment to track eye-gaze behavior during a series of listening tasks as a function of the number of speakers, background noise intensity, and the presence or absence of simulated hearing impairment. Our specific aims were to discover how individuals might adapt their oculomotor behavior to compensate for the difficulty of the listening scenario, such as when listening in noisy environments or experiencing simulated hearing loss. Speech comprehension difficulty was manipulated by simulating hearing loss and varying background noise intensity. Results showed that eye movements were affected by the number of speakers, simulated hearing impairment, and the presence of noise. Further, findings showed that differing levels of signal-to-noise ratio (SNR) led to changes in eye-gaze behavior. Most notably, we found that the addition of visual information (i.e., videos vs. auditory information only) led to enhanced speech comprehension – highlighting the strategic usage of visual information during this process.
Collapse
Affiliation(s)
- Edin Šabić
- Hearing Enhancement and Augmented Reality Lab, Department of Psychology, New Mexico State University, Las Cruces, NM, United States
| | - Daniel Henning
- Hearing Enhancement and Augmented Reality Lab, Department of Psychology, New Mexico State University, Las Cruces, NM, United States
| | - Hunter Myüz
- Hearing Enhancement and Augmented Reality Lab, Department of Psychology, New Mexico State University, Las Cruces, NM, United States
| | - Audrey Morrow
- Hearing Enhancement and Augmented Reality Lab, Department of Psychology, New Mexico State University, Las Cruces, NM, United States
| | - Michael C Hout
- Hearing Enhancement and Augmented Reality Lab, Department of Psychology, New Mexico State University, Las Cruces, NM, United States
| | - Justin A MacDonald
- Hearing Enhancement and Augmented Reality Lab, Department of Psychology, New Mexico State University, Las Cruces, NM, United States
| |
Collapse
|
26
|
Rennig J, Wegner-Clemens K, Beauchamp MS. Face viewing behavior predicts multisensory gain during speech perception. Psychon Bull Rev 2020; 27:70-77. [PMID: 31845209 PMCID: PMC7004844 DOI: 10.3758/s13423-019-01665-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Visual information from the face of an interlocutor complements auditory information from their voice, enhancing intelligibility. However, there are large individual differences in the ability to comprehend noisy audiovisual speech. Another axis of individual variability is the extent to which humans fixate the mouth or the eyes of a viewed face. We speculated that across a lifetime of face viewing, individuals who prefer to fixate the mouth of a viewed face might accumulate stronger associations between visual and auditory speech, resulting in improved comprehension of noisy audiovisual speech. To test this idea, we assessed interindividual variability in two tasks. Participants (n = 102) varied greatly in their ability to understand noisy audiovisual sentences (accuracy from 2-58%) and in the time they spent fixating the mouth of a talker enunciating clear audiovisual syllables (3-98% of total time). These two variables were positively correlated: a 10% increase in time spent fixating the mouth equated to a 5.6% increase in multisensory gain. This finding demonstrates an unexpected link, mediated by histories of visual exposure, between two fundamental human abilities: processing faces and understanding speech.
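The participant-level relationship described above is essentially a simple linear regression of multisensory gain on mouth-fixation time. The sketch below illustrates that analysis on simulated data; the slope of 0.56 mirrors the reported "10% more mouth fixation, 5.6% more gain" figure, while the sample values, intercept, and noise level are assumptions.

```python
# Sketch: regress multisensory gain on percent of time fixating the mouth.
# The slope used to simulate the data (0.56) follows the abstract; the
# intercept and noise level are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 102                                              # sample size from the abstract
mouth_fixation = rng.uniform(3, 98, size=n)          # % of time fixating the mouth
gain = 5.0 + 0.56 * mouth_fixation + rng.normal(0, 8, size=n)   # simulated gain (%)

res = stats.linregress(mouth_fixation, gain)
print(f"slope = {res.slope:.2f} % gain per % mouth fixation (r = {res.rvalue:.2f})")
print(f"a 10% increase in mouth fixation predicts ~{10 * res.slope:.1f}% more gain")
```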
Collapse
Affiliation(s)
- Johannes Rennig
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, 1 Baylor Plaza Suite S104, Houston, TX, 77030, USA
| | - Kira Wegner-Clemens
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, 1 Baylor Plaza Suite S104, Houston, TX, 77030, USA
| | - Michael S Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, 1 Baylor Plaza Suite S104, Houston, TX, 77030, USA.
| |
Collapse
|
27
|
Wegner-Clemens K, Rennig J, Magnotti JF, Beauchamp MS. Using principal component analysis to characterize eye movement fixation patterns during face viewing. J Vis 2019; 19:2. [PMID: 31689715 PMCID: PMC6833982 DOI: 10.1167/19.13.2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Accepted: 08/23/2019] [Indexed: 01/22/2023] Open
Abstract
Human faces contain dozens of visual features, but viewers preferentially fixate just two of them: the eyes and the mouth. Face-viewing behavior is usually studied by manually drawing regions of interest (ROIs) on the eyes, mouth, and other facial features. ROI analyses are problematic as they require arbitrary experimenter decisions about the location and number of ROIs, and they discard data because all fixations within each ROI are treated identically and fixations outside of any ROI are ignored. We introduce a data-driven method that uses principal component analysis (PCA) to characterize human face-viewing behavior. All fixations are entered into a PCA, and the resulting eigenimages provide a quantitative measure of variability in face-viewing behavior. In fixation data from 41 participants viewing four face exemplars under three stimulus and task conditions, the first principal component (PC1) separated the eye and mouth regions of the face. PC1 scores varied widely across participants, revealing large individual differences in preference for eye or mouth fixation, and PC1 scores varied by condition, revealing the importance of behavioral task in determining fixation location. Linear mixed effects modeling of the PC1 scores demonstrated that task condition accounted for 41% of the variance, individual differences accounted for 28% of the variance, and stimulus exemplar for less than 1% of the variance. Fixation eigenimages provide a useful tool for investigating the relative importance of the different factors that drive human face-viewing behavior.
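A minimal sketch of the eigenimage approach described above: per-participant fixation maps are flattened, entered into a PCA, and the leading component captures the main axis of face-viewing variability (for example, eyes versus mouth). The map resolution, placeholder data, and variable names are illustrative, not the study's actual pipeline.

```python
# Sketch of the eigenimage approach: flatten each participant's fixation
# density map, run PCA across participants, and inspect the leading
# components. Map size and the random placeholder data are assumptions.
import numpy as np
from sklearn.decomposition import PCA

n_participants, h, w = 41, 64, 64                      # hypothetical map size
fixation_maps = np.random.rand(n_participants, h, w)   # stand-in for real heatmaps

X = fixation_maps.reshape(n_participants, -1)          # one row per participant
pca = PCA(n_components=5)
scores = pca.fit_transform(X)                          # PC scores per participant
eigenimages = pca.components_.reshape(-1, h, w)        # PC1 ~ eye-vs-mouth axis

print("variance explained:", np.round(pca.explained_variance_ratio_, 3))
```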
Collapse
Affiliation(s)
- Kira Wegner-Clemens
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX
| | - Johannes Rennig
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX
| | - John F Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX
| | - Michael S Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX
| |
Collapse
|
28
|
Feng G, Zhou B, Zhou W, Beauchamp MS, Magnotti JF. A Laboratory Study of the McGurk Effect in 324 Monozygotic and Dizygotic Twins. Front Neurosci 2019; 13:1029. [PMID: 31636529 PMCID: PMC6787151 DOI: 10.3389/fnins.2019.01029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Accepted: 09/10/2019] [Indexed: 11/13/2022] Open
Abstract
Multisensory integration of information from the talker's voice and the talker's mouth facilitates human speech perception. A popular assay of audiovisual integration is the McGurk effect, an illusion in which incongruent visual speech information categorically changes the percept of auditory speech. There is substantial interindividual variability in susceptibility to the McGurk effect. To better understand possible sources of this variability, we examined the McGurk effect in 324 native Mandarin speakers, consisting of 73 monozygotic (MZ) and 89 dizygotic (DZ) twin pairs. When tested with 9 different McGurk stimuli, some participants never perceived the illusion and others always perceived it. Within participants, perception was similar across time (r = 0.55 at a 2-year retest in 150 participants) suggesting that McGurk susceptibility reflects a stable trait rather than short-term perceptual fluctuations. To examine the effects of shared genetics and prenatal environment, we compared McGurk susceptibility between MZ and DZ twins. Both twin types had significantly greater correlation than unrelated pairs (r = 0.28 for MZ twins and r = 0.21 for DZ twins) suggesting that the genes and environmental factors shared by twins contribute to individual differences in multisensory speech perception. Conversely, the existence of substantial differences within twin pairs (even MZ co-twins) and the overall low percentage of explained variance (5.5%) argues against a deterministic view of individual differences in multisensory integration.
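A minimal sketch of the within-pair correlation comparison described above, using simulated monozygotic and dizygotic pairs. The pair counts and target correlations follow the abstract; the simulated bivariate-normal data and all other choices are illustrative assumptions.

```python
# Sketch: within-pair Pearson correlations for MZ and DZ twins on simulated
# susceptibility scores. Pair counts and target correlations follow the
# abstract; the data themselves are placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def simulate_pairs(n_pairs, pair_r):
    """Draw twin-pair scores with a target within-pair correlation."""
    cov = [[1.0, pair_r], [pair_r, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n_pairs)

mz = simulate_pairs(73, 0.28)
dz = simulate_pairs(89, 0.21)

r_mz, _ = pearsonr(mz[:, 0], mz[:, 1])
r_dz, _ = pearsonr(dz[:, 0], dz[:, 1])
print(f"MZ within-pair r = {r_mz:.2f}, DZ within-pair r = {r_dz:.2f}")
```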
Collapse
Affiliation(s)
- Guo Feng
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Psychological Research and Counseling Center, Southwest Jiaotong University, Chengdu, China
| | - Bin Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Wen Zhou
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
| | - John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, United States
| |
Collapse
|
29
|
Peterson MF, Zaun I, Hoke H, Jiahui G, Duchaine B, Kanwisher N. Eye movements and retinotopic tuning in developmental prosopagnosia. J Vis 2019; 19:7. [PMID: 31426085 DOI: 10.1167/19.9.7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Despite extensive investigation, the causes and nature of developmental prosopagnosia (DP)-a severe face identification impairment in the absence of acquired brain injury-remain poorly understood. Drawing on previous work showing that individuals identified as being neurotypical (NT) show robust individual differences in where they fixate on faces, and recognize faces best when the faces are presented at this location, we defined and tested four novel hypotheses for how atypical face-looking behavior and/or retinotopic face encoding could impair face recognition in DP: (a) fixating regions of poor information, (b) inconsistent saccadic targeting, (c) weak retinotopic tuning, and (d) fixating locations not matched to the individual's own face tuning. We found no support for the first three hypotheses, with NTs and DPs consistently fixating similar locations and showing similar retinotopic tuning of their face perception performance. However, in testing the fourth hypothesis, we found preliminary evidence for two distinct phenotypes of DP: (a) Subjects characterized by impaired face memory, typical face perception, and a preference to look high on the face, and (b) Subjects characterized by profound impairments to both face memory and perception and a preference to look very low on the face. Further, while all NTs and upper-looking DPs performed best when faces were presented near their preferred fixation location, this was not true for lower-looking DPs. These results suggest that face recognition deficits in a substantial proportion of people with DP may arise not from aberrant face gaze or compromised retinotopic tuning, but from the suboptimal matching of gaze to tuning.
Collapse
Affiliation(s)
- Matthew F Peterson
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ian Zaun
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Harris Hoke
- Center for Brain Science, Harvard University, Cambridge, MA, USA
| | - Guo Jiahui
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
| | - Brad Duchaine
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
Collapse
|
30
|
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. [PMID: 31013361 DOI: 10.1111/ejn.14425] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2018] [Revised: 03/25/2019] [Accepted: 04/17/2019] [Indexed: 11/30/2022]
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) Congruent AV-facing the attended speaker while ignoring another speaker represented by the audio-only stream and (b) Incongruent AV (eavesdropping)-attending the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we can successfully decode attention to congruent audiovisual speech and can also decode attention when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition to this, we found alpha power to be a reliable measure of attention to the visual speech. Using parieto-occipital alpha power, we found that we can distinguish whether subjects are attending or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that with only six near-ear electrodes we can successfully determine the attended speech. This work extends the current framework for decoding attention to speech to more naturalistic scenarios, and in doing so provides additional neural measures which may be incorporated to improve decoding accuracy.
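The decoding approach described above can be sketched as a backward (stimulus-reconstruction) model: ridge regression maps time-lagged EEG onto the speech envelope, and attention is assigned to whichever talker's envelope best matches the reconstruction. The channel count, lag window, regularization, and placeholder data below are assumptions, not the paper's exact parameters.

```python
# Sketch of envelope-based attention decoding with a ridge-regression
# backward model. Sampling rate, lag window, channel count, and the random
# placeholder signals are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

fs = 64                                     # Hz, downsampled EEG/envelope rate
n_samples, n_channels = fs * 60, 6          # e.g., six near-ear electrodes
lags = range(0, int(0.25 * fs))             # 0-250 ms of EEG lags

def lag_matrix(eeg, lags):
    """Stack time-shifted copies of each channel as regressors."""
    out = np.zeros((eeg.shape[0], eeg.shape[1] * len(lags)))
    for i, lag in enumerate(lags):
        out[:, i * eeg.shape[1]:(i + 1) * eeg.shape[1]] = np.roll(eeg, -lag, axis=0)
    return out

rng = np.random.default_rng(2)
eeg = rng.standard_normal((n_samples, n_channels))       # placeholder EEG
env_attended = rng.standard_normal(n_samples)            # placeholder envelopes
env_unattended = rng.standard_normal(n_samples)

X = lag_matrix(eeg, lags)
model = Ridge(alpha=1.0).fit(X, env_attended)            # train on attended talker
recon = model.predict(X)

r_att = np.corrcoef(recon, env_attended)[0, 1]
r_unatt = np.corrcoef(recon, env_unattended)[0, 1]
print("decoded as attended" if r_att > r_unatt else "decoded as unattended")
```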
Collapse
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
| |
Collapse
|
31
|
Rennig J, Beauchamp MS. Free viewing of talking faces reveals mouth and eye preferring regions of the human superior temporal sulcus. Neuroimage 2018; 183:25-36. [PMID: 30092347 PMCID: PMC6214361 DOI: 10.1016/j.neuroimage.2018.08.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2018] [Revised: 07/31/2018] [Accepted: 08/05/2018] [Indexed: 01/22/2023] Open
Abstract
During face-to-face communication, the mouth of the talker is informative about speech content, while the eyes of the talker convey other information, such as gaze location. Viewers most often fixate either the mouth or the eyes of the talker's face, presumably allowing them to sample these different sources of information. To study the neural correlates of this process, healthy humans freely viewed talking faces while brain activity was measured with BOLD fMRI and eye movements were recorded with a video-based eye tracker. Post hoc trial sorting was used to divide the data into trials in which participants fixated the mouth of the talker and trials in which they fixated the eyes. Although the audiovisual stimulus was identical, the two trial types evoked differing responses in subregions of the posterior superior temporal sulcus (pSTS). The anterior pSTS preferred trials in which participants fixated the mouth of the talker while the posterior pSTS preferred fixations on the eyes of the talker. A second fMRI experiment demonstrated that anterior pSTS responded more strongly to auditory and audiovisual speech than posterior pSTS eye-preferring regions. These results provide evidence for functional specialization within the pSTS under more realistic viewing and stimulus conditions than in previous neuroimaging studies.
Collapse
Affiliation(s)
- Johannes Rennig
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| | - Michael S Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA.
| |
Collapse
|
32
|
Inattentional deafness to auditory alarms: Inter-individual differences, electrophysiological signature and single trial classification. Behav Brain Res 2018; 360:51-59. [PMID: 30508609 DOI: 10.1016/j.bbr.2018.11.045] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Revised: 11/22/2018] [Accepted: 11/29/2018] [Indexed: 02/03/2023]
Abstract
Inattentional deafness can have deleterious consequences in complex real-life situations (e.g., healthcare, aviation), leading individuals to miss critical auditory signals. Such failures of auditory attention are thought to rely on top-down biasing mechanisms at the central executive level. A complementary account considers the existence of visual dominance over hearing, which could be implemented via direct visual-to-auditory pathways. To investigate this phenomenon, thirteen aircraft pilots, equipped with a 32-channel EEG system, completed low- and high-workload scenarios along with an auditory oddball task in a motion flight simulator. Prior to the flying task, the pilots were screened to assess their working memory span and susceptibility to visual dominance. The behavioral results showed that the volunteers missed 57.7% of the auditory alarms in the difficult condition. Among all evaluated capabilities, only the visual dominance index was predictive of the miss rate in the difficult scenario. These findings provide behavioral evidence that early cross-modal competitive processes other than top-down modulation could account for inattentional deafness. The electrophysiological analyses showed that missed alarms, compared with hit alarms, led to a significant amplitude reduction of early perceptual (N100) and late attentional (P3a and P3b) event-related potential components. Finally, we implemented an EEG-based processing pipeline to perform single-trial classification of inattentional deafness. The results indicate that this processing chain could be used in an ecological setting, as it achieved 72.2% mean accuracy in discriminating missed from hit auditory alarms.
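As an illustration of the single-trial classification idea described above, the sketch below trains a cross-validated linear classifier to separate missed from hit alarm epochs. The feature representation (flattened channel-by-time amplitudes), the LDA classifier, and the placeholder data are assumptions; the paper's actual pipeline may differ.

```python
# Sketch: single-trial classification of missed vs. hit alarms from EEG
# epochs using cross-validated LDA. Epoch dimensions, labels, and data are
# placeholders; the study's exact pipeline is not reproduced here.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n_trials, n_channels, n_times = 200, 32, 64        # 32-channel epochs (placeholder)
epochs = rng.standard_normal((n_trials, n_channels, n_times))
labels = rng.integers(0, 2, size=n_trials)         # 0 = hit alarm, 1 = missed alarm

X = epochs.reshape(n_trials, -1)                   # flatten channels x time
clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, labels, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```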
Collapse
|
33
|
Proverbio AM, Raso G, Zani A. Electrophysiological Indexes of Incongruent Audiovisual Phonemic Processing: Unraveling the McGurk Effect. Neuroscience 2018; 385:215-226. [PMID: 29932985 DOI: 10.1016/j.neuroscience.2018.06.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2017] [Revised: 06/11/2018] [Accepted: 06/12/2018] [Indexed: 11/15/2022]
Abstract
In this study, the timing of electromagnetic signals recorded during incongruent and congruent audiovisual (AV) stimulation was examined in 14 healthy Italian volunteers. In a previous study (Proverbio et al., 2016) we investigated the McGurk effect in the Italian language and identified which visual and auditory inputs provided the most compelling illusory effects (e.g., bilabial phonemes presented acoustically and paired with non-labials, especially alveolar-nasal and velar-occlusive phonemes). Here, EEG was recorded from 128 scalp sites while participants observed a female and a male actor uttering 288 syllables (each lasting approximately 600 ms) selected on the basis of the previous investigation, and responded to rare targets (/re/, /ri/, /ro/, /ru/). In half of the cases the AV information was incongruent, except for targets, which were always congruent. A pMMN (phonological Mismatch Negativity) to incongruent AV stimuli was identified 500 ms after voice onset time. This automatic response indexed the detection of an incongruity between the labial and phonetic information. swLORETA (Low-Resolution Electromagnetic Tomography) analysis applied to the incongruent-minus-congruent difference voltage in the same time window revealed that the strongest sources of this activity were the right superior temporal gyrus (STG) and the superior frontal gyrus, supporting their involvement in AV integration.
Collapse
Affiliation(s)
- Alice Mado Proverbio
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy.
| | - Giulia Raso
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
| | | |
Collapse
|
34
|
Dittrich S, Noesselt T. Temporal Audiovisual Motion Prediction in 2D- vs. 3D-Environments. Front Psychol 2018; 9:368. [PMID: 29618999 PMCID: PMC5871701 DOI: 10.3389/fpsyg.2018.00368] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2017] [Accepted: 03/06/2018] [Indexed: 11/24/2022] Open
Abstract
Predicting motion is essential for many everyday life activities, e.g., in road traffic. Previous studies on motion prediction failed to find consistent results, which might be due to the use of very different stimulus material and behavioural tasks. Here, we directly tested the influence of task (detection, extrapolation) and stimulus features (visual vs. audiovisual and three-dimensional vs. non-three-dimensional) on temporal motion prediction in two psychophysical experiments. In both experiments a ball followed a trajectory toward the observer and temporarily disappeared behind an occluder. In audiovisual conditions, moving white noise (congruent or incongruent with the visual motion direction) was presented concurrently. In Experiment 1 the ball reappeared on a predictable or a non-predictable trajectory and participants detected when the ball reappeared. In Experiment 2 the ball did not reappear after occlusion and participants judged when the ball would reach a specified position at two possible distances from the occluder (extrapolation task). Both experiments were conducted in three-dimensional space (using a stereoscopic screen and polarised glasses) and also without stereoscopic presentation. Participants benefitted from visually predictable trajectories and concurrent sounds during detection. Additionally, visual facilitation was more pronounced for non-3D stimulation during the detection task. In contrast, for the more complex extrapolation task, group mean results indicated that auditory information impaired motion prediction. However, a post hoc cross-validation procedure (split-half) revealed that participants varied in their ability to use sounds during motion extrapolation. Most participants selectively profited from either near or far extrapolation distances but were impaired for the other. We propose that interindividual differences in extrapolation efficiency might be the mechanism governing this effect. Together, our results indicate that both a realistic experimental environment and subject-specific differences modulate the ability of audiovisual motion prediction and need to be considered in future research.
Collapse
Affiliation(s)
- Sandra Dittrich
- Department of Biological Psychology, Otto von Guericke University Magdeburg, Magdeburg, Germany
| | - Tömme Noesselt
- Department of Biological Psychology, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Center for Behavioral Brain Sciences, Magdeburg, Germany
| |
Collapse
|
35
|
Magnotti JF, Basu Mallick D, Beauchamp MS. Reducing Playback Rate of Audiovisual Speech Leads to a Surprising Decrease in the McGurk Effect. Multisens Res 2018; 31:19-38. [DOI: 10.1163/22134808-00002586] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Accepted: 06/03/2017] [Indexed: 11/19/2022]
Abstract
We report the unexpected finding that slowing video playback decreases perception of the McGurk effect. This reduction is counter-intuitive because the illusion depends on visual speech influencing the perception of auditory speech, and slowing speech should increase the amount of visual information available to observers. We recorded perceptual data from 110 subjects viewing audiovisual syllables (either McGurk or congruent control stimuli) played back at one of three rates: the rate used by the talker during recording (the natural rate), a slow rate (50% of natural), or a fast rate (200% of natural). We replicated previous studies showing dramatic variability in McGurk susceptibility at the natural rate, ranging from 0–100% across subjects and from 26–76% across the eight McGurk stimuli tested. Relative to the natural rate, slowed playback reduced the frequency of McGurk responses by 11% (79% of subjects showed a reduction) and reduced congruent accuracy by 3% (25% of subjects showed a reduction). Fast playback rate had little effect on McGurk responses or congruent accuracy. To determine whether our results are consistent with Bayesian integration, we constructed a Bayes-optimal model that incorporated two assumptions: individuals combine auditory and visual information according to their reliability, and changing playback rate affects sensory reliability. The model reproduced both our findings of large individual differences and the playback rate effect. This work illustrates that surprises remain in the McGurk effect and that Bayesian integration provides a useful framework for understanding audiovisual speech perception.
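The Bayes-optimal integration assumption described above can be written down compactly: each modality's estimate is weighted by its inverse variance (reliability). The sketch below is a minimal version of that rule, with playback rate modeled simply as a change in visual reliability; all numeric values are illustrative, not the paper's fitted parameters.

```python
# Minimal sketch of reliability-weighted (inverse-variance) cue combination.
# Playback rate is modeled only as a change in visual reliability; all
# values are illustrative assumptions.
import numpy as np

def fuse(mu_a, var_a, mu_v, var_v):
    """Optimal audiovisual estimate: weight each cue by its inverse variance."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    mu = w_a * mu_a + (1 - w_a) * mu_v
    var = 1 / (1 / var_a + 1 / var_v)
    return mu, var

# Place syllables on a hypothetical 1-D articulatory axis
mu_auditory, mu_visual = 0.0, 1.0          # e.g., auditory "ba", visual "ga"

for label, var_visual in [("natural rate", 0.5), ("slowed playback (altered visual reliability?)", 1.5)]:
    mu, var = fuse(mu_auditory, 1.0, mu_visual, var_visual)
    print(f"{label}: fused estimate = {mu:.2f} (variance {var:.2f})")
```

The fused estimate shifts toward whichever cue is more reliable, which is the mechanism the model uses to link playback rate to the frequency of fusion responses.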
Collapse
Affiliation(s)
- John F. Magnotti
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| | | | - Michael S. Beauchamp
- Department of Neurosurgery and Core for Advanced MRI, Baylor College of Medicine, Houston, TX, USA
| |
Collapse
|
36
|
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597 DOI: 10.1163/22134808-00002565] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Accepted: 03/09/2017] [Indexed: 11/19/2022]
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a prototypical case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This questions the suitability of the illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Collapse
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
| | - Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
| | - Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
| |
Collapse
|
37
|
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017; 38:1835-1849. [PMID: 29263241 DOI: 10.1523/jneurosci.1566-17.2017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2017] [Revised: 11/17/2017] [Accepted: 12/08/2017] [Indexed: 11/21/2022] Open
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex. SIGNIFICANCE STATEMENT: The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
Collapse
|
38
|
Odegaard B, Wozny DR, Shams L. A simple and efficient method to enhance audiovisual binding tendencies. PeerJ 2017; 5:e3143. [PMID: 28462016 PMCID: PMC5407282 DOI: 10.7717/peerj.3143] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2016] [Accepted: 03/04/2017] [Indexed: 11/20/2022] Open
Abstract
Individuals vary in their tendency to bind signals from multiple senses. For the same set of sights and sounds, one individual may frequently integrate multisensory signals and experience a unified percept, whereas another individual may rarely bind them and often experience two distinct sensations. Thus, while this binding/integration tendency is specific to each individual, it is not clear how plastic this tendency is in adulthood, and how sensory experiences may cause it to change. Here, we conducted an exploratory investigation which provides evidence that (1) the brain’s tendency to bind in spatial perception is plastic, (2) that it can change following brief exposure to simple audiovisual stimuli, and (3) that exposure to temporally synchronous, spatially discrepant stimuli provides the most effective method to modify it. These results can inform current theories about how the brain updates its internal model of the surrounding sensory world, as well as future investigations seeking to increase integration tendencies.
Collapse
Affiliation(s)
- Brian Odegaard
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
| | - David R Wozny
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
| | - Ladan Shams
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, United States
- Neuroscience Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, United States
| |
Collapse
|
39
|
Francisco AA, Groen MA, Jesse A, McQueen JM. Beyond the usual cognitive suspects: The importance of speechreading and audiovisual temporal sensitivity in reading ability. LEARNING AND INDIVIDUAL DIFFERENCES 2017. [DOI: 10.1016/j.lindif.2017.01.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
40
|
Arizpe J, Walsh V, Yovel G, Baker CI. The categories, frequencies, and stability of idiosyncratic eye-movement patterns to faces. Vision Res 2016; 141:191-203. [PMID: 27940212 DOI: 10.1016/j.visres.2016.10.013] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2016] [Revised: 09/14/2016] [Accepted: 10/29/2016] [Indexed: 11/19/2022]
Abstract
The spatial pattern of eye-movements to faces considered typical for neurologically healthy individuals is a roughly T-shaped distribution over the internal facial features with peak fixation density tending toward the left eye (observer's perspective). However, recent studies indicate that striking deviations from this classic pattern are common within the population and are highly stable over time. The classic pattern actually reflects the average of these various idiosyncratic eye-movement patterns across individuals. The natural categories and respective frequencies of different types of idiosyncratic eye-movement patterns have not been specifically investigated before, so here we analyzed the spatial patterns of eye-movements for 48 participants to estimate the frequency of different kinds of individual eye-movement patterns to faces in the normal healthy population. Four natural clusters were discovered such that approximately 25% of our participants' fixation density peaks clustered over the left eye region (observer's perspective), 23% over the right eye region, 31% over the nasion/bridge region of the nose, and 20% over the region spanning the nose, philtrum, and upper lips. We did not find any relationship between particular idiosyncratic eye-movement patterns and recognition performance. Individuals' eye-movement patterns early in a trial were more stereotyped than later ones, and idiosyncratic fixation patterns evolved over the course of a trial. Finally, while face inversion strongly modulated eye-movement patterns, individual patterns did not become less distinct for inverted compared to upright faces. Group-averaged fixation patterns do not represent individual patterns well, so exploration of such individual patterns is of value for future studies of visual cognition.
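A minimal sketch of how the four natural clusters of peak fixation locations might be recovered with k-means. The (x, y) coordinates below are simulated placeholders laid out to mimic the four reported regions; k = 4 follows the abstract, and everything else is an assumption.

```python
# Sketch: cluster observers' peak fixation locations on the face into four
# groups with k-means. Coordinates are simulated placeholders in a
# normalized face space; k = 4 follows the number of clusters reported.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
peaks = np.vstack([
    rng.normal([0.35, 0.40], 0.03, size=(12, 2)),   # near left eye
    rng.normal([0.65, 0.40], 0.03, size=(11, 2)),   # near right eye
    rng.normal([0.50, 0.50], 0.03, size=(15, 2)),   # nasion / bridge of nose
    rng.normal([0.50, 0.70], 0.03, size=(10, 2)),   # nose-philtrum-upper lip
])

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(peaks)
sizes = np.bincount(km.labels_)
print("cluster sizes:", sizes, "-> proportions:", np.round(sizes / len(peaks), 2))
```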
Collapse
Affiliation(s)
- Joseph Arizpe
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA; Applied Cognitive Neuroscience Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom; Department of Psychiatry, Harvard Medical School, Boston, MA, USA; Boston Veterans Affairs Medical Center, Jamaica Plain, MA, USA.
| | - Vincent Walsh
- Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
| | - Galit Yovel
- Department of Psychology, Tel Aviv University, Tel Aviv, Israel
| | - Chris I Baker
- Applied Cognitive Neuroscience Group, Institute of Cognitive Neuroscience, University College London, London, United Kingdom
| |
Collapse
|
41
|
Kumar GV, Halder T, Jaiswal AK, Mukherjee A, Roy D, Banerjee A. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study. Front Psychol 2016; 7:1558. [PMID: 27790169 PMCID: PMC5062921 DOI: 10.3389/fpsyg.2016.01558] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2016] [Accepted: 09/23/2016] [Indexed: 11/13/2022] Open
Abstract
Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (the McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and integrative brain sites in the vicinity of the superior temporal sulcus (STS) in multisensory speech perception. However, whether and how networks across the whole brain participate in multisensory perception remains an open question. We posit that large-scale functional connectivity among neural populations in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent AV speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception in a temporal window of 300-600 ms following stimulus onset. During asynchronous speech stimuli, global broadband coherence was observed during cross-modal perception at earlier times, along with pre-stimulus decreases of lower-frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our study indicates that the temporal integration underlying multisensory speech perception needs to be understood in the framework of large-scale functional brain network mechanisms, in addition to the established cortical loci of multisensory speech perception.
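A simplified sketch of the global coherence measure described above: magnitude-squared coherence is computed for every sensor pair and summarized across pairs at each frequency. The study's time-frequency version tracks this quantity over time; the static single-window computation, sensor count, and placeholder data below are simplifying assumptions.

```python
# Simplified sketch of "global coherence": average magnitude-squared
# coherence over all sensor pairs at each frequency, for one window of
# placeholder data. Sensor count, duration, and band limits are assumptions.
import itertools
import numpy as np
from scipy.signal import coherence

fs = 250
rng = np.random.default_rng(5)
eeg = rng.standard_normal((16, fs * 10))           # 16 sensors x 10 s (placeholder)

pair_coh = []
freqs = None
for i, j in itertools.combinations(range(eeg.shape[0]), 2):
    freqs, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=fs)
    pair_coh.append(cxy)

global_coh = np.mean(pair_coh, axis=0)             # summary across sensor pairs
gamma = (freqs >= 30) & (freqs <= 45)
print(f"mean gamma-band global coherence: {global_coh[gamma].mean():.3f}")
```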
Collapse
Affiliation(s)
- G Vinodh Kumar
- Cognitive Brain Lab, National Brain Research Centre, Gurgaon, India
| | - Tamesh Halder
- Cognitive Brain Lab, National Brain Research Centre, Gurgaon, India
| | - Amit K Jaiswal
- Cognitive Brain Lab, National Brain Research Centre, Gurgaon, India
| | | | - Dipanjan Roy
- Centre for Behavioural and Cognitive Sciences, University of Allahabad, Allahabad, India
| | - Arpan Banerjee
- Cognitive Brain Lab, National Brain Research Centre, Gurgaon, India
| |
Collapse
|
42
|
Skilled musicians are not subject to the McGurk effect. Sci Rep 2016; 6:30423. [PMID: 27453363 PMCID: PMC4958963 DOI: 10.1038/srep30423] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2016] [Accepted: 07/05/2016] [Indexed: 11/25/2022] Open
Abstract
The McGurk effect is a compelling illusion in which humans auditorily perceive mismatched audiovisual speech as a completely different syllable. This study provides evidence that professional musicians are not subject to the illusion, possibly because of their finer auditory or attentional abilities. Eighty healthy, age-matched graduate students volunteered for the study; 40 were musicians from the Luca Marenzio Conservatory of Music in Brescia with 8–13 years of academic musical training. The phonemes /la/, /da/, /ta/, /ga/, /ka/, /na/, /ba/, and /pa/ were presented in audiovisual congruent and incongruent conditions, or in unimodal (visual-only or auditory-only) conditions, while participants performed syllable recognition tasks. Overall, musicians showed no significant McGurk effect for any of the phonemes, whereas controls showed a marked McGurk effect for several phonemes (including alveolar-nasal, velar-occlusive, and bilabial ones). The results indicate that early and intensive musical training might affect the way the auditory cortex processes phonetic information.
Collapse
|
43
|
Magnotti JF, Basu Mallick D, Feng G, Zhou B, Zhou W, Beauchamp MS. Similar frequency of the McGurk effect in large samples of native Mandarin Chinese and American English speakers. Exp Brain Res 2015; 233:2581-6. [PMID: 26041554 DOI: 10.1007/s00221-015-4324-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2014] [Accepted: 05/13/2015] [Indexed: 11/28/2022]
Abstract
Humans combine visual information from mouth movements with auditory information from the voice to recognize speech. A common method for assessing multisensory speech perception is the McGurk effect: When presented with particular pairings of incongruent auditory and visual speech syllables (e.g., the auditory speech sounds for "ba" dubbed onto the visual mouth movements for "ga"), individuals perceive a third syllable, distinct from the auditory and visual components. Chinese and American cultures differ in the prevalence of direct facial gaze and in the auditory structure of their languages, raising the possibility of cultural- and language-related group differences in the McGurk effect. There is no consensus in the literature about the existence of these group differences, with some studies reporting less McGurk effect in native Mandarin Chinese speakers than in English speakers and others reporting no difference. However, these studies sampled small numbers of participants tested with a small number of stimuli. Therefore, we collected data on the McGurk effect from large samples of Mandarin-speaking individuals from China and English-speaking individuals from the USA (total n = 307) viewing nine different stimuli. Averaged across participants and stimuli, we found similar frequencies of the McGurk effect between Chinese and American participants (48 vs. 44 %). In both groups, we observed a large range of frequencies both across participants (range from 0 to 100 %) and stimuli (15 to 83 %) with the main effect of culture and language accounting for only 0.3 % of the variance in the data. High individual variability in perception of the McGurk effect necessitates the use of large sample sizes to accurately estimate group differences.
Collapse
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, 1 Baylor Plaza, Suite 104, Houston, TX, USA,
| | | | | | | | | | | |
Collapse
|