1. Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. PMID: 37677057. DOI: 10.1162/jocn_a_02044.
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
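The additive criterion mentioned above, AV > (A + V), can be illustrated with a minimal sketch. This is not the authors' analysis pipeline, and the tracking accuracies below are made-up placeholders; it only shows how a per-participant benefit score would be computed and tested against zero.

```python
# Illustrative sketch of the additive AV-benefit criterion: benefit = r_AV - (r_AO + r_VO),
# where r_* are cortical tracking accuracies (e.g., Pearson r between predicted and
# recorded EEG) in the audiovisual, auditory-only, and visual-only conditions.
# Values are hypothetical, one entry per participant.
import numpy as np
from scipy import stats

r_av = np.array([0.11, 0.09, 0.13, 0.10, 0.12])  # audiovisual
r_ao = np.array([0.07, 0.06, 0.08, 0.06, 0.07])  # auditory-only
r_vo = np.array([0.02, 0.01, 0.03, 0.02, 0.02])  # visual-only

av_benefit = r_av - (r_ao + r_vo)          # positive values indicate AV > (A + V)
t, p = stats.ttest_1samp(av_benefit, 0.0)  # one-sample t-test against zero
print(f"mean benefit = {av_benefit.mean():.3f}, t = {t:.2f}, p = {p:.3f}")
```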
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
2. Viktorsson C, Valtakari NV, Falck-Ytter T, Hooge ITC, Rudling M, Hessels RS. Stable eye versus mouth preference in a live speech-processing task. Sci Rep 2023; 13:12878. PMID: 37553414. PMCID: PMC10409748. DOI: 10.1038/s41598-023-40017-8.
Abstract
Looking at the mouth region is thought to be a useful strategy for speech-perception tasks. The tendency to look at the eyes versus the mouth of another person during speech processing has thus far mainly been studied using screen-based paradigms. In this study, we estimated the eye-mouth-index (EMI) of 38 adult participants in a live setting. Participants were seated across the table from an experimenter, who read sentences out loud for the participant to remember in both a familiar (English) and unfamiliar (Finnish) language. No statistically significant difference in the EMI between the familiar and the unfamiliar languages was observed. Total relative looking time at the mouth also did not predict the number of correctly identified sentences. Instead, we found that the EMI was higher during an instruction phase than during the speech-processing task. Moreover, we observed high intra-individual correlations in the EMI across the languages and different phases of the experiment. We conclude that there are stable individual differences in looking at the eyes versus the mouth of another person. Furthermore, this behavior appears to be flexible and dependent on the requirements of the situation (speech processing or not).
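For readers unfamiliar with an eye-mouth-index, a minimal sketch follows. The exact formula used in the study is not given here, so this assumes a common definition, EMI = t_eyes / (t_eyes + t_mouth), computed from total fixation durations on eye and mouth areas of interest; the numbers in the example are hypothetical.

```python
# Minimal sketch of an eye-mouth-index (EMI) under the assumed definition
# EMI = t_eyes / (t_eyes + t_mouth), where t_* are total fixation durations (seconds).
def eye_mouth_index(t_eyes: float, t_mouth: float) -> float:
    """Return a value in [0, 1]: 1 = looking only at the eyes, 0 = only at the mouth."""
    total = t_eyes + t_mouth
    if total == 0:
        raise ValueError("no fixation time on either region")
    return t_eyes / total

# Example: 12.4 s on the eyes and 7.1 s on the mouth gives EMI of about 0.64.
print(round(eye_mouth_index(12.4, 7.1), 2))
```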
Affiliation(s)
- Charlotte Viktorsson
- Development and Neurodiversity Lab, Department of Psychology, Uppsala University, Uppsala, Sweden
- Niilo V Valtakari
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Terje Falck-Ytter
- Development and Neurodiversity Lab, Department of Psychology, Uppsala University, Uppsala, Sweden
- Center of Neurodevelopmental Disorders (KIND), Division of Neuropsychiatry, Department of Women's and Children's Health, Karolinska Institutet, Stockholm, Sweden
- Ignace T C Hooge
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Maja Rudling
- Development and Neurodiversity Lab, Department of Psychology, Uppsala University, Uppsala, Sweden
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
3. Harwood V, Baron A, Kleinman D, Campanelli L, Irwin J, Landi N. Event-Related Potentials in Assessing Visual Speech Cues in the Broader Autism Phenotype: Evidence from a Phonemic Restoration Paradigm. Brain Sci 2023; 13:1011. PMID: 37508944. PMCID: PMC10377560. DOI: 10.3390/brainsci13071011.
Abstract
Audiovisual speech perception includes the simultaneous processing of auditory and visual speech. Deficits in audiovisual speech perception are reported in autistic individuals; however, less is known regarding audiovisual speech perception within the broader autism phenotype (BAP), which includes individuals with elevated, yet subclinical, levels of autistic traits. We investigate the neural indices of audiovisual speech perception in adults exhibiting a range of autism-like traits using event-related potentials (ERPs) in a phonemic restoration paradigm. In this paradigm, we consider conditions where speech articulators (mouth and jaw) are either visible (AV condition) or obscured by a pixelated mask (PX condition). These two face conditions were included in both passive (simply viewing a speaking face) and active (participants were required to press a button for a specific consonant-vowel stimulus) experiments. The results revealed an N100 ERP component which was present for all listening contexts and conditions; however, it was attenuated in the active AV condition, where participants were able to view the speaker's face, including the mouth and jaw. The P300 ERP component was present within the active experiment only, and significantly greater within the AV condition compared to the PX condition. This suggests increased neural effort for detecting deviant stimuli when visible articulation was present, and an influence of visual information on perception. Finally, the P300 response was negatively correlated with autism-like traits, suggesting that higher autistic traits were associated with generally smaller P300 responses in the active AV and PX conditions. The conclusions support the finding that atypical audiovisual processing may be characteristic of the BAP in adults.
Affiliation(s)
- Vanessa Harwood
- Department of Communicative Disorders, University of Rhode Island, Kingston, RI 02881, USA
- Alisa Baron
- Department of Communicative Disorders, University of Rhode Island, Kingston, RI 02881, USA
- Luca Campanelli
- Department of Communicative Disorders, University of Alabama, Tuscaloosa, AL 35487, USA
- Julia Irwin
- Haskins Laboratories, New Haven, CT 06519, USA
- Department of Psychology, Southern Connecticut State University, New Haven, CT 06515, USA
- Nicole Landi
- Haskins Laboratories, New Haven, CT 06519, USA
- Department of Psychological Sciences, University of Connecticut, Storrs, CT 06269, USA
4. Baron A, Harwood V, Kleinman D, Campanelli L, Molski J, Landi N, Irwin J. Where on the face do we look during phonemic restoration: An eye-tracking study. Front Psychol 2023; 14:1005186. PMID: 37303890. PMCID: PMC10249372. DOI: 10.3389/fpsyg.2023.1005186.
Abstract
Face-to-face communication typically involves audio and visual components to the speech signal. To examine the effect of task demands on gaze patterns in response to a speaking face, adults participated in two eye-tracking experiments with an audiovisual (articulatory information from the mouth was visible) and a pixelated condition (articulatory information was not visible). Further, task demands were manipulated by having listeners respond in a passive (no response) or an active (button press response) context. The active experiment required participants to discriminate between speech stimuli and was designed to mimic environmental situations which require one to use visual information to disambiguate the speaker's message, simulating different listening conditions in real-world settings. Stimuli included a clear exemplar of the syllable /ba/ and a second exemplar in which the formant transitions of the initial consonant were reduced, creating an /a/-like consonant. Consistent with our hypothesis, results revealed that the greatest fixations to the mouth were present in the audiovisual active experiment, and visual articulatory information led to a phonemic restoration effect for the /a/ speech token. In the pixelated condition, participants fixated on the eyes, and discrimination of the deviant token within the active experiment was significantly greater than in the audiovisual condition. These results suggest that when required to disambiguate changes in speech, adults may look to the mouth for additional cues to support processing when such information is available.
Affiliation(s)
- Alisa Baron
- Department of Communicative Disorders, University of Rhode Island, Kingston, RI, United States
- Vanessa Harwood
- Department of Communicative Disorders, University of Rhode Island, Kingston, RI, United States
- Luca Campanelli
- Department of Communicative Disorders, The University of Alabama, Tuscaloosa, AL, United States
- Joseph Molski
- Department of Communicative Disorders, University of Rhode Island, Kingston, RI, United States
- Nicole Landi
- Haskins Laboratories, New Haven, CT, United States
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Julia Irwin
- Haskins Laboratories, New Haven, CT, United States
- Department of Psychology, Southern Connecticut State University, New Haven, CT, United States
5. Schwarz J, Li KK, Sim JH, Zhang Y, Buchanan-Worster E, Post B, Gibson JL, McDougall K. Semantic Cues Modulate Children’s and Adults’ Processing of Audio-Visual Face Mask Speech. Front Psychol 2022; 13:879156. PMID: 35928422. PMCID: PMC9343587. DOI: 10.3389/fpsyg.2022.879156.
Abstract
During the COVID-19 pandemic, questions have been raised about the impact of face masks on communication in classroom settings. However, it is unclear to what extent visual obstruction of the speaker’s mouth or changes to the acoustic signal lead to speech processing difficulties, and whether these effects can be mitigated by semantic predictability, i.e., the availability of contextual information. The present study investigated the acoustic and visual effects of face masks on speech intelligibility and processing speed under varying semantic predictability. Twenty-six children (aged 8-12) and twenty-six adults performed an internet-based cued shadowing task, in which they had to repeat aloud the last word of sentences presented in audio-visual format. The results showed that children and adults made more mistakes and responded more slowly when listening to face mask speech compared to speech produced without a face mask. Adults were only significantly affected by face mask speech when both the acoustic and the visual signal were degraded. While acoustic mask effects were similar for children, removal of visual speech cues through the face mask affected children to a lesser degree. However, high semantic predictability reduced audio-visual mask effects, leading to full compensation of the acoustically degraded mask speech in the adult group. Even though children did not fully compensate for face mask speech with high semantic predictability, overall, they still profited from semantic cues in all conditions. Therefore, in classroom settings, strategies that increase contextual information such as building on students’ prior knowledge, using keywords, and providing visual aids, are likely to help overcome any adverse face mask effects.
Affiliation(s)
- Julia Schwarz
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Katrina Kechun Li
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Jasper Hong Sim
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Yixin Zhang
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Elizabeth Buchanan-Worster
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Brechtje Post
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Kirsty McDougall
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
6. Potthoff J, Schienle A. Effects of Self-Esteem on Self-Viewing: An Eye-Tracking Investigation on Mirror Gazing. Behav Sci (Basel) 2021; 11:164. PMID: 34940099. PMCID: PMC8698327. DOI: 10.3390/bs11120164.
Abstract
While some people enjoy looking at their faces in the mirror, others experience emotional distress. Despite these individual differences concerning self-viewing in the mirror, systematic investigations on this topic have not been conducted so far. The present eye-tracking study examined whether personality traits (self-esteem, narcissism propensity, self-disgust) are associated with gaze behavior (gaze duration, fixation count) during free mirror viewing of one's face. Sixty-eight adults (mean age = 23.5 years; 39 females, 29 males) viewed their faces in the mirror and watched a video of an unknown person matched for gender and age (control condition) for 90 s each. The computed regression analysis showed that higher self-esteem was associated with a shorter gaze duration for both self-face and other-face. This effect may reflect a less critical evaluation of the faces.
Affiliation(s)
- Jonas Potthoff
- Institute of Psychology, University of Graz, 8010 Graz, Austria
7.
Abstract
Gaze (where one looks, how long, and when) plays an essential part in human social behavior. While many aspects of social gaze have been reviewed, there is no comprehensive review or theoretical framework that describes how gaze to faces supports face-to-face interaction. In this review, I address the following questions: (1) When does gaze need to be allocated to a particular region of a face in order to provide the relevant information for successful interaction; (2) How do humans look at other people, and faces in particular, regardless of whether gaze needs to be directed at a particular region to acquire the relevant visual information; (3) How does gaze support the regulation of interaction? The work reviewed spans psychophysical research, observational research, and eye-tracking research in both lab-based and interactive contexts. Based on the literature overview, I sketch a framework for future research based on dynamic systems theory. The framework holds that gaze should be investigated in relation to sub-states of the interaction, encompassing sub-states of the interactors, the content of the interaction as well as the interactive context. The relevant sub-states for understanding gaze in interaction vary over different timescales from microgenesis to ontogenesis and phylogenesis. The framework has important implications for vision science, psychopathology, developmental science, and social robotics.
Affiliation(s)
- Roy S Hessels
- Experimental Psychology, Helmholtz Institute, Utrecht University, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands
- Developmental Psychology, Heidelberglaan 1, 3584CS, Utrecht, The Netherlands
8. Jiang J, von Kriegstein K, Jiang J. Brain mechanisms of eye contact during verbal communication predict autistic traits in neurotypical individuals. Sci Rep 2020; 10:14602. PMID: 32884087. PMCID: PMC7471895. DOI: 10.1038/s41598-020-71547-0.
Abstract
Atypical eye contact in communication is a common characteristic in autism spectrum disorders. Autistic traits vary along a continuum extending into the neurotypical population. The relation between autistic traits and brain mechanisms underlying spontaneous eye contact during verbal communication remains unexplored. Here, we used simultaneous functional magnetic resonance imaging and eye tracking to investigate this relation in neurotypical people within a naturalistic verbal context. Using multiple regression analyses, we found that brain response in the posterior superior temporal sulcus (pSTS) and its connectivity with the fusiform face area (FFA) during eye contact with a speaker predicted the level of autistic traits measured by Autism-spectrum Quotient (AQ). Further analyses for different AQ subclusters revealed that these two predictors were negatively associated with attention to detail. The relation between FFA–pSTS connectivity and the attention to detail ability was mediated by individuals’ looking preferences for speaker’s eyes. This study identified the role of an individual eye contact pattern in the relation between brain mechanisms underlying natural eye contact during verbal communication and autistic traits in neurotypical people. The findings may help to increase our understanding of the mechanisms of atypical eye contact behavior during natural communication.
Affiliation(s)
- Jing Jiang
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, 94305, USA
- Max Planck Institute for Human Cognitive and Brain Sciences, 04103, Leipzig, Germany
- Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, 10117, Berlin, Germany
- Institute of Psychology, Humboldt-Universität zu Berlin, 12489, Berlin, Germany
- Katharina von Kriegstein
- Max Planck Institute for Human Cognitive and Brain Sciences, 04103, Leipzig, Germany
- Institute of Psychology, Humboldt-Universität zu Berlin, 12489, Berlin, Germany
- Faculty of Psychology, Technische Universität Dresden, 01187, Dresden, Germany
- Jiefeng Jiang
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, 52242, USA
9. Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife 2020; 9:e56972. PMID: 32831168. PMCID: PMC7470824. DOI: 10.7554/eLife.56972.
Abstract
Visual speech carried by lip movements is an integral part of communication. Yet, it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from those representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than often assumed.
Affiliation(s)
- Anne Keitel
- Psychology, University of Dundee, Dundee, United Kingdom
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignalanalysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld, Germany
10. Shavit-Cohen K, Zion Golumbic E. The Dynamics of Attention Shifts Among Concurrent Speech in a Naturalistic Multi-speaker Virtual Environment. Front Hum Neurosci 2019; 13:386. PMID: 31780911. PMCID: PMC6857110. DOI: 10.3389/fnhum.2019.00386.
Abstract
Focusing attention on one speaker against the background of other irrelevant speech can be a challenging feat. A longstanding question in attention research is whether and how frequently individuals shift their attention towards task-irrelevant speech, arguably leading to occasional detection of words in a so-called unattended message. However, this has been difficult to gauge empirically, particularly when participants attend to continuous natural speech, due to the lack of appropriate metrics for detecting shifts in internal attention. Here we introduce a new experimental platform for studying the dynamic deployment of attention among concurrent speakers, utilizing a unique combination of Virtual Reality (VR) and Eye-Tracking technology. We created a Virtual Café in which participants sit across from and attend to the narrative of a target speaker. We manipulated the number and location of distractor speakers by placing additional characters throughout the Virtual Café. By monitoring participants' eye-gaze dynamics, we studied the patterns of overt attention-shifts among concurrent speakers as well as the consequences of these shifts on speech comprehension. Our results reveal important individual differences in the gaze-pattern displayed during selective attention to speech. While some participants stayed fixated on a target speaker throughout the entire experiment, approximately 30% of participants frequently shifted their gaze toward distractor speakers or other locations in the environment, regardless of the severity of audiovisual distraction. Critically, performing frequent gaze-shifts negatively impacted the comprehension of target speech, and participants made more mistakes when looking away from the target speaker. We also found that gaze-shifts occurred primarily during gaps in the acoustic input, suggesting that momentary reductions in acoustic masking prompt attention-shifts between competing speakers, in line with "glimpsing" theories of processing speech in noise. These results open a new window into understanding the dynamics of attention as they wax and wane over time, and the different listening patterns employed for dealing with the influx of sensory input in multisensory environments. Moreover, the novel approach developed here for tracking the locus of momentary attention in a naturalistic virtual-reality environment holds high promise for extending the study of human behavior and cognition and bridging the gap between the laboratory and real life.
Affiliation(s)
- Elana Zion Golumbic
- The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan, Israel
11. Worster E, Pimperton H, Ralph-Lewis A, Monroy L, Hulme C, MacSweeney M. Eye Movements During Visual Speech Perception in Deaf and Hearing Children. Lang Learn 2018; 68:159-179. PMID: 29937576. PMCID: PMC6001475. DOI: 10.1111/lang.12264.
Abstract
For children who are born deaf, lipreading (speechreading) is an important source of access to spoken language. We used eye tracking to investigate the strategies used by deaf (n = 33) and hearing 5-8-year-olds (n = 59) during a sentence speechreading task. The proportion of time spent looking at the mouth during speech correlated positively with speechreading accuracy. In addition, all children showed a tendency to watch the mouth during speech and watch the eyes when the model was not speaking. The extent to which the children used this communicative pattern, which we refer to as social-tuning, positively predicted their speechreading performance, with the deaf children showing a stronger relationship than the hearing children. These data suggest that better speechreading skills are seen in those children, both deaf and hearing, who are able to guide their visual attention to the appropriate part of the image and in those who have a good understanding of conversational turn-taking.
Affiliation(s)
- Amelia Ralph-Lewis
- Deafness, Cognition, and Language Research Centre, University College London
- Laura Monroy
- Institute of Cognitive Neuroscience, University College London
- Mairéad MacSweeney
- Institute of Cognitive Neuroscience, University College London
- Deafness, Cognition, and Language Research Centre, University College London
12. Ozker M, Yoshor D, Beauchamp MS. Frontal cortex selects representations of the talker's mouth to aid in speech perception. eLife 2018; 7:e30387. PMID: 29485404. PMCID: PMC5828660. DOI: 10.7554/eLife.30387.
Abstract
Human faces contain multiple sources of information. During speech perception, visual information from the talker’s mouth is integrated with auditory information from the talker's voice. By directly recording neural responses from small populations of neurons in patients implanted with subdural electrodes, we found enhanced visual cortex responses to speech when auditory speech was absent (rendering visual speech especially relevant). Receptive field mapping demonstrated that this enhancement was specific to regions of the visual cortex with retinotopic representations of the mouth of the talker. Connectivity between frontal cortex and other brain regions was measured with trial-by-trial power correlations. Strong connectivity was observed between frontal cortex and mouth regions of visual cortex; connectivity was weaker between frontal cortex and non-mouth regions of visual cortex or auditory cortex. These results suggest that top-down selection of visual information from the talker’s mouth by frontal cortex plays an important role in audiovisual speech perception.
Affiliation(s)
- Muge Ozker
- Department of Neurosurgery, Baylor College of Medicine, Texas, United States
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Texas, United States
- Michael E. DeBakey Veterans Affairs Medical Center, Texas, United States
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Texas, United States
13. Jiang J, Borowiak K, Tudge L, Otto C, von Kriegstein K. Neural mechanisms of eye contact when listening to another person talking. Soc Cogn Affect Neurosci 2017; 12:319-328. PMID: 27576745. PMCID: PMC5390711. DOI: 10.1093/scan/nsw127.
Abstract
Eye contact occurs frequently and voluntarily during face-to-face verbal communication. However, the neural mechanisms underlying eye contact when it is accompanied by spoken language remain unexplored to date. Here we used a novel approach, fixation-based event-related functional magnetic resonance imaging (fMRI), to simulate the listener making eye contact with a speaker during verbal communication. Participants’ eye movements and fMRI data were recorded simultaneously while they were freely viewing a pre-recorded speaker talking. The eye tracking data were then used to define events for the fMRI analyses. The results showed that eye contact in contrast to mouth fixation involved visual cortical areas (cuneus, calcarine sulcus), brain regions related to theory of mind/intentionality processing (temporoparietal junction, posterior superior temporal sulcus, medial prefrontal cortex) and the dorsolateral prefrontal cortex. In addition, increased effective connectivity was found between these regions for eye contact in contrast to mouth fixations. The results provide first evidence for neural mechanisms underlying eye contact when watching and listening to another person talking. The network we found might be well suited for processing the intentions of communication partners during eye contact in verbal communication.
Affiliation(s)
- Jing Jiang
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin 10117, Germany
- Institute of Psychology, Humboldt-Universität zu Berlin, Berlin 12489, Germany
- Kamila Borowiak
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin 10117, Germany
- Luke Tudge
- Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin 10117, Germany
- Carolin Otto
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Katharina von Kriegstein
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Institute of Psychology, Humboldt-Universität zu Berlin, Berlin 12489, Germany
14. Irwin J, Brancazio L, Volpe N. The development of gaze to a speaking face. J Acoust Soc Am 2017; 141:3145. PMID: 28599552. PMCID: PMC5422207. DOI: 10.1121/1.4982727.
Abstract
When a speaker talks, the visible consequences of what they are saying can be seen. Listeners are influenced by this visible speech both in a noisy listening environment and even when auditory speech can easily be heard. While visible influence on heard speech has been reported to increase from early to late childhood, little is known about the mechanism that underlies this developmental trend. One possible account of developmental differences is that looking behavior to the face of a speaker changes with age. To examine this possibility, the gaze to a speaking face was examined in children from 5 to 10 years of age and adults. Participants viewed a speaker's face in a range of conditions that elicit looking: in a visual-only (speech reading) condition, in an auditory noise (speech-in-noise) condition, and in an audiovisual mismatch (McGurk) condition. Results indicate an increase in gaze on the face, and specifically to the mouth of a speaker, between the ages of 5 and 10 for all conditions. This change in looking behavior may help account for previous findings in the literature showing that visual influence on heard speech increases with development.
Affiliation(s)
- Julia Irwin
- Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
- Lawrence Brancazio
- Southern Connecticut State University, 501 Crescent Street, New Haven, Connecticut 06515, USA
- Nicole Volpe
- Southern Connecticut State University, 501 Crescent Street, New Haven, Connecticut 06515, USA
15. Irwin J, DiBlasi L. Audiovisual speech perception: A new approach and implications for clinical populations. Lang Linguist Compass 2017; 11:77-91. PMID: 29520300. PMCID: PMC5839512. DOI: 10.1111/lnc3.12237.
Abstract
This selected overview of audiovisual (AV) speech perception examines the influence of visible articulatory information on what is heard. AV speech perception is thought to be a cross-cultural phenomenon that emerges early in typical language development; variables that influence it include properties of the visual and the auditory signal, attentional demands, and individual differences. A brief review of the existing neurobiological evidence on how visual information influences heard speech indicates potential loci, timing, and facilitatory effects of AV over auditory-only speech. The current literature on AV speech in certain clinical populations (individuals with an autism spectrum disorder, developmental language disorder, or hearing loss) reveals differences in processing that may inform interventions. Finally, a new method of assessing AV speech that does not require obvious cross-category mismatch or auditory noise is presented as a novel approach for investigators.
Affiliation(s)
- Julia Irwin
- LEARN Center, Haskins Laboratories Inc., USA
16. Culling JF. Speech intelligibility in virtual restaurants. J Acoust Soc Am 2016; 140:2418. PMID: 27794329. DOI: 10.1121/1.4964401.
Abstract
Speech reception thresholds (SRTs) for a target voice at the same virtual table were measured in various restaurant simulations under conditions of masking by between one and eight interferers at other tables. Results for different levels of reverberation and different simulation techniques were qualitatively similar. SRTs increased steeply with the number of interferers, reflecting progressive failure to perceptually unmask the target speech as the acoustic scene became more complex. For a single interferer, continuous noise was the most effective masker, and a single interfering voice of either gender was least effective. With two interferers, evidence of informational masking emerged as a difference in SRT between forward and reversed speech, but SRTs for all interferer types progressively converged at four and eight interferers. In a simulation based on a real room, this occurred at a signal-to-noise ratio of around -5 dB.
Affiliation(s)
- John F Culling
- School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, CF10 3AT, United Kingdom
17. Atypical audiovisual word processing in school-age children with a history of specific language impairment: an event-related potential study. J Neurodev Disord 2016; 8:33. PMID: 27597881. PMCID: PMC5011345. DOI: 10.1186/s11689-016-9168-3.
Abstract
Background: Visual speech cues influence different aspects of language acquisition. However, whether developmental language disorders may be associated with atypical processing of visual speech is unknown. In this study, we used behavioral and ERP measures to determine whether children with a history of SLI (H-SLI) differ from their age-matched typically developing (TD) peers in the ability to match auditory words with corresponding silent visual articulations. Methods: Nineteen 7–13-year-old H-SLI children and 19 age-matched TD children participated in the study. Children first heard a word and then saw a speaker silently articulating a word. In half of trials, the articulated word matched the auditory word (congruent trials), while in another half, it did not (incongruent trials). Children specified whether the auditory and the articulated words matched. We examined ERPs elicited by the onset of visual stimuli (visual P1, N1, and P2) as well as ERPs elicited by the articulatory movements themselves, namely, N400 to incongruent articulations and late positive complex (LPC) to congruent articulations. We also examined whether ERP measures of visual speech processing could predict (1) children's linguistic skills and (2) the use of visual speech cues when listening to speech-in-noise (SIN). Results: H-SLI children were less accurate in matching auditory words with visual articulations. They had a significantly reduced P1 to the talker's face and a smaller N400 to incongruent articulations. In contrast, congruent articulations elicited LPCs of similar amplitude in both groups of children. The P1 and N400 amplitude was significantly correlated with accuracy enhancement on the SIN task when seeing the talker's face. Conclusions: H-SLI children have poorly defined correspondences between speech sounds and visually observed articulatory movements that produce them.
18. Wilson AH, Alsius A, Paré M, Munhall KG. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception. J Speech Lang Hear Res 2016; 59:601-615. PMID: 27537379. PMCID: PMC5280058. DOI: 10.1044/2016_jslhr-s-15-0092.
Abstract
PURPOSE: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. METHOD: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent conditions (Experiment 1; N = 66). In Experiment 2 (N = 20), participants performed a visual-only speech perception task and in Experiment 3 (N = 20) an audiovisual task while having their gaze behavior monitored using eye-tracking equipment. RESULTS: In the visual-only condition, increasing image resolution led to monotonic increases in performance, and proficient speechreaders were more affected by the removal of high spatial information than were poor speechreaders. The McGurk effect also increased with increasing visual resolution, although it was less affected by the removal of high-frequency information. Observers tended to fixate on the mouth more in visual-only perception, but gaze toward the mouth did not correlate with accuracy of silent speechreading or the magnitude of the McGurk effect. CONCLUSIONS: The results suggest that individual differences in silent speechreading and the McGurk effect are not related. This conclusion is supported by differential influences of high-resolution visual information on the 2 tasks and differences in the pattern of gaze.
Affiliation(s)
- Amanda H. Wilson
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Agnès Alsius
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Martin Paré
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
- Kevin G. Munhall
- Psychology Department, Queen's University, Kingston, Ontario, Canada
- Centre for Neuroscience Studies, Queen's University, Kingston, Ontario, Canada
19. Kaganovich N, Schumaker J, Rowland C. Matching heard and seen speech: An ERP study of audiovisual word recognition. Brain Lang 2016; 157-158:14-24. PMID: 27155219. PMCID: PMC4915735. DOI: 10.1016/j.bandl.2016.04.010.
Abstract
Seeing articulatory gestures while listening to speech-in-noise (SIN) significantly improves speech understanding. However, the degree of this improvement varies greatly among individuals. We examined a relationship between two distinct stages of visual articulatory processing and the SIN accuracy by combining a cross-modal repetition priming task with ERP recordings. Participants first heard a word referring to a common object (e.g., pumpkin) and then decided whether the subsequently presented visual silent articulation matched the word they had just heard. Incongruent articulations elicited a significantly enhanced N400, indicative of a mismatch detection at the pre-lexical level. Congruent articulations elicited a significantly larger LPC, indexing articulatory word recognition. Only the N400 difference between incongruent and congruent trials was significantly correlated with individuals' SIN accuracy improvement in the presence of the talker's face.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
- Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, United States
- Jennifer Schumaker
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
- Courtney Rowland
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
20. Kaganovich N, Schumaker J, Leonard LB, Gustafson D, Macias D. Children with a history of SLI show reduced sensitivity to audiovisual temporal asynchrony: an ERP study. J Speech Lang Hear Res 2014; 57:1480-1502. PMID: 24686922. PMCID: PMC4266431. DOI: 10.1044/2014_jslhr-l-13-0192.
Abstract
PURPOSE: The authors examined whether school-age children with a history of specific language impairment (H-SLI), their peers with typical development (TD), and adults differ in sensitivity to audiovisual temporal asynchrony and whether such difference stems from the sensory encoding of audiovisual information. METHOD: Fifteen H-SLI children, 15 TD children, and 15 adults judged whether a flashed explosion-shaped figure and a 2-kHz pure tone occurred simultaneously. The stimuli were presented at 0-, 100-, 200-, 300-, 400-, and 500-ms temporal offsets. This task was combined with EEG recordings. RESULTS: H-SLI children were profoundly less sensitive to temporal separations between auditory and visual modalities compared with their TD peers. Those H-SLI children who performed better at simultaneity judgment also had higher language aptitude. TD children were less accurate than adults, revealing a remarkably prolonged developmental course of the audiovisual temporal discrimination. Analysis of early event-related potential components suggested that poor sensory encoding was not a key factor in H-SLI children's reduced sensitivity to audiovisual asynchrony. CONCLUSIONS: Audiovisual temporal discrimination is impaired in H-SLI children and is still immature during mid-childhood in TD children. The present findings highlight the need for further evaluation of the role of atypical audiovisual processing in the development of SLI.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907-2038
- Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038
- Jennifer Schumaker
- Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907-2038
- Laurence B. Leonard
- Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907-2038
- Dana Gustafson
- Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907-2038
- Danielle Macias
- Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907-2038
21. Irwin JR, Brancazio L. Seeing to hear? Patterns of gaze to speaking faces in children with autism spectrum disorders. Front Psychol 2014; 5:397. PMID: 24847297. PMCID: PMC4021198. DOI: 10.3389/fpsyg.2014.00397.
Abstract
Using eye-tracking methodology, gaze to a speaking face was compared in a group of children with autism spectrum disorders (ASD) and a group with typical development (TD). Patterns of gaze were observed under three conditions: audiovisual (AV) speech in auditory noise, visual-only speech, and an AV non-face, non-speech control. Children with ASD looked less to the face of the speaker and fixated less on the speaker's mouth than TD controls. No differences in gaze were reported for the non-face, non-speech control task. Since the mouth holds much of the articulatory information available on the face, these findings suggest that children with ASD may have reduced access to critical linguistic information. This reduced access to visible articulatory information could be a contributor to the communication and language problems exhibited by children with ASD.
Affiliation(s)
- Julia R Irwin
- Haskins Laboratories, New Haven, CT, USA
- Department of Psychology, Southern Connecticut State University, New Haven, CT, USA
- Lawrence Brancazio
- Haskins Laboratories, New Haven, CT, USA
- Department of Psychology, Southern Connecticut State University, New Haven, CT, USA