1. Attention Drives Visual Processing and Audiovisual Integration During Multimodal Communication. J Neurosci 2024; 44:e0870232023. PMID: 38199864; PMCID: PMC10919203; DOI: 10.1523/jneurosci.0870-23.2023.
Abstract
During communication in real-life settings, our brain often needs to integrate auditory and visual information and at the same time actively focus on the relevant sources of information, while ignoring interference from irrelevant events. The interaction between integration and attention processes remains poorly understood. Here, we use rapid invisible frequency tagging and magnetoencephalography to investigate how attention affects auditory and visual information processing and integration, during multimodal communication. We presented human participants (male and female) with videos of an actress uttering action verbs (auditory; tagged at 58 Hz) accompanied by two movie clips of hand gestures on both sides of fixation (attended stimulus tagged at 65 Hz; unattended stimulus tagged at 63 Hz). Integration difficulty was manipulated by a lower-order auditory factor (clear/degraded speech) and a higher-order visual semantic factor (matching/mismatching gesture). We observed an enhanced neural response to the attended visual information during degraded speech compared to clear speech. For the unattended information, the neural response to mismatching gestures was enhanced compared to matching gestures. Furthermore, signal power at the intermodulation frequencies of the frequency tags, indexing nonlinear signal interactions, was enhanced in the left frontotemporal and frontal regions. Focusing on the left inferior frontal gyrus, this enhancement was specific for the attended information, for those trials that benefitted from integration with a matching gesture. Together, our results suggest that attention modulates audiovisual processing and interaction, depending on the congruence and quality of the sensory input.
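The intermodulation analysis described above follows directly from the tagging frequencies reported in the abstract (auditory at 58 Hz, attended visual at 65 Hz, unattended visual at 63 Hz): a nonlinear interaction between two tagged signals produces responses at the sums and differences of their tagging frequencies. As a minimal illustrative sketch only (which intermodulation products were actually analyzed is not stated here), the candidate frequencies can be enumerated:

```python
from itertools import combinations

# Tagging frequencies reported in the abstract (Hz).
tags = {"auditory": 58.0, "visual_attended": 65.0, "visual_unattended": 63.0}

# A nonlinear interaction between two tagged inputs produces spectral power
# at the difference and sum of their tagging frequencies (|f1 - f2|, f1 + f2).
for (name_a, f_a), (name_b, f_b) in combinations(tags.items(), 2):
    print(f"{name_a} x {name_b}: |f1 - f2| = {abs(f_a - f_b):.0f} Hz, "
          f"f1 + f2 = {f_a + f_b:.0f} Hz")
```

For the auditory tag combined with the attended visual tag this gives a difference frequency of 7 Hz, and 5 Hz for the unattended visual tag, which is where power indexing nonlinear audiovisual interaction would be expected.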
2. The Self-reference Effect Can Modulate Language Syntactic Processing Even Without Explicit Awareness: An Electroencephalography Study. J Cogn Neurosci 2024; 36:460-474. PMID: 38165746; DOI: 10.1162/jocn_a_02104.
Abstract
Although it is well established that self-related information can rapidly capture our attention and bias cognitive functioning, whether this self-bias can affect language processing remains largely unknown. In addition, there is an ongoing debate as to the functional independence of language processes, notably regarding the syntactic domain. Hence, this study investigated the influence of self-related content on syntactic speech processing. Participants listened to sentences that could contain morphosyntactic anomalies while a masked face (self, friend, or unknown) was presented for 16 msec preceding the critical word. The language-related ERP components (left anterior negativity [LAN] and P600) appeared for all identity conditions. However, the largest LAN effect followed by a reduced P600 effect was observed for self-faces, whereas a larger LAN with no reduction of the P600 was found for friend faces compared with unknown faces. These data suggest that both early and late syntactic processes can be modulated by self-related content. In addition, alpha power was more suppressed over the left inferior frontal gyrus only when self-faces appeared before the critical word. This may reflect higher semantic demands concomitant with early syntactic operations (around 150-550 msec). Our data also provide further evidence of a self-specific response, as reflected by the N250 component. Collectively, our results suggest that identity-related information is rapidly decoded from facial stimuli and may impact core linguistic processes, supporting an interactive view of syntactic processing. This study provides evidence that the self-reference effect extends to syntactic processing.
3. Hand Gestures Have Predictive Potential During Conversation: An Investigation of the Timing of Gestures in Relation to Speech. Cogn Sci 2024; 48:e13407. PMID: 38279899; DOI: 10.1111/cogs.13407.
Abstract
During face-to-face conversation, transitions between speaker turns are incredibly fast. These fast turn exchanges seem to involve next speakers predicting upcoming semantic information, such that next turn planning can begin before a current turn is complete. Given that face-to-face conversation also involves the use of communicative bodily signals, an important question is how bodily signals such as co-speech hand gestures play into these processes of prediction and fast responding. In this corpus study, we found that hand gestures that depict or refer to semantic information started before the corresponding information in speech, which held both for the onset of the gesture as a whole, as well as the onset of the stroke (the most meaningful part of the gesture). This early timing potentially allows listeners to use the gestural information to predict the corresponding semantic information to be conveyed in speech. Moreover, we provided further evidence that questions with gestures got faster responses than questions without gestures. However, we found no evidence for the idea that how much a gesture precedes its lexical affiliate (i.e., its predictive potential) relates to how fast responses were given. The findings presented here highlight the importance of the temporal relation between speech and gesture and help to illuminate the potential mechanisms underpinning multimodal language processing during face-to-face conversation.
4. Embodied Processing at Six Linguistic Granularity Levels: A Consensus Paper. J Cogn 2023; 6:60. PMID: 37841668; PMCID: PMC10573585; DOI: 10.5334/joc.231.
Abstract
Language processing is influenced by sensorimotor experiences. Here, we review behavioral evidence for embodied and grounded influences in language processing across six linguistic levels of granularity. We examine (a) sub-word features, discussing grounded influences on iconicity (systematic associations between word form and meaning); (b) words, discussing boundary conditions and generalizations for the simulation of color, sensory modality, and spatial position; (c) sentences, discussing boundary conditions and applications of action direction simulation; (d) texts, discussing how the teaching of simulation can improve comprehension in beginning readers; (e) conversations, discussing how multimodal cues improve turn taking and alignment; and (f) text corpora, discussing how distributional semantic models can reveal how grounded and embodied knowledge is encoded in texts. These approaches are converging on a convincing account of the psychology of language, but at the same time, there are important criticisms of the embodied approach and of specific experimental paradigms. The surest way forward requires the adoption of a wide array of scientific methods. By providing complementary evidence, a combination of multiple methods across various levels of granularity can help us gain a more complete understanding of the role of embodiment and grounding in language processing.
5. Studying naturalistic human communication using dual-EEG and audio-visual recordings. STAR Protoc 2023; 4:102370. PMID: 37421617; PMCID: PMC10511849; DOI: 10.1016/j.xpro.2023.102370.
Abstract
We present a protocol to study naturalistic human communication using dual-electroencephalography (EEG) and audio-visual recordings. We describe preparatory steps for data collection, including setup preparation, experiment design, and piloting. We then describe the data collection process in detail, which consists of participant recruitment, experiment room preparation, and data collection itself. We also outline the kinds of research questions that can be addressed with the current protocol, including several analysis possibilities, from conversational to advanced time-frequency analyses. For complete details on the use and execution of this protocol, please refer to Drijvers and Holler (2022).
6. Rapid invisible frequency tagging (RIFT): a promising technique to study neural and cognitive processing using naturalistic paradigms. Cereb Cortex 2023; 33:1626-1629. PMID: 35452080; PMCID: PMC9977367; DOI: 10.1093/cercor/bhac160.
Abstract
Frequency tagging has been successfully used to investigate selective stimulus processing in electroencephalography (EEG) or magnetoencephalography (MEG) studies. Recently, new projectors have been developed that allow for frequency tagging at higher frequencies (>60 Hz). This technique, rapid invisible frequency tagging (RIFT), provides two crucial advantages over low-frequency tagging, as (i) it leaves low-frequency oscillations unperturbed, and thus open for investigation, and (ii) it can render the tagging invisible, resulting in more naturalistic paradigms and a lack of participant awareness. The development of this technique has far-reaching implications as oscillations involved in cognitive processes can be investigated, and potentially manipulated, in a more naturalistic manner.
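Conceptually, RIFT drives a stimulus region with a sinusoidal luminance modulation at a rate above the flicker-fusion threshold, which is why a high-refresh-rate projector is needed. The sketch below illustrates the idea only; the frame rate, tagging frequency, and modulation depth are assumed values, not those of any particular published setup:

```python
import numpy as np

refresh_rate = 1440   # projector frames per second (assumed high-speed projector)
tag_freq = 65         # tagging frequency in Hz, above the flicker-fusion threshold
duration = 2.0        # seconds of stimulation

# One luminance value per projected frame: a sinusoid around mid-grey (0.5),
# fast enough that the flicker itself is not consciously perceived.
t = np.arange(int(refresh_rate * duration)) / refresh_rate
luminance = 0.5 + 0.5 * np.sin(2 * np.pi * tag_freq * t)

# Each frame of the tagged stimulus region would then be scaled by this value;
# the MEG/EEG response is expected to show a spectral peak at tag_freq.
print(luminance[:5].round(3))
```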
7. Face-to-face spatial orientation fine-tunes the brain for neurocognitive processing in conversation. iScience 2022; 25:105413. PMID: 36388995; PMCID: PMC9664361; DOI: 10.1016/j.isci.2022.105413.
Abstract
We here demonstrate that face-to-face spatial orientation induces a special ‘social mode’ for neurocognitive processing during conversation, even in the absence of visibility. Participants conversed face to face, face to face but visually occluded, and back to back, to tease apart effects caused by seeing visual communicative signals and by spatial orientation. Using dual EEG, we found that (1) listeners’ brains engaged more strongly while conversing face to face than back to back, irrespective of the visibility of communicative signals; (2) listeners attended to speech more strongly in a back-to-back compared to a face-to-face spatial orientation without visibility, and visual signals further reduced the attention needed; (3) the brains of interlocutors were more in sync in a face-to-face compared to a back-to-back spatial orientation, even when they could not see each other, and visual signals further enhanced this pattern. Communicating in a face-to-face spatial orientation is thus sufficient to induce a special ‘social mode’ which fine-tunes the brain for neurocognitive processing in conversation. Highlights: listeners engage more strongly when conversing face to face than back to back; more attention to speech when conversing back to back than face to face; inter-brain synchrony was stronger face to face than back to back; face-to-face orientation induces a special social mode for neurocognitive processing.
8. The Effects of Iconic Gestures and Babble Language on Word Intelligibility in Sentence Context. J Speech Lang Hear Res 2022; 65:1822-1838. PMID: 35439423; DOI: 10.1044/2022_jslhr-21-00387.
Abstract
PURPOSE: This study investigated to what extent iconic co-speech gestures enhance word intelligibility in sentence context under two different linguistic maskers (native vs. foreign babble). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. METHOD: Thirty-two native Dutch participants performed a Dutch word recognition task in sentence context, in which they were presented with videos of an actress uttering short Dutch sentences (e.g., Ze begint te openen, "She starts to open"). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; they were asked to type out what was said by the actress. Accurate identification of the action verbs at the end of the target sentences was measured. RESULTS: Performance on the task was better in the gesture than in the nongesture conditions (a gesture enhancement effect), and better in French babble than in Dutch babble. CONCLUSIONS: Listeners benefit from iconic co-speech gestures during communication, and from foreign compared to native background speech. These insights into multimodal communication may be valuable to anyone who engages in multimodal communication, and especially to those who often work in public places where competing speech is present in the background.
9. Embodied Space-pitch Associations are Shaped by Language. Cogn Sci 2022; 46:e13083. PMID: 35188682; DOI: 10.1111/cogs.13083.
Abstract
Height-pitch associations are claimed to be universal and independent of language, but this claim remains controversial. The present study sheds new light on this debate with a multimodal analysis of individual sound and melody descriptions obtained in an interactive communication paradigm with speakers of Dutch and Farsi. The findings reveal that, in contrast to Dutch speakers, Farsi speakers do not use a height-pitch metaphor consistently in speech. Both Dutch and Farsi speakers' co-speech gestures did reveal a mapping of higher pitches to higher space and lower pitches to lower space, and this gesture space-pitch mapping tended to co-occur with corresponding spatial words (high-low). However, this mapping was much weaker in Farsi speakers than Dutch speakers. This suggests that cross-linguistic differences shape the conceptualization of pitch and further calls into question the universality of height-pitch associations.
10.
Abstract
It is now widely accepted that the brunt of animal communication is conducted via several modalities, e.g. acoustic and visual, either simultaneously or sequentially. This is a laudable multimodal turn relative to traditional accounts of temporal aspects of animal communication which have focused on a single modality at a time. However, the fields that are currently contributing to the study of multimodal communication are highly varied, and still largely disconnected given their sole focus on a particular level of description or their particular concern with human or non-human animals. Here, we provide an integrative overview of converging findings that show how multimodal processes occurring at neural, bodily, as well as social interactional levels each contribute uniquely to the complex rhythms that characterize communication in human and non-human animals. Though we address findings for each of these levels independently, we conclude that the most important challenge in this field is to identify how processes at these different levels connect. This article is part of the theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.
11.
Abstract
In everyday conversation, we are often challenged with communicating in non-ideal settings, such as in noise. Increased speech intensity and larger mouth movements are used to overcome noise in constrained settings (the Lombard effect). How we adapt to noise in face-to-face interaction, the natural environment of human language use, where manual gestures are ubiquitous, is currently unknown. We asked Dutch adults to wear headphones with varying levels of multi-talker babble while attempting to communicate action verbs to one another. Using quantitative motion capture and acoustic analyses, we found that (1) noise is associated with increased speech intensity and enhanced gesture kinematics and mouth movements, and (2) acoustic modulation only occurs when gestures are not present, while kinematic modulation occurs regardless of co-occurring speech. Thus, in face-to-face encounters the Lombard effect is not constrained to speech but is a multimodal phenomenon where the visual channel carries most of the communicative burden.
12. Aging and working memory modulate the ability to benefit from visible speech and iconic gestures during speech-in-noise comprehension. Psychol Res 2021; 85:1997-2011. PMID: 32627053; PMCID: PMC8289811; DOI: 10.1007/s00426-020-01363-8.
Abstract
When comprehending speech-in-noise (SiN), younger and older adults benefit from seeing the speaker's mouth, i.e. visible speech. Younger adults additionally benefit from manual iconic co-speech gestures. Here, we investigate to what extent younger and older adults benefit from perceiving both visual articulators while comprehending SiN, and whether this is modulated by working memory and inhibitory control. Twenty-eight younger and 28 older adults performed a word recognition task in three visual contexts: mouth blurred (speech-only), visible speech, or visible speech + iconic gesture. The speech signal was either clear or embedded in multitalker babble. Additionally, there were two visual-only conditions (visible speech, visible speech + gesture). Accuracy levels for both age groups were higher when both visual articulators were present compared to either one or none. However, older adults received a significantly smaller benefit than younger adults, although they performed equally well in speech-only and visual-only word recognition. Individual differences in verbal working memory and inhibitory control partly accounted for age-related performance differences. To conclude, perceiving iconic gestures in addition to visible speech improves younger and older adults' comprehension of SiN. Yet, the ability to benefit from this additional visual information is modulated by age and verbal working memory. Future research will have to show whether these findings extend beyond the single word level.
13. Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information. Hum Brain Mapp 2021; 42:1138-1152. PMID: 33206441; PMCID: PMC7856646; DOI: 10.1002/hbm.25282.
Abstract
During communication in real-life settings, the brain integrates information from auditory and visual modalities to form a unified percept of our environment. In the current magnetoencephalography (MEG) study, we used rapid invisible frequency tagging (RIFT) to generate steady-state evoked fields and investigated the integration of audiovisual information in a semantic context. We presented participants with videos of an actress uttering action verbs (auditory; tagged at 61 Hz) accompanied by a gesture (visual; tagged at 68 Hz, using a projector with a 1,440 Hz refresh rate). Integration difficulty was manipulated by lower-order auditory factors (clear/degraded speech) and higher-order visual factors (congruent/incongruent gesture). We identified MEG spectral peaks at the individual (61/68 Hz) tagging frequencies. We furthermore observed a peak at the intermodulation frequency of the auditory and visually tagged signals (f_visual - f_auditory = 7 Hz), specifically when lower-order integration was easiest because signal quality was optimal. This intermodulation peak is a signature of nonlinear audiovisual integration, and was strongest in left inferior frontal gyrus and left temporal regions; areas known to be involved in speech-gesture integration. The enhanced power at the intermodulation frequency thus reflects the ease of lower-order audiovisual integration and demonstrates that speech-gesture information interacts in higher-order language areas. Furthermore, we provide a proof-of-principle of the use of RIFT to study the integration of audiovisual stimuli, in relation to, for instance, semantic context.
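The intermodulation peak at f_visual - f_auditory = 7 Hz is the signature of a nonlinear interaction: a purely linear superposition of the two tagged responses contains power only at 61 and 68 Hz, whereas multiplicative mixing also produces power at their difference and sum. A toy simulation (illustrative only, not the study's analysis pipeline) makes this explicit:

```python
import numpy as np

fs, dur = 1000, 10.0                      # sampling rate (Hz) and duration (s)
t = np.arange(int(fs * dur)) / fs
aud = np.sin(2 * np.pi * 61 * t)          # auditory tag (61 Hz)
vis = np.sin(2 * np.pi * 68 * t)          # visual tag (68 Hz)

linear = aud + vis                        # linear superposition: no intermodulation
nonlinear = aud + vis + 0.5 * aud * vis   # multiplicative term mimics neural interaction

freqs = np.fft.rfftfreq(t.size, 1 / fs)
for label, sig in [("linear", linear), ("nonlinear", nonlinear)]:
    power = np.abs(np.fft.rfft(sig)) ** 2
    peak_7hz = power[np.argmin(np.abs(freqs - 7))]
    print(f"{label}: power at 7 Hz = {peak_7hz:.1f}")
```

Only the signal containing the multiplicative term shows appreciable power at 7 Hz (and at 129 Hz), which is the logic behind reading enhanced intermodulation power as evidence of audiovisual integration.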
14.
Abstract
Rhythmic neural activity synchronizes with certain rhythmic behaviors, such as breathing, sniffing, saccades, and speech. The extent to which neural oscillations synchronize with higher-level and more complex behaviors is largely unknown. Here, we investigated electrophysiological synchronization with keyboard typing, an omnipresent behavior that countless people engage in daily. Keyboard typing is rhythmic, with frequency characteristics roughly the same as the neural oscillatory dynamics associated with cognitive control, notably midfrontal theta (4-7 Hz) oscillations. We tested the hypothesis that synchronization occurs between typing and midfrontal theta and breaks down when errors are committed. Thirty healthy participants typed words and sentences on a keyboard without visual feedback, while EEG was recorded. Typing rhythmicity was investigated with interkeystroke interval analyses and a kernel density estimation method. We used a multivariate spatial filtering technique to investigate frequency-specific synchronization between typing and neuronal oscillations. Our results demonstrate theta rhythmicity in typing (around 6.5 Hz) across the two behavioral analyses. Synchronization between typing and neuronal oscillations occurred at frequencies ranging from 4 to 15 Hz, but to a larger extent for lower frequencies. However, the peak synchronization frequency was idiosyncratic across participants, therefore not specific to theta or to midfrontal regions, and correlated somewhat with peak typing frequency. Errors and trials associated with stronger cognitive control were not associated with changes in synchronization at any frequency. As a whole, this study shows that brain-behavior synchronization does occur during keyboard typing but is not specific to midfrontal theta.
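The two behavioral analyses mentioned above, interkeystroke intervals and kernel density estimation, can be illustrated with a short sketch on simulated keystroke times; the typing rate, jitter, and number of keystrokes below are assumptions for demonstration, not values from the study:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Simulated keystroke onset times (s): ~6.5 keystrokes/s with jitter,
# standing in for logged keyboard events.
intervals = rng.normal(loc=1 / 6.5, scale=0.03, size=200).clip(min=0.05)
keystroke_times = np.cumsum(intervals)

# Interkeystroke-interval (IKI) analysis: typing rate as 1 / mean interval.
iki = np.diff(keystroke_times)
print(f"mean IKI = {iki.mean() * 1000:.0f} ms -> ~{1 / iki.mean():.1f} keystrokes/s")

# Kernel density estimate over intervals; the mode gives the dominant typing frequency.
grid = np.linspace(0.05, 0.4, 500)
density = gaussian_kde(iki)(grid)
print(f"peak typing frequency ~ {1 / grid[np.argmax(density)]:.1f} Hz")
```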
15. Non-native Listeners Benefit Less from Gestures and Visible Speech than Native Listeners During Degraded Speech Comprehension. Lang Speech 2020; 63:209-220. PMID: 30795715; PMCID: PMC7254629; DOI: 10.1177/0023830919831311.
Abstract
Native listeners benefit from both visible speech and iconic gestures to enhance degraded speech comprehension (Drijvers & Ozyürek, 2017). We tested how highly proficient non-native listeners benefit from these visual articulators compared to native listeners. We presented videos of an actress uttering a verb in clear, moderately, or severely degraded speech, while her lips were blurred, visible, or visible and accompanied by a gesture. Our results revealed that unlike native listeners, non-native listeners were less likely to benefit from the combined enhancement of visible speech and gestures, especially since the benefit from visible speech was minimal when the signal quality was not sufficient.
16. Degree of Language Experience Modulates Visual Attention to Visible Speech and Iconic Gestures During Clear and Degraded Speech Comprehension. Cogn Sci 2019; 43:e12789. PMID: 31621126; PMCID: PMC6790953; DOI: 10.1111/cogs.12789.
Abstract
Visual information conveyed by iconic hand gestures and visible speech can enhance speech comprehension under adverse listening conditions for both native and non-native listeners. However, how a listener allocates visual attention to these articulators during speech comprehension is unknown. We used eye-tracking to investigate whether and how native and highly proficient non-native listeners of Dutch allocated overt eye gaze to visible speech and gestures during clear and degraded speech comprehension. Participants watched video clips of an actress uttering a clear or degraded (6-band noise-vocoded) action verb while performing a gesture or not, and were asked to indicate the word they heard in a cued-recall task. Gestural enhancement was the largest (i.e., a relative reduction in reaction time cost) when speech was degraded for all listeners, but it was stronger for native listeners. Both native and non-native listeners mostly gazed at the face during comprehension, but non-native listeners gazed more often at gestures than native listeners. However, only native but not non-native listeners' gaze allocation to gestures predicted gestural benefit during degraded speech comprehension. We conclude that non-native listeners might gaze at gesture more as it might be more challenging for non-native listeners to resolve the degraded auditory cues and couple those cues to phonological information that is conveyed by visible speech. This diminished phonological knowledge might hinder the use of semantic information that is conveyed by gestures for non-native compared to native listeners. Our results demonstrate that the degree of language experience impacts overt visual attention to visual articulators, resulting in different visual benefits for native versus non-native listeners.
17. Native and non-native listeners show similar yet distinct oscillatory dynamics when using gestures to access speech in noise. Neuroimage 2019; 194:55-67. DOI: 10.1016/j.neuroimage.2019.03.032.
18. Alpha and Beta Oscillations Index Semantic Congruency between Speech and Gestures in Clear and Degraded Speech. J Cogn Neurosci 2018; 30:1086-1097. DOI: 10.1162/jocn_a_01301.
Abstract
Previous work revealed that visual semantic information conveyed by gestures can enhance degraded speech comprehension, but the mechanisms underlying these integration processes under adverse listening conditions remain poorly understood. We used MEG to investigate how oscillatory dynamics support speech–gesture integration when integration load is manipulated by auditory (e.g., speech degradation) and visual semantic (e.g., gesture congruency) factors. Participants were presented with videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching (mixing gesture + “mixing”) or mismatching (drinking gesture + “walking”) gesture. In clear speech, alpha/beta power was more suppressed in the left inferior frontal gyrus and motor and visual cortices when integration load increased in response to mismatching versus matching gestures. In degraded speech, beta power was less suppressed over posterior STS and medial temporal lobe for mismatching compared with matching gestures, showing that integration load was lowest when speech was degraded and mismatching gestures could not be integrated and disambiguate the degraded signal. Our results thus provide novel insights on how low-frequency oscillatory modulations in different parts of the cortex support the semantic audiovisual integration of gestures in clear and degraded speech: When speech is clear, the left inferior frontal gyrus and motor and visual cortices engage because higher-level semantic information increases semantic integration load. When speech is degraded, posterior STS/middle temporal gyrus and medial temporal lobe are less engaged because integration load is lowest when visual semantic information does not aid lexical retrieval and speech and gestures cannot be integrated.
19. Commentary: Transcranial Magnetic Stimulation over Left Inferior Frontal and Posterior Temporal Cortex Disrupts Gesture-Speech Integration. Front Hum Neurosci 2018; 12:256. PMID: 29973874; PMCID: PMC6019840; DOI: 10.3389/fnhum.2018.00256.
20. Native language status of the listener modulates the neural integration of speech and iconic gestures in clear and adverse listening conditions. Brain Lang 2018; 177-178:7-17. PMID: 29421272; DOI: 10.1016/j.bandl.2018.01.003.
Abstract
Native listeners neurally integrate iconic gestures with speech, which can enhance degraded speech comprehension. However, it is unknown how non-native listeners neurally integrate speech and gestures, as they might process visual semantic context differently than natives. We recorded EEG while native and highly proficient non-native listeners watched videos of an actress uttering an action verb in clear or degraded speech, accompanied by a matching ('to drive' + driving gesture) or mismatching ('to drink' + mixing gesture) gesture. Degraded speech elicited an enhanced N400 amplitude compared to clear speech in both groups, revealing an increase in the neural resources needed to resolve the spoken input. A larger N400 effect was found in clear speech for non-natives compared to natives, but in degraded speech only for natives. Non-native listeners might thus process gestures more strongly than natives when speech is clear, but need more auditory cues to facilitate access to gestural semantic information when speech is degraded.
21. Hearing and seeing meaning in noise: Alpha, beta, and gamma oscillations predict gestural enhancement of degraded speech comprehension. Hum Brain Mapp 2018; 39:2075-2087. PMID: 29380945; PMCID: PMC5947738; DOI: 10.1002/hbm.23987.
Abstract
During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal neural oscillatory activity associated with gestural enhancement of degraded speech comprehension. Participants watched videos of an actress uttering clear or degraded speech, accompanied by a gesture or not, and completed a cued-recall task after watching every video. When gestures semantically disambiguated degraded speech, alpha and beta power suppression and a gamma power increase revealed engagement and active processing in the hand area of the motor cortex, the extended language network (LIFG/pSTS/STG/MTG), the medial temporal lobe, and occipital regions. The observed low- and high-frequency oscillatory modulations in these areas support general unification, integration, and lexical access processes during online language comprehension, as well as simulation of, and increased visual attention to, manual gestures over time. All individual oscillatory power modulations associated with gestural enhancement of degraded speech comprehension predicted a listener's correct disambiguation of the degraded verb after watching the videos. Our results thus go beyond the previously proposed role of oscillatory dynamics in unimodal degraded speech comprehension and provide the first evidence for a role of low- and high-frequency oscillations in predicting the integration of auditory and visual information at a semantic level.
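As a rough, hedged illustration of how power in the alpha, beta, and gamma bands can be quantified for a single sensor time series (a Welch spectrum here, standing in for the full MEG time-frequency analysis; the sampling rate and band edges are assumptions):

```python
import numpy as np
from scipy.signal import welch

fs = 600                                  # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
signal = rng.standard_normal(60 * fs)     # placeholder for one sensor's time series

# Welch power spectral density with 2-second segments (0.5 Hz resolution).
freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)

bands = {"alpha": (8, 12), "beta": (13, 30), "gamma": (30, 80)}
df = freqs[1] - freqs[0]
for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs <= hi)
    band_power = psd[mask].sum() * df     # approximate integral of the PSD over the band
    print(f"{name} ({lo}-{hi} Hz): {band_power:.3e}")
```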
22. Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension. J Speech Lang Hear Res 2017; 60:212-222. PMID: 27960196; DOI: 10.1044/2016_jslhr-h-16-0101.
Abstract
PURPOSE: This study investigated whether and to what extent iconic co-speech gestures contribute, on top of information from visible speech, to enhancing degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these two visual articulators to speech comprehension have only examined them separately. METHOD: Twenty participants watched videos of an actress uttering an action verb and completed a free-recall task. The videos were presented in 3 speech conditions (2-band noise-vocoding, 6-band noise-vocoding, clear), 3 multimodal conditions (speech + lips blurred, speech + visible speech, speech + visible speech + gesture), and 2 visual-only conditions (visible speech, visible speech + gesture). RESULTS: Accuracy levels were higher when both visual articulators were present compared with one or none. The enhancement effects of (a) visible speech, (b) gestural information on top of visible speech, and (c) both visible speech and iconic gestures were larger in 6-band than in 2-band noise-vocoding or visual-only conditions. Gestural enhancement in 2-band noise-vocoding did not differ from gestural enhancement in visual-only conditions. CONCLUSIONS: When perceiving degraded speech in a visual context, listeners benefit more from having both visual articulators present compared with one. This benefit was larger at 6-band than at 2-band noise-vocoding, where listeners can benefit from both phonological cues from visible speech and semantic cues from iconic gestures to disambiguate the speech signal.
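Noise-vocoding, referred to above as the 2-band and 6-band conditions, divides speech into frequency bands, extracts each band's amplitude envelope, and uses the envelopes to modulate band-limited noise, so fewer bands preserve less spectral detail. The following is a simplified sketch under assumed parameters (band edges, filter order), not the stimulus-generation code used in the study:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(speech, fs, n_bands=6, f_lo=100, f_hi=4000):
    """Crude noise-vocoder: logarithmically spaced bands, envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(speech.size)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, speech)
        envelope = np.abs(hilbert(band))          # amplitude envelope of the band
        out += envelope * sosfilt(sos, noise)     # envelope-modulated band-limited noise
    return out

# Usage with a placeholder signal; in practice `speech` would be a recorded utterance.
fs = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)   # 1 s placeholder tone
vocoded_6band = noise_vocode(speech, fs, n_bands=6)
vocoded_2band = noise_vocode(speech, fs, n_bands=2)
```

With only two bands, very coarse spectral information survives, which is why phonological cues from visible speech and semantic cues from gestures become relatively more important for disambiguation.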
23. Alpha and gamma band oscillations index differential processing of acoustically reduced and full forms. Brain Lang 2016; 153-154:27-37. PMID: 26878718; DOI: 10.1016/j.bandl.2016.01.003.
Abstract
Reduced forms like yeshay for yesterday often occur in conversations. Previous behavioral research reported a processing advantage for full over reduced forms. The present study investigated whether this processing advantage is reflected in a modulation of alpha (8-12 Hz) and gamma (30+ Hz) band activity. In three electrophysiological experiments, participants listened to full and reduced forms in isolation (Experiment 1), sentence-final position (Experiment 2), or mid-sentence position (Experiment 3). Alpha power was larger in response to reduced forms than to full forms, but only in Experiments 1 and 2. We interpret these increases in alpha power as reflections of higher auditory cognitive load. In all experiments, gamma power only increased in response to full forms, which we interpret as showing that lexical activation spreads more quickly through the semantic network for full than for reduced forms. These results confirm a processing advantage for full forms, especially in non-medial sentence position.