151
Wallace MT, Woynaroski TG, Stevenson RA. Multisensory Integration as a Window into Orderly and Disrupted Cognition and Communication. Annu Rev Psychol 2020; 71:193-219. [DOI: 10.1146/annurev-psych-010419-051112]
Abstract
During our everyday lives, we are confronted with a vast amount of information from several sensory modalities. This multisensory information needs to be appropriately integrated for us to effectively engage with and learn from our world. Research carried out over the last half century has provided new insights into the way such multisensory processing improves human performance and perception; the neurophysiological foundations of multisensory function; the time course for its development; how multisensory abilities differ in clinical populations; and, most recently, the links between multisensory processing and cognitive abilities. This review summarizes the extant literature on multisensory function in typical and atypical circumstances, discusses the implications of the work carried out to date for theory and research, and points toward next steps for advancing the field.
Affiliation(s)
- Mark T. Wallace
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
- Departments of Psychology and Pharmacology, Vanderbilt University, Nashville, Tennessee 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee 37232, USA
- Vanderbilt Kennedy Center, Nashville, Tennessee 37203, USA
- Tiffany G. Woynaroski
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, Tennessee 37232, USA
- Vanderbilt Kennedy Center, Nashville, Tennessee 37203, USA
- Ryan A. Stevenson
- Departments of Psychology and Psychiatry and Program in Neuroscience, University of Western Ontario, London, Ontario N6A 3K7, Canada
- Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
152
Zinchenko A, Kotz SA, Schröger E, Kanske P. Moving towards dynamics: Emotional modulation of cognitive and emotional control. Int J Psychophysiol 2020; 147:193-201. [DOI: 10.1016/j.ijpsycho.2019.10.018]
153
Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech. J Neurosci 2019; 40:1053-1065. [PMID: 31889007] [DOI: 10.1523/jneurosci.1101-19.2019]
Abstract
Lip-reading is crucial for understanding speech in challenging conditions. But how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ∼70 ms in the left hemisphere, compared with ∼20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing. SIGNIFICANCE STATEMENT Lip-reading consists of decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
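The entrainment measure described here can be made concrete with a minimal Python sketch (synthetic signals only, not the authors' MEG pipeline): coherence between a slow speech-envelope stand-in and a simulated cortical signal, summarized below and above 1 Hz.
```python
# Minimal sketch (synthetic signals, assumed parameters): quantify "entrainment"
# as magnitude-squared coherence between a slow envelope stand-in and a
# simulated cortical signal.
import numpy as np
from scipy.signal import coherence

fs = 100.0                      # Hz, a typical rate for downsampled envelopes
t = np.arange(0, 300, 1 / fs)   # 5 minutes of "story listening"
rng = np.random.default_rng(0)

# A 0.4 Hz amplitude fluctuation stands in for the <1 Hz speech envelope
envelope = 0.5 + 0.5 * np.sin(2 * np.pi * 0.4 * t) + 0.2 * rng.standard_normal(t.size)

# Simulated cortical signal: a delayed, noisy copy of the envelope (~70 ms lag)
cortical = np.roll(envelope, int(0.07 * fs)) + rng.standard_normal(t.size)

f, coh = coherence(envelope, cortical, fs=fs, nperseg=int(20 * fs))
low = coh[(f > 0) & (f < 1)].mean()     # <1 Hz band
mid = coh[(f >= 1) & (f <= 8)].mean()   # 1-8 Hz band
print(f"coherence <1 Hz: {low:.2f}, 1-8 Hz: {mid:.2f}")
```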
154
Prieur J, Barbu S, Blois‐Heulin C, Lemasson A. The origins of gestures and language: history, current advances and proposed theories. Biol Rev Camb Philos Soc 2019; 95:531-554. [DOI: 10.1111/brv.12576]
Affiliation(s)
- Jacques Prieur
- Department of Education and Psychology, Comparative Developmental Psychology, Freie Universität Berlin, Berlin, Germany
- Univ Rennes, Normandie Univ, CNRS, EthoS (Ethologie animale et humaine), UMR 6552, F-35380 Paimpont, France
- Stéphanie Barbu
- Univ Rennes, Normandie Univ, CNRS, EthoS (Ethologie animale et humaine), UMR 6552, F-35380 Paimpont, France
- Catherine Blois‐Heulin
- Univ Rennes, Normandie Univ, CNRS, EthoS (Ethologie animale et humaine), UMR 6552, F-35380 Paimpont, France
- Alban Lemasson
- Univ Rennes, Normandie Univ, CNRS, EthoS (Ethologie animale et humaine), UMR 6552, F-35380 Paimpont, France
155
Sensorimotor influences on speech perception in pre-babbling infants: Replication and extension of Bruderer et al. (2015). Psychon Bull Rev 2019; 26:1388-1399. [PMID: 31037603] [DOI: 10.3758/s13423-019-01601-0]
Abstract
The relationship between speech perception and production is central to understanding language processing, yet remains under debate, particularly in early development. Recent research suggests that in infants aged 6 months, when the native phonological system is still being established, sensorimotor information from the articulators influences speech perception: The placement of a teething toy restricting tongue-tip movements interfered with infants' discrimination of a non-native contrast, /Da/-/da/, that involves tongue-tip movement. This effect was selective: A different teething toy that prevented lip closure but not tongue-tip movement did not disrupt discrimination. We conducted two sets of studies to replicate and extend these findings. Experiments 1 and 2 replicated the study by Bruderer et al. (Proceedings of the National Academy of Sciences of the United States of America, 112 (44), 13531-13536, 2015), but with synthesized auditory stimuli. Infants discriminated the non-native contrast (dental /da/ - retroflex /Da/) (Experiment 1), but showed no evidence of discrimination when the tongue-tip movement was prevented with a teething toy (Experiment 2). Experiments 3 and 4 extended this work to a native phonetic contrast (bilabial /ba/ - dental /da/). Infants discriminated the distinction with no teething toy present (Experiment 3), but when they were given a teething toy that interfered only with lip closure, a movement involved in the production of /ba/, discrimination was disrupted (Experiment 4). Importantly, this was the same teething toy that did not interfere with discrimination of /da/-/Da/ in Bruderer et al. (2015). These findings reveal specificity in the relation between sensorimotor and perceptual processes in pre-babbling infants, and show generalizability to a second phonetic contrast.
156
Morillon B, Arnal LH, Schroeder CE, Keitel A. Prominence of delta oscillatory rhythms in the motor cortex and their relevance for auditory and speech perception. Neurosci Biobehav Rev 2019; 107:136-142. [DOI: 10.1016/j.neubiorev.2019.09.012]
157
Sorati M, Behne DM. Musical Expertise Affects Audiovisual Speech Perception: Findings From Event-Related Potentials and Inter-trial Phase Coherence. Front Psychol 2019; 10:2562. [PMID: 31803107] [PMCID: PMC6874039] [DOI: 10.3389/fpsyg.2019.02562]
Abstract
In audiovisual speech perception, visual information from a talker's face during mouth articulation is available before the onset of the corresponding audio speech, and thereby allows the perceiver to use visual information to predict the upcoming audio. This prediction from phonetically congruent visual information modulates audiovisual speech perception and leads to a decrease in N1 and P2 amplitudes and latencies compared to the perception of audio speech alone. Whether audiovisual experience, such as musical training, influences this prediction is unclear, but if it does, it may explain some of the variation observed in previous research. The current study addresses whether audiovisual speech perception is affected by musical training, assessing N1 and P2 event-related potentials (ERPs) and, in addition, inter-trial phase coherence (ITPC). Musicians and non-musicians were presented with the syllable /ba/ in audio-only (AO), video-only (VO), and audiovisual (AV) conditions. With the predictive effect of mouth movement isolated from the AV speech (AV-VO), results showed that, compared to audio speech, both groups had a lower N1 latency and lower P2 amplitude and latency. They also showed lower ITPCs in the delta, theta, and beta bands in audiovisual speech perception. However, musicians showed significant suppression of N1 amplitude and desynchronization in the alpha band in audiovisual speech, which was not present for non-musicians. Collectively, the current findings indicate that early sensory processing can be modified by musical experience, which in turn may explain some of the variability in previous AV speech perception research.
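The inter-trial phase coherence (ITPC) measure named above has a compact definition, ITPC = |mean over trials of exp(i*phase)|; the following Python sketch computes it on synthetic trials with illustrative filter settings (not the study's EEG pipeline).
```python
# Minimal ITPC sketch on synthetic "trials": band-pass filter, take the
# Hilbert phase, and average unit phase vectors across trials.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 250.0
n_trials, n_samples = 60, 500                 # 2 s epochs
rng = np.random.default_rng(1)

# Phase-consistent 5 Hz (theta) component plus trial-specific noise
t = np.arange(n_samples) / fs
trials = np.sin(2 * np.pi * 5 * t) + rng.standard_normal((n_trials, n_samples))

def itpc(trials, fs, band):
    """Inter-trial phase coherence over time for one frequency band."""
    sos = butter(4, band, btype="band", fs=fs, output="sos")
    phase = np.angle(hilbert(sosfiltfilt(sos, trials, axis=-1), axis=-1))
    return np.abs(np.mean(np.exp(1j * phase), axis=0))   # values in [0, 1]

print("peak theta ITPC:", itpc(trials, fs, (4, 8)).max().round(2))
```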
Affiliation(s)
- Marzieh Sorati
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
158
Fu Z, Wu X, Chen J. Congruent audiovisual speech enhances auditory attention decoding with EEG. J Neural Eng 2019; 16:066033. [PMID: 31505476] [DOI: 10.1088/1741-2552/ab4340]
Abstract
OBJECTIVE The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task by analyzing electroencephalography (EEG) data. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e. to identify the speech stream of the listener's interest so that hearing aid algorithms can amplify the target speech and attenuate other distracting sounds, resulting in improved speech understanding and communication and reduced cognitive load. The present work aimed to investigate whether additional visual input (i.e. lipreading) would enhance AAD performance for normal-hearing listeners. APPROACH In a two-talker scenario, where auditory stimuli of audiobooks narrated by two speakers were presented, multi-channel EEG signals were recorded while participants selectively attended to one speaker and ignored the other. The speakers' mouth movements were recorded during narration to provide visual stimuli. Stimulus conditions included audio-only, visual input congruent with either (i.e. attended or unattended) speaker, and visual input incongruent with either speaker. The AAD approach was performed separately for each condition to evaluate the effect of additional visual input on AAD. MAIN RESULTS Relative to the audio-only condition, AAD performance improved with visual input only when it was congruent with the attended speech stream; the improvement was about 14 percentage points in decoding accuracy. Cortical envelope tracking in both auditory and visual cortex was stronger for the congruent audiovisual speech condition than for the other conditions. In addition, AAD was more robust in the congruent audiovisual condition, achieving higher accuracy than the audio-only condition with fewer channels and shorter trial durations. SIGNIFICANCE The present work complements previous studies and further demonstrates the feasibility of AAD-guided hearing aid design for daily face-to-face conversations. It also provides guidance for designing a low-density EEG setup for the AAD approach.
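For orientation, the usual envelope-reconstruction AAD scheme can be sketched in a few lines of Python on synthetic data (a ridge "backward model"); the study's own decoder, features and parameters may differ.
```python
# Hedged AAD sketch: a ridge backward model maps time-lagged EEG to the
# attended speech envelope; attention is decoded by which candidate envelope
# correlates more with the reconstruction. All data are synthetic.
import numpy as np
from sklearn.linear_model import Ridge

fs, n_ch, n_s = 64, 32, 64 * 120                   # 2 minutes at 64 Hz
rng = np.random.default_rng(2)
env_att, env_ign = rng.random(n_s), rng.random(n_s)

# Synthetic EEG: channels weakly driven by the attended envelope, plus noise
eeg = 0.3 * np.outer(env_att, rng.standard_normal(n_ch)) + rng.standard_normal((n_s, n_ch))

def lag_matrix(x, max_lag):
    """Stack time-lagged copies of the channels (circular shift, for brevity)."""
    return np.hstack([np.roll(x, k, axis=0) for k in range(max_lag + 1)])

X = lag_matrix(eeg, max_lag=int(0.25 * fs))        # lags up to 250 ms
half = n_s // 2
model = Ridge(alpha=1e3).fit(X[:half], env_att[:half])   # train on first half
recon = model.predict(X[half:])                           # test on second half

r_att = np.corrcoef(recon, env_att[half:])[0, 1]
r_ign = np.corrcoef(recon, env_ign[half:])[0, 1]
print(f"r(attended)={r_att:.2f}  r(ignored)={r_ign:.2f}  decoded:",
      "attended" if r_att > r_ign else "ignored")
```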
Affiliation(s)
- Zhen Fu
- Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, People's Republic of China
159
Oganian Y, Chang EF. A speech envelope landmark for syllable encoding in human superior temporal gyrus. Sci Adv 2019; 5:eaay6279. [PMID: 31976369] [PMCID: PMC6957234] [DOI: 10.1126/sciadv.aay6279]
Abstract
The most salient acoustic features in speech are the modulations in its intensity, captured by the amplitude envelope. Perceptually, the envelope is necessary for speech comprehension. Yet, the neural computations that represent the envelope and their linguistic implications are heavily debated. We used high-density intracranial recordings, while participants listened to speech, to determine how the envelope is represented in human speech cortical areas on the superior temporal gyrus (STG). We found that a well-defined zone in middle STG detects acoustic onset edges (local maxima in the envelope rate of change). Acoustic analyses demonstrated that timing of acoustic onset edges cues syllabic nucleus onsets, while their slope cues syllabic stress. Synthesized amplitude-modulated tone stimuli showed that steeper slopes elicited greater responses, confirming cortical encoding of amplitude change, not absolute amplitude. Overall, STG encoding of the timing and magnitude of acoustic onset edges underlies the perception of speech temporal structure.
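The acoustic-edge idea can be illustrated with a short Python sketch (synthetic audio, assumed filter settings, not the authors' pipeline): extract a low-pass amplitude envelope, differentiate it, and mark local maxima of the positive rate of change.
```python
# Hedged sketch of "acoustic onset edges": local maxima in the rate of change
# of a low-pass amplitude envelope; their timing and slope are the features
# discussed above. The waveform here is synthetic.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, find_peaks

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(3)
# Toy "speech": noise with ~4 Hz syllable-like amplitude modulation
wav = rng.standard_normal(t.size) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))

# Amplitude envelope: magnitude of the analytic signal, low-pass filtered at 10 Hz
sos = butter(4, 10, btype="low", fs=fs, output="sos")
env = sosfiltfilt(sos, np.abs(hilbert(wav)))

rate = np.gradient(env, 1 / fs)               # d(envelope)/dt
rate[rate < 0] = 0                            # keep rises only (onset edges)
peaks, props = find_peaks(rate, distance=int(0.1 * fs), height=0.1 * rate.max())

print("edge times (s):", np.round(peaks / fs, 2))
print("edge slopes   :", np.round(props["peak_heights"], 2))
```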
160
Pouw W, Dixon JA. Gesture Networks: Introducing Dynamic Time Warping and Network Analysis for the Kinematic Study of Gesture Ensembles. Discourse Processes 2019. [DOI: 10.1080/0163853x.2019.1678967]
Affiliation(s)
- Wim Pouw
- Center for the Ecological Study of Perception and Action, University of Connecticut
- Department of Psychology, Educational, and Child Studies, Erasmus University Rotterdam
- James A. Dixon
- Center for the Ecological Study of Perception and Action, University of Connecticut
161
Zoefel B, Allard I, Anil M, Davis MH. Perception of Rhythmic Speech Is Modulated by Focal Bilateral Transcranial Alternating Current Stimulation. J Cogn Neurosci 2019; 32:226-240. [PMID: 31659922] [DOI: 10.1162/jocn_a_01490]
Abstract
Several recent studies have used transcranial alternating current stimulation (tACS) to demonstrate a causal role of neural oscillatory activity in speech processing. In particular, it has been shown that the ability to understand speech in a multi-speaker scenario or background noise depends on the timing of speech presentation relative to simultaneously applied tACS. However, it is possible that tACS did not change actual speech perception but rather auditory stream segregation. In this study, we tested whether the phase relation between tACS and the rhythm of degraded words, presented in silence, modulates word report accuracy. We found strong evidence for a tACS-induced modulation of speech perception, but only if the stimulation was applied bilaterally using ring electrodes (not for unilateral left hemisphere stimulation with square electrodes). These results were only obtained when data were analyzed using a statistical approach that was identified as optimal in a previous simulation study. The effect was driven by a phasic disruption of word report scores. Our results suggest a causal role of neural entrainment for speech perception and emphasize the importance of optimizing stimulation protocols and statistical approaches for brain stimulation research.
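One generic way to quantify such a phasic modulation, sketched below in Python with simulated phase-binned accuracies, is to regress accuracy onto sine and cosine of tACS phase; this is an illustration only and not necessarily the statistical approach the study identifies as optimal.
```python
# Hedged sketch: fit accuracy ~ b0 + b1*cos(phase) + b2*sin(phase) and take
# the fitted amplitude as the depth of the phasic modulation. Data simulated.
import numpy as np

rng = np.random.default_rng(4)
phases = np.linspace(0, 2 * np.pi, 8, endpoint=False)   # 8 tACS phase bins
true = 0.55 + 0.05 * np.cos(phases - 1.0)               # simulated accuracies
acc = true + 0.01 * rng.standard_normal(phases.size)

X = np.column_stack([np.ones_like(phases), np.cos(phases), np.sin(phases)])
b0, b1, b2 = np.linalg.lstsq(X, acc, rcond=None)[0]

amplitude = np.hypot(b1, b2)            # modulation depth
best_phase = np.arctan2(b2, b1)         # phase of maximal accuracy
print(f"modulation depth: {amplitude:.3f}, preferred phase: {best_phase:.2f} rad")
```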
162
Lalonde K, Werner LA. Infants and Adults Use Visual Cues to Improve Detection and Discrimination of Speech in Noise. J Speech Lang Hear Res 2019; 62:3860-3875. [PMID: 31618097] [PMCID: PMC7201336] [DOI: 10.1044/2019_jslhr-h-19-0106]
Abstract
Purpose This study assessed the extent to which 6- to 8.5-month-old infants and 18- to 30-year-old adults detect and discriminate auditory syllables in noise better in the presence of visual speech than in auditory-only conditions. In addition, we examined whether visual cues to the onset and offset of the auditory signal account for this benefit. Method Sixty infants and 24 adults were randomly assigned to speech detection or discrimination tasks and were tested using a modified observer-based psychoacoustic procedure. Each participant completed 1-3 conditions: auditory-only, with visual speech, and with a visual signal that only cued the onset and offset of the auditory syllable. Results Mixed linear modeling indicated that infants and adults benefited from visual speech on both tasks. Adults relied on the onset-offset cue for detection, but the same cue did not improve their discrimination. The onset-offset cue benefited infants for both detection and discrimination. Whereas the onset-offset cue improved detection similarly for infants and adults, the full visual speech signal benefited infants to a lesser extent than adults on the discrimination task. Conclusions These results suggest that infants' use of visual onset-offset cues is mature, but their ability to use more complex visual speech cues is still developing. Additional research is needed to explore differences in audiovisual enhancement (a) of speech discrimination across speech targets and (b) with increasingly complex tasks and stimuli.
Affiliation(s)
- Kaylah Lalonde
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Lynne A. Werner
- Department of Speech & Hearing Sciences, University of Washington, Seattle
163
The impact of when, what and how predictions on auditory speech perception. Exp Brain Res 2019; 237:3143-3153. [DOI: 10.1007/s00221-019-05661-5]
164
Abstract
OBJECTIVES The present study investigated presentation modality differences in lexical encoding and working memory representations of spoken words of older, hearing-impaired adults. Two experiments were undertaken: a memory-scanning experiment and a stimulus gating experiment. The primary objective of experiment 1 was to determine whether memory encoding and retrieval and scanning speeds are different for easily identifiable words presented in auditory-visual (AV), auditory-only (AO), and visual-only (VO) modalities. The primary objective of experiment 2 was to determine if memory encoding and retrieval speed differences observed in experiment 1 could be attributed to the early availability of AV speech information compared with AO or VO conditions. DESIGN Twenty-six adults over age 60 years with bilateral mild to moderate sensorineural hearing loss participated in experiment 1, and 24 adults who took part in experiment 1 participated in experiment 2. An item recognition reaction-time paradigm (memory-scanning) was used in experiment 1 to measure (1) lexical encoding speed, that is, the speed at which an easily identifiable word was recognized and placed into working memory, and (2) retrieval speed, that is, the speed at which words were retrieved from memory and compared with similarly encoded words (memory scanning) presented in AV, AO, and VO modalities. Experiment 2 used a time-gated word identification task to test whether the time course of stimulus information available to participants predicted the modality-related memory encoding and retrieval speed results from experiment 1. RESULTS The results of experiment 1 revealed significant differences among the modalities with respect to both memory encoding and retrieval speed, with AV fastest and VO slowest. These differences motivated an examination of the time course of stimulus information available as a function of modality. Results from experiment 2 indicated the encoding and retrieval speed advantages for AV and AO words compared with VO words were mostly driven by the time course of stimulus information. The AV advantage seen in encoding and retrieval speeds is likely due to a combination of robust stimulus information available to the listener earlier in time and lower attentional demands compared with AO or VO encoding and retrieval. CONCLUSIONS Significant modality differences in lexical encoding and memory retrieval speeds were observed across modalities. The memory scanning speed advantage observed for AV compared with AO or VO modalities was strongly related to the time course of stimulus information. In contrast, lexical encoding and retrieval speeds for VO words could not be explained by the time-course of stimulus information alone. Working memory processes for the VO modality may be impacted by greater attentional demands and less information availability compared with the AV and AO modalities. Overall, these results support the hypothesis that the presentation modality for speech inputs (AV, AO, or VO) affects how older adult listeners with hearing loss encode, remember, and retrieve what they hear.
165
Meng Z, Han S, Liu P, Tong Y. Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion. IEEE Trans Cybern 2019; 49:3293-3306. [PMID: 29994138] [DOI: 10.1109/tcyb.2018.2840090]
Abstract
It is challenging to recognize facial action units (AUs) from spontaneous facial displays, especially when they are accompanied by speech. The major reason is that, in current practice, the information is extracted from a single source, i.e., the visual channel. However, facial activity is highly correlated with voice in natural human communication. Instead of solely improving visual observations, this paper presents a novel audiovisual fusion framework, which makes the best use of visual and acoustic cues in recognizing speech-related facial AUs. In particular, a dynamic Bayesian network is employed to explicitly model the semantic and dynamic physiological relationships between AUs and phonemes as well as measurement uncertainty. Experiments on a pilot audiovisual AU-coded database demonstrate that the proposed framework significantly outperforms state-of-the-art visual-based methods in recognizing speech-related AUs, especially those AUs whose visual observations are impaired during speech. More importantly, by explicitly modeling and exploiting physiological relationships between AUs and phonemes, it is also superior to audio-based methods and to feature-level fusion methods that employ low-level audio features.
166
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. [PMID: 31393261] [PMCID: PMC6687434] [DOI: 10.7554/elife.48116]
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
167
Kaganovich N, Ancel E. Different neural processes underlie visual speech perception in school-age children and adults: An event-related potentials study. J Exp Child Psychol 2019; 184:98-122. [PMID: 31015101] [PMCID: PMC6857813] [DOI: 10.1016/j.jecp.2019.03.009]
Abstract
The ability to use visual speech cues does not fully develop until late adolescence. The cognitive and neural processes underlying this slow maturation are not yet understood. We examined electrophysiological responses of younger (8-9 years) and older (11-12 years) children as well as adults elicited by visually perceived articulations in an audiovisual word matching task and related them to the amount of benefit gained during a speech-in-noise (SIN) perception task when seeing the talker's face. On each trial, participants first heard a word and, after a short pause, saw a speaker silently articulate a word. In half of the trials the articulated word matched the auditory word (congruent trials), whereas in the other half it did not (incongruent trials). In all three age groups, incongruent articulations elicited the N400 component and congruent articulations elicited the late positive complex (LPC). Groups did not differ in the mean amplitude of N400. The mean amplitude of LPC was larger in younger children compared with older children and adults. Importantly, the relationship between event-related potential measures and SIN performance varied by group. In 8- and 9-year-olds, neither component was predictive of SIN gain. The LPC amplitude predicted the SIN gain in older children but not in adults. Conversely, the N400 amplitude predicted the SIN gain in adults. We argue that although all groups were able to detect correspondences between auditory and visual word onsets at the phonemic/syllabic level, only adults could use this information for lexical access.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907, USA.
- Elizabeth Ancel
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
168
Stuckenberg MV, Schröger E, Widmann A. Presentation Probability of Visual–Auditory Pairs Modulates Visually Induced Auditory Predictions. J Cogn Neurosci 2019; 31:1110-1125. [DOI: 10.1162/jocn_a_01398]
Abstract
Predictions about forthcoming auditory events can be established on the basis of preceding visual information. Sounds being incongruent to predictive visual information have been found to elicit an enhanced negative ERP in the latency range of the auditory N1 compared with physically identical sounds being preceded by congruent visual information. This so-called incongruency response (IR) is interpreted as reduced prediction error for predicted sounds at a sensory level. The main purpose of this study was to examine the impact of probability manipulations on the IR. We manipulated the probability with which particular congruent visual–auditory pairs were presented (83/17 vs. 50/50 condition). This manipulation led to two conditions with different strengths of the association of visual with auditory information. A visual cue was presented either above or below a fixation cross and was followed by either a high- or low-pitched sound. In 90% of trials, the visual cue correctly predicted the subsequent sound. In one condition, one of the sounds was presented more frequently (83% of trials), whereas in the other condition both sounds were presented with equal probability (50% of trials). Therefore, in the 83/17 condition, one congruent combination of visual cue and corresponding sound was presented more frequently than the other combinations presumably leading to a stronger visual–auditory association. A significant IR for unpredicted compared with predicted but otherwise identical sounds was observed only in the 83/17 condition, but not in the 50/50 condition, where both congruent visual cue–sound combinations were presented with equal probability. We also tested whether the processing of the prediction violation is dependent on the task relevance of the visual information. Therefore, we contrasted a visual–auditory matching task with a pitch discrimination task. It turned out that the task only had an impact on the behavioral performance but not on the prediction error signals. Results suggest that the generation of visual-to-auditory sensory predictions is facilitated by a strong association between the visual cue and the predicted sound (83/17 condition) but is not influenced by the task relevance of the visual information.
Affiliation(s)
- Maria V. Stuckenberg
- Institute of Psychology, University of Leipzig
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Andreas Widmann
- Institute of Psychology, University of Leipzig
- Leibniz Institute for Neurobiology, Magdeburg, Germany
169
Beyond neonatal imitation: Aerodigestive stereotypies, speech development, and social interaction in the extended perinatal period. Behav Brain Sci 2019; 40:e403. [PMID: 29342817] [DOI: 10.1017/s0140525x17001923]
Abstract
In our target article, we argued that the positive results of neonatal imitation are likely to be by-products of normal aerodigestive development. Our hypothesis elicited various responses on the role of social interaction in infancy, the methodological issues about imitation experiments, and the relation between the aerodigestive theory and the development of speech. Here we respond to the commentaries.
170
Abstract
Visual cues facilitate speech perception during face-to-face communication, particularly in noisy environments. These visual-driven enhancements arise from both automatic lip-reading behaviors and attentional tuning to auditory-visual signals. However, in crowded settings, such as a cocktail party, how do we accurately bind the correct voice to the correct face, enabling the benefit of visual cues on speech perception processes? Previous research has emphasized that spatial and temporal alignment of the auditory-visual signals determines which voice is integrated with which speaking face. Here, we present a novel illusion demonstrating that when multiple faces and voices are presented in the presence of ambiguous temporal and spatial information as to which pairs of auditory-visual signals should be integrated, our perceptual system relies on identity information extracted from each signal to determine pairings. Data from three experiments demonstrate that expectations about an individual’s voice (based on their identity) can change where individuals perceive that voice to arise from.
Affiliation(s)
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
171
Morita T, Koda H. Superregular grammars do not provide additional explanatory power but allow for a compact analysis of animal song. R Soc Open Sci 2019; 6:190139. [PMID: 31417719] [PMCID: PMC6689648] [DOI: 10.1098/rsos.190139]
Abstract
A pervasive belief with regard to the differences between human language and animal vocal sequences (song) is that they belong to different classes of computational complexity, with animal song belonging to regular languages, whereas human language is superregular. This argument, however, lacks empirical evidence since superregular analyses of animal song are understudied. The goal of this paper is to perform a superregular analysis of animal song, using data from gibbons as a case study, and demonstrate that a superregular analysis can be effectively used with non-human data. A key finding is that a superregular analysis does not increase explanatory power but rather provides for compact analysis: fewer grammatical rules are necessary once superregularity is allowed. This pattern is analogous to a previous computational analysis of human language, and accordingly, the null hypothesis, that human language and animal song are governed by the same type of grammatical systems, cannot be rejected.
Affiliation(s)
- T. Morita
- Primate Research Institute, Kyoto University, 41-2 Kanrin, Inuyama, Aichi 484-8506, Japan
- H. Koda
- Primate Research Institute, Kyoto University, 41-2 Kanrin, Inuyama, Aichi 484-8506, Japan
172
Holler J, Levinson SC. Multimodal Language Processing in Human Communication. Trends Cogn Sci 2019; 23:639-652. [PMID: 31235320] [DOI: 10.1016/j.tics.2019.05.006]
Abstract
The natural ecology of human language is face-to-face interaction comprising the exchange of a plethora of multimodal signals. Trying to understand the psycholinguistic processing of language in its natural niche raises new issues, first and foremost the binding of multiple, temporally offset signals under tight time constraints posed by a turn-taking system. This might be expected to overload and slow our cognitive system, but the reverse is in fact the case. We propose cognitive mechanisms that may explain this phenomenon and call for a multimodal, situated psycholinguistic framework to unravel the full complexities of human language processing.
Affiliation(s)
- Judith Holler
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, The Netherlands.
- Stephen C Levinson
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Centre for Language Studies, Radboud University Nijmegen, Nijmegen, The Netherlands
173
Solberg Økland H, Todorović A, Lüttke CS, McQueen JM, de Lange FP. Combined predictive effects of sentential and visual constraints in early audiovisual speech processing. Sci Rep 2019; 9:7870. [PMID: 31133646] [PMCID: PMC6536519] [DOI: 10.1038/s41598-019-44311-2]
Abstract
In language comprehension, a variety of contextual cues act in unison to render upcoming words more or less predictable. As a sentence unfolds, we use prior context (sentential constraints) to predict what the next words might be. Additionally, in a conversation, we can predict upcoming sounds through observing the mouth movements of a speaker (visual constraints). In electrophysiological studies, effects of visual constraints have typically been observed early in language processing, while effects of sentential constraints have typically been observed later. We hypothesized that the visual and the sentential constraints might feed into the same predictive process such that effects of sentential constraints might also be detectable early in language processing through modulations of the early effects of visual salience. We presented participants with audiovisual speech while recording their brain activity with magnetoencephalography. Participants saw videos of a person saying sentences where the last word was either sententially constrained or not, and began with a salient or non-salient mouth movement. We found that sentential constraints indeed exerted an early (N1) influence on language processing. Sentential modulations of the N1 visual predictability effect were visible in brain areas associated with semantic processing, and were differently expressed in the two hemispheres. In the left hemisphere, visual and sentential constraints jointly suppressed the auditory evoked field, while the right hemisphere was sensitive to visual constraints only in the absence of strong sentential constraints. These results suggest that sentential and visual constraints can jointly influence even very early stages of audiovisual speech comprehension.
Affiliation(s)
- Heidi Solberg Økland
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Ana Todorović
- Oxford Centre for Human Brain Activity, University of Oxford, Oxford, UK
- Claudia S Lüttke
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- James M McQueen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Floris P de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
174
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. [PMID: 31013361] [DOI: 10.1111/ejn.14425]
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) Congruent AV: facing the attended speaker while ignoring another speaker represented by an audio-only stream, and (b) Incongruent AV (eavesdropping): attending to the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we can successfully decode attention to congruent audiovisual speech and can also decode attention when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition, we found alpha power to be a reliable measure of attention to the visual speech. Using parieto-occipital alpha power, we found that we can distinguish whether subjects are attending to or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that with only six near-ear electrodes we can successfully determine the attended speech. This work extends the current framework for decoding attention to speech to more naturalistic scenarios, and in doing so provides additional neural measures which may be incorporated to improve decoding accuracy.
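A minimal Python sketch of the parieto-occipital alpha-power index described above, on synthetic data with an assumed 8-12 Hz band (channel selection and other pipeline details in the study may differ):
```python
# Hedged sketch: average 8-12 Hz Welch power over (assumed) posterior channels;
# comparing this index across attention conditions is the intended use.
import numpy as np
from scipy.signal import welch

fs = 128
rng = np.random.default_rng(5)
eeg = rng.standard_normal((4, fs * 60))      # 4 posterior channels, 1 minute

def alpha_power(x, fs, band=(8.0, 12.0)):
    f, psd = welch(x, fs=fs, nperseg=4 * fs)
    mask = (f >= band[0]) & (f <= band[1])
    return psd[..., mask].mean(axis=-1)      # mean alpha power per channel

print(f"posterior alpha power index: {alpha_power(eeg, fs).mean():.3f}")
```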
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland; Department of Biomedical Engineering, University of Rochester, Rochester, New York; Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
175
Doelling KB, Assaneo MF, Bevilacqua D, Pesaran B, Poeppel D. An oscillator model better predicts cortical entrainment to music. Proc Natl Acad Sci U S A 2019; 116:10113-10121. [PMID: 31019082] [PMCID: PMC6525506] [DOI: 10.1073/pnas.1816414116]
Abstract
A body of research convincingly demonstrates a role for synchronization of auditory cortex to rhythmic structure in sounds including speech and music. Some studies hypothesize that an oscillator in auditory cortex could underlie important temporal processes such as segmentation and prediction. An important critique of these findings raises the plausible concern that what is measured is perhaps not an oscillator but is instead a sequence of evoked responses. The two distinct mechanisms could look very similar in the case of rhythmic input, but an oscillator might better provide the computational roles mentioned above (i.e., segmentation and prediction). We advance an approach to adjudicate between the two models: analyzing the phase lag between stimulus and neural signal across different stimulation rates. We ran numerical simulations of evoked and oscillatory computational models, showing that in the evoked case, phase lag is heavily rate-dependent, while the oscillatory model displays marked phase concentration across stimulation rates. Next, we compared these model predictions with magnetoencephalography data recorded while participants listened to music of varying note rates. Our results show that the phase concentration of the experimental data is more in line with the oscillatory model than with the evoked model. This finding supports an auditory cortical signal that (i) contains components of both bottom-up evoked responses and internal oscillatory synchronization whose strengths are weighted by their appropriateness for particular stimulus types and (ii) cannot be explained by evoked responses alone.
Affiliation(s)
- Keith B Doelling
- Department of Psychology, New York University, New York, NY 10003
- Dana Bevilacqua
- Department of Psychology, New York University, New York, NY 10003
- Bijan Pesaran
- Center for Neural Science, New York University, New York, NY 10003
- David Poeppel
- Department of Psychology, New York University, New York, NY 10003
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
176
Sato W, Kochiyama T, Uono S, Sawada R, Kubota Y, Yoshimura S, Toichi M. Widespread and lateralized social brain activity for processing dynamic facial expressions. Hum Brain Mapp 2019; 40:3753-3768. [PMID: 31090126] [DOI: 10.1002/hbm.24629]
Abstract
Dynamic facial expressions of emotions constitute natural and powerful means of social communication in daily life. A number of previous neuroimaging studies have explored the neural mechanisms underlying the processing of dynamic facial expressions, and indicated the activation of certain social brain regions (e.g., the amygdala) during such tasks. However, the activated brain regions were inconsistent across studies, and their laterality was rarely evaluated. To investigate these issues, we measured brain activity using functional magnetic resonance imaging in a relatively large sample (n = 51) during the observation of dynamic facial expressions of anger and happiness and their corresponding dynamic mosaic images. The observation of dynamic facial expressions, compared with dynamic mosaics, elicited stronger activity in the bilateral posterior cortices, including the inferior occipital gyri, fusiform gyri, and superior temporal sulci. The dynamic facial expressions also activated bilateral limbic regions, including the amygdalae and ventromedial prefrontal cortices, more strongly versus mosaics. In the same manner, activation was found in the right inferior frontal gyrus (IFG) and left cerebellum. Laterality analyses comparing original and flipped images revealed right hemispheric dominance in the superior temporal sulcus and IFG and left hemispheric dominance in the cerebellum. These results indicated that the neural mechanisms underlying processing of dynamic facial expressions include widespread social brain regions associated with perceptual, emotional, and motor functions, and include a clearly lateralized (right cortical and left cerebellar) network like that involved in language processing.
Affiliation(s)
- Wataru Sato
- Kokoro Research Center, Kyoto University, Kyoto, Japan
- Shota Uono
- Department of Neurodevelopmental Psychiatry, Habilitation and Rehabilitation, Kyoto University, Kyoto, Japan
- Reiko Sawada
- Department of Neurodevelopmental Psychiatry, Habilitation and Rehabilitation, Kyoto University, Kyoto, Japan
- Yasutaka Kubota
- Health and Medical Services Center, Shiga University, Hikone, Shiga, Japan
- Sayaka Yoshimura
- Department of Neurodevelopmental Psychiatry, Habilitation and Rehabilitation, Kyoto University, Kyoto, Japan
- Motomi Toichi
- Faculty of Human Health Science, Kyoto University, Kyoto, Japan; The Organization for Promoting Neurodevelopmental Disorder Research, Kyoto, Japan
177
178
179
Imafuku M, Kanakogi Y, Butler D, Myowa M. Demystifying infant vocal imitation: The roles of mouth looking and speaker's gaze. Dev Sci 2019; 22:e12825. [PMID: 30980494] [DOI: 10.1111/desc.12825]
Abstract
Vocal imitation plays a fundamental role in human language acquisition from infancy. Little is known, however, about how infants imitate others' sounds. We focused on three factors: (a) whether infants receive information from upright faces, (b) the infant's observation of the speaker's mouth and (c) the speaker directing their gaze towards the infant. We recorded the eye movements of 6-month-olds who participated in experiments watching videos of a speaker producing vowel sounds. We found that infants' tendency to vocally imitate such videos increased as a function of (a) seeing upright rather than inverted faces, (b) their increased looking towards the speaker's mouth and (c) whether the speaker directed their gaze towards, rather than away from, infants. These latter findings are consistent with theories of motor resonance and natural pedagogy, respectively. New light has been shed on the cues and underlying mechanisms linking infant speech perception and production.
Affiliation(s)
- Masahiro Imafuku
- Graduate School of Education, Kyoto University, Kyoto, Japan; Faculty of Education, Musashino University, Tokyo, Japan
- David Butler
- Graduate School of Education, Kyoto University, Kyoto, Japan; The Institute for Social Neuroscience Psychology, Heidelberg, Victoria, Australia
- Masako Myowa
- Graduate School of Education, Kyoto University, Kyoto, Japan
180
Hearing-impaired listeners show increased audiovisual benefit when listening to speech in noise. Neuroimage 2019; 196:261-268. [PMID: 30978494] [DOI: 10.1016/j.neuroimage.2019.04.017]
Abstract
Recent studies provide evidence for changes in audiovisual perception as well as for adaptive cross-modal auditory cortex plasticity in older individuals with high-frequency hearing impairments (presbycusis). We here investigated whether these changes facilitate the use of visual information, leading to an increased audiovisual benefit of hearing-impaired individuals when listening to speech in noise. We used a naturalistic design in which older participants with a varying degree of high-frequency hearing loss attended to running auditory or audiovisual speech in noise and detected rare target words. Passages containing only visual speech served as a control condition. Simultaneously acquired scalp electroencephalography (EEG) data were used to study cortical speech tracking. Target word detection accuracy was significantly increased in the audiovisual as compared to the auditory listening condition. The degree of this audiovisual enhancement was positively related to individual high-frequency hearing loss and subjectively reported listening effort in challenging daily life situations, which served as a subjective marker of hearing problems. On the neural level, the early cortical tracking of the speech envelope was enhanced in the audiovisual condition. Similar to the behavioral findings, individual differences in the magnitude of the enhancement were positively associated with listening effort ratings. Our results therefore suggest that hearing-impaired older individuals make increased use of congruent visual information to compensate for the degraded auditory input.
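Cortical tracking of the speech envelope, as used here, is commonly estimated with a lagged linear (temporal response function) model; the Python sketch below uses synthetic data and a simple forward ridge model and is not the study's analysis code.
```python
# Hedged sketch of envelope tracking as a forward (encoding) model: predict an
# EEG channel from time-lagged copies of the speech envelope; tracking strength
# is the correlation between predicted and measured EEG on held-out data.
import numpy as np
from sklearn.linear_model import Ridge

fs, n_s = 64, 64 * 180
rng = np.random.default_rng(6)
env = rng.random(n_s)
eeg = np.roll(env, int(0.12 * fs)) + 0.3 * rng.standard_normal(n_s)  # ~120 ms lag

lags = np.arange(0, int(0.4 * fs))                    # 0-400 ms lags
X = np.column_stack([np.roll(env, k) for k in lags])  # lagged envelope matrix

half = n_s // 2
model = Ridge(alpha=10.0).fit(X[:half], eeg[:half])
pred = model.predict(X[half:])
tracking = np.corrcoef(pred, eeg[half:])[0, 1]
print(f"envelope tracking (prediction accuracy): r = {tracking:.2f}")
```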
181
He L, Zhang Y, Dellwo V. Between-speaker variability and temporal organization of the first formant. J Acoust Soc Am 2019; 145:EL209. [PMID: 31067968] [DOI: 10.1121/1.5093450]
Abstract
First formant (F1) trajectories of vocalic intervals were divided into positive and negative dynamics. Positive F1 dynamics were defined as the speeds of F1 increases to reach the maxima, and negative F1 dynamics as the speeds of F1 decreases away from the maxima. Mean, standard deviation, and sequential variability were measured for both dynamics. Results showed that measures of negative F1 dynamics explained more between-speaker variability, which was highly congruent with a previous study using intensity dynamics [He and Dellwo (2017). J. Acoust. Soc. Am. 141, EL488-EL494]. The results may be explained by speaker idiosyncratic articulation.
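One possible operationalization of the positive and negative F1 dynamics described above is sketched below on synthetic F1 trajectories; the paper's exact formant-tracking and segmentation steps are assumptions here and are not reproduced.
```python
# Hedged sketch: per vocalic interval, the positive dynamic is the rate of F1
# rise from interval onset to the F1 maximum, and the negative dynamic is the
# rate of F1 fall from the maximum to interval offset; mean, SD, and a
# sequential (interval-to-interval) variability index summarize each measure.
import numpy as np

def f1_dynamics(f1_tracks, frame_dur=0.005):
    """f1_tracks: list of 1-D arrays, one F1 trajectory (Hz) per vocalic interval."""
    pos, neg = [], []
    for f1 in f1_tracks:
        k = int(np.argmax(f1))
        if 0 < k < len(f1) - 1:
            pos.append((f1[k] - f1[0]) / (k * frame_dur))                    # Hz/s rise
            neg.append((f1[k] - f1[-1]) / ((len(f1) - 1 - k) * frame_dur))   # Hz/s fall
    return np.array(pos), np.array(neg)

def summary(x):
    seq_var = np.mean(np.abs(np.diff(x)))   # sequential variability
    return x.mean(), x.std(), seq_var

rng = np.random.default_rng(7)
tracks = [300 + 200 * np.sin(np.linspace(0, np.pi, rng.integers(15, 40))) for _ in range(20)]
pos, neg = f1_dynamics(tracks)
print("positive dynamics (mean, SD, seqvar):", np.round(summary(pos), 1))
print("negative dynamics (mean, SD, seqvar):", np.round(summary(neg), 1))
```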
Affiliation(s)
- Lei He
- Department of Linguistics, University of Tübingen, Wilhelmstrasse 19-23, DE-72074 Tübingen
- Yu Zhang
- Institute of Computational Linguistics, University of Zurich, Andreasstrasse 15, CH-8050 Zurich
- Volker Dellwo
- Institute of Computational Linguistics, University of Zurich, Andreasstrasse 15, CH-8050 Zurich
182
Eye activity tracks task-relevant structures during speech and auditory sequence perception. Nat Commun 2018; 9:5374. [PMID: 30560906] [PMCID: PMC6299078] [DOI: 10.1038/s41467-018-07773-y]
Abstract
The sensory and motor systems jointly contribute to complex behaviors, but whether motor systems are involved in high-order perceptual tasks such as speech and auditory comprehension remains debated. Here, we show that ocular muscle activity is synchronized to mentally constructed sentences during speech listening, in the absence of any sentence-related visual or prosodic cue. Ocular tracking of sentences is observed in the vertical electrooculogram (EOG), whether the eyes are open or closed, and in eye blinks measured by eyetracking. Critically, the phase of sentence-tracking ocular activity is strongly modulated by temporal attention, i.e., which word in a sentence is attended. Ocular activity also tracks high-level structures in non-linguistic auditory and visual sequences, and captures rapid fluctuations in temporal attention. Ocular tracking of non-visual rhythms possibly reflects global neural entrainment to task-relevant temporal structures across sensory and motor areas, which could serve to implement temporal attention and coordinate cortical networks.
Collapse
|
183
|
Pérez A, Dumas G, Karadag M, Duñabeitia JA. Differential brain-to-brain entrainment while speaking and listening in native and foreign languages. Cortex 2018; 111:303-315. [PMID: 30598230 DOI: 10.1016/j.cortex.2018.11.026] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 09/28/2018] [Accepted: 11/29/2018] [Indexed: 10/27/2022]
Abstract
The study explores interbrain neural coupling when interlocutors engage in a conversation, whether in their native or a nonnative language. To this end, electroencephalographic hyperscanning was used to study brain-to-brain phase synchronization during a two-person turn-taking verbal exchange with no visual contact, in either a native or a foreign language context. Results show that the coupling strength between brain signals is increased in both the native and the foreign language context, specifically in the alpha frequency band. A difference in brain-to-speech entrainment to native and foreign languages is also shown. These results indicate that between-brain similarities in the timing of neural activations and their spatial distributions change depending on the language code used. We argue that factors like linguistic alignment, joint attention and brain entrainment to speech operate with a language-idiosyncratic neural configuration, modulating the alignment of neural activity between speakers and listeners. Other possible factors leading to the differential interbrain synchronization patterns as well as the potential features of brain-to-brain entrainment as a mechanism are briefly discussed. We conclude that linguistic context should be considered when addressing interpersonal communication. The findings here open doors to quantifying linguistic interactions.
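A minimal sketch of inter-brain phase synchronization in the alpha band follows, using the phase-locking value as one common coupling metric; the authors' exact measure and preprocessing may differ, and `sig_a`/`sig_b` are hypothetical single-channel signals from the two interlocutors.

```python
# Minimal inter-brain phase-locking sketch (one common way to quantify the
# coupling referred to above; not necessarily the authors' metric).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def alpha_plv(sig_a, sig_b, fs, band=(8.0, 12.0), order=4):
    """Phase-locking value between one channel from each participant:
    0 = no consistent phase relation, 1 = perfectly locked."""
    b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    pa = np.angle(hilbert(filtfilt(b, a, sig_a)))
    pb = np.angle(hilbert(filtfilt(b, a, sig_b)))
    return np.abs(np.mean(np.exp(1j * (pa - pb))))
```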
Collapse
Affiliation(s)
- Alejandro Pérez
- Centre for French & Linguistics, University of Toronto Scarborough, Toronto, Canada; Psychology Department, University of Toronto Scarborough, Toronto, Canada; BCBL, Basque Center on Cognition Brain and Language, Donostia-San Sebastián, Spain.
| | - Guillaume Dumas
- Human Genetics and Cognitive Functions Unit, Institut Pasteur, Paris, France; CNRS UMR 3571 Genes, Synapses and Cognition, Institut Pasteur, Paris, France; Human Genetics and Cognitive Functions, University Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - Melek Karadag
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| | - Jon Andoni Duñabeitia
- BCBL, Basque Center on Cognition Brain and Language, Donostia-San Sebastián, Spain; Facultad de Lenguas y Educación, Universidad Nebrija, Madrid, Spain
| |
Collapse
|
184
|
The effects of periodic interruptions on cortical entrainment to speech. Neuropsychologia 2018; 121:58-68. [DOI: 10.1016/j.neuropsychologia.2018.10.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 09/19/2018] [Accepted: 10/24/2018] [Indexed: 11/21/2022]
|
185
|
Biau E, Kotz SA. Lower Beta: A Central Coordinator of Temporal Prediction in Multimodal Speech. Front Hum Neurosci 2018; 12:434. [PMID: 30405383 PMCID: PMC6207805 DOI: 10.3389/fnhum.2018.00434] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2018] [Accepted: 10/03/2018] [Indexed: 12/18/2022] Open
Abstract
How the brain decomposes and integrates information in multimodal speech perception is linked to oscillatory dynamics. However, how speech takes advantage of redundancy between different sensory modalities, and how this translates into specific oscillatory patterns remains unclear. We address the role of lower beta activity (~20 Hz), generally associated with motor functions, as an amodal central coordinator that receives bottom-up delta-theta copies from specific sensory areas and generates top-down temporal predictions for auditory entrainment. Dissociating temporal prediction from entrainment may explain how and why visual input benefits speech processing rather than adding cognitive load in multimodal speech perception. On the one hand, body movements convey prosodic and syllabic features at delta and theta rates (i.e., 1–3 Hz and 4–7 Hz). On the other hand, the natural precedence of visual input before auditory onsets may prepare the brain to anticipate and facilitate the integration of auditory delta-theta copies of the prosodic-syllabic structure. Here, we identify three fundamental criteria based on recent evidence and hypotheses, which support the notion that lower motor beta frequency may play a central and generic role in temporal prediction during speech perception. First, beta activity must respond to rhythmic stimulation across modalities. Second, beta power must respond to biological motion and speech-related movements conveying temporal information in multimodal speech processing. Third, temporal prediction may recruit a communication loop between motor and primary auditory cortices (PACs) via delta-to-beta cross-frequency coupling. We discuss evidence related to each criterion and extend these concepts to a beta-motivated framework of multimodal speech processing.
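The delta-to-beta cross-frequency coupling invoked in the third criterion can be quantified, for example, with a mean-vector-length phase-amplitude coupling measure; the sketch below is only meant to make that mechanism concrete and does not reproduce any specific analysis in the paper.

```python
# Sketch of a standard phase-amplitude coupling measure (mean vector length)
# applied to delta phase and lower-beta amplitude; illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, fs, lo, hi, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def delta_beta_pac(signal, fs, delta=(1.0, 3.0), beta=(18.0, 22.0)):
    """Normalized mean vector length of beta amplitude relative to delta phase."""
    phase = np.angle(hilbert(bandpass(signal, fs, *delta)))
    amp = np.abs(hilbert(bandpass(signal, fs, *beta)))
    return np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)
```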
Collapse
Affiliation(s)
- Emmanuel Biau
- Basic and Applied Neuro Dynamics Laboratory, Department of Neuropsychology and Psychopharmacology, Faculty of Psychology and Neuroscience, University of Maastricht, Maastricht, Netherlands
| | - Sonja A Kotz
- Basic and Applied Neuro Dynamics Laboratory, Department of Neuropsychology and Psychopharmacology, Faculty of Psychology and Neuroscience, University of Maastricht, Maastricht, Netherlands; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| |
Collapse
|
186
|
Birulés J, Bosch L, Brieke R, Pons F, Lewkowicz DJ. Inside bilingualism: Language background modulates selective attention to a talker's mouth. Dev Sci 2018; 22:e12755. [PMID: 30251757 DOI: 10.1111/desc.12755] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2018] [Revised: 08/08/2018] [Accepted: 09/21/2018] [Indexed: 11/28/2022]
Abstract
Previous findings indicate that bilingual Catalan/Spanish-learning infants attend more to the highly salient audiovisual redundancy cues normally available in a talker's mouth than do monolingual infants. Presumably, greater attention to such cues renders the challenge of learning two languages easier. Spanish and Catalan are, however, rhythmically and phonologically close languages. This raises the possibility that bilinguals only rely on redundant audiovisual cues when their languages are close. To test this possibility, we exposed 15-month-old and 4- to 6-year-old close-language bilinguals (Spanish/Catalan) and distant-language bilinguals (Spanish/"other") to videos of a talker uttering Spanish or Catalan (native) and English (non-native) monologues and recorded eye-gaze to the talker's eyes and mouth. At both ages, the close-language bilinguals attended more to the talker's mouth than the distant-language bilinguals. This indicates that language proximity modulates selective attention to a talker's mouth during early childhood and suggests that reliance on the greater salience of audiovisual speech cues depends on the difficulty of the speech-processing task.
Collapse
Affiliation(s)
- Joan Birulés
- Department of Cognition, Development and Educational Psychology, Universitat de Barcelona, Barcelona, Spain
| | - Laura Bosch
- Department of Cognition, Development and Educational Psychology, Universitat de Barcelona, Barcelona, Spain
| | - Ricarda Brieke
- Department of Cognition, Development and Educational Psychology, Universitat de Barcelona, Barcelona, Spain
| | - Ferran Pons
- Department of Cognition, Development and Educational Psychology, Universitat de Barcelona, Barcelona, Spain
| | - David J Lewkowicz
- Department of Communication Sciences and Disorders, Northeastern University, Boston, Massachusetts
| |
Collapse
|
187
|
Rajendran VG, Teki S, Schnupp JWH. Temporal Processing in Audition: Insights from Music. Neuroscience 2018; 389:4-18. [PMID: 29108832 PMCID: PMC6371985 DOI: 10.1016/j.neuroscience.2017.10.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2017] [Revised: 10/24/2017] [Accepted: 10/27/2017] [Indexed: 11/28/2022]
Abstract
Music is a curious example of a temporally patterned acoustic stimulus, and a compelling pan-cultural phenomenon. This review strives to bring some insights from decades of music psychology and sensorimotor synchronization (SMS) literature into the mainstream auditory domain, arguing that musical rhythm perception is shaped in important ways by temporal processing mechanisms in the brain. The feature that unites these disparate disciplines is an appreciation of the central importance of timing, sequencing, and anticipation. Perception of musical rhythms relies on an ability to form temporal predictions, a general feature of temporal processing that is equally relevant to auditory scene analysis, pattern detection, and speech perception. By bringing together findings from the music and auditory literature, we hope to inspire researchers to look beyond the conventions of their respective fields and consider the cross-disciplinary implications of studying auditory temporal sequence processing. We begin by highlighting music as an interesting sound stimulus that may provide clues to how temporal patterning in sound drives perception. Next, we review the SMS literature and discuss possible neural substrates for the perception of, and synchronization to, musical beat. We then move away from music to explore the perceptual effects of rhythmic timing in pattern detection, auditory scene analysis, and speech perception. Finally, we review the neurophysiology of general timing processes that may underlie aspects of the perception of rhythmic patterns. We conclude with a brief summary and outlook for future research.
Collapse
Affiliation(s)
- Vani G Rajendran
- Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
| | - Sundeep Teki
- Auditory Neuroscience Group, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, UK
| | - Jan W H Schnupp
- City University of Hong Kong, Department of Biomedical Sciences, 31 To Yuen Street, Kowloon Tong, Hong Kong.
| |
Collapse
|
188
|
Milne AE, Petkov CI, Wilson B. Auditory and Visual Sequence Learning in Humans and Monkeys using an Artificial Grammar Learning Paradigm. Neuroscience 2018; 389:104-117. [PMID: 28687306 PMCID: PMC6278909 DOI: 10.1016/j.neuroscience.2017.06.059] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Revised: 06/26/2017] [Accepted: 06/27/2017] [Indexed: 11/30/2022]
Abstract
Language flexibly supports the human ability to communicate using different sensory modalities, such as writing and reading in the visual modality and speaking and listening in the auditory domain. Although it has been argued that nonhuman primate communication abilities are inherently multisensory, direct behavioural comparisons between human and nonhuman primates are scant. Artificial grammar learning (AGL) tasks and statistical learning experiments can be used to emulate ordering relationships between words in a sentence. However, previous comparative work using such paradigms has primarily investigated sequence learning within a single sensory modality. We used an AGL paradigm to evaluate how humans and macaque monkeys learn and respond to identically structured sequences of either auditory or visual stimuli. In the auditory and visual experiments, we found that both species were sensitive to the ordering relationships between elements in the sequences. Moreover, the humans and monkeys produced largely similar response patterns to the visual and auditory sequences, indicating that the sequences are processed in comparable ways across the sensory modalities. These results provide evidence that human sequence processing abilities stem from an evolutionarily conserved capacity that appears to operate comparably across the sensory modalities in both human and nonhuman primates. The findings set the stage for future neurobiological studies to investigate the multisensory nature of these sequencing operations in nonhuman primates and how they compare to related processes in humans.
Collapse
Affiliation(s)
- Alice E Milne
- Institute of Neuroscience, Henry Wellcome Building, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom; Centre for Behaviour and Evolution, Henry Wellcome Building, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom
| | - Christopher I Petkov
- Institute of Neuroscience, Henry Wellcome Building, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom; Centre for Behaviour and Evolution, Henry Wellcome Building, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom.
| | - Benjamin Wilson
- Institute of Neuroscience, Henry Wellcome Building, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom; Centre for Behaviour and Evolution, Henry Wellcome Building, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom
| |
Collapse
|
189
|
Kotz S, Ravignani A, Fitch W. The Evolution of Rhythm Processing. Trends Cogn Sci 2018; 22:896-910. [DOI: 10.1016/j.tics.2018.08.002] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2018] [Revised: 07/25/2018] [Accepted: 08/02/2018] [Indexed: 01/14/2023]
|
190
|
Keitel C, Benwell CSY, Thut G, Gross J. No changes in parieto-occipital alpha during neural phase locking to visual quasi-periodic theta-, alpha-, and beta-band stimulation. Eur J Neurosci 2018; 48:2551-2565. [PMID: 29737585 PMCID: PMC6220955 DOI: 10.1111/ejn.13935] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2017] [Revised: 03/21/2018] [Accepted: 04/10/2018] [Indexed: 11/30/2022]
Abstract
Recent studies have probed the role of the parieto-occipital alpha rhythm (8-12 Hz) in human visual perception through attempts to drive its neural generators. To that end, paradigms have used high-intensity, strictly periodic visual stimulation that created strong predictions about future stimulus occurrences and repeatedly demonstrated perceptual consequences in line with an entrainment of parieto-occipital alpha. Our study, in turn, examined the case of alpha entrainment by non-predictive low-intensity quasi-periodic visual stimulation within theta- (4-7 Hz), alpha- (8-13 Hz), and beta (14-20 Hz) frequency bands, i.e., a class of stimuli that resemble the temporal characteristics of naturally occurring visual input more closely. We have previously reported substantial neural phase-locking in EEG recordings during all three stimulation conditions. Here, we studied to what extent this phase-locking reflected an entrainment of intrinsic alpha rhythms in the same dataset. Specifically, we tested whether quasi-periodic visual stimulation affected several properties of parieto-occipital alpha generators. Speaking against an entrainment of intrinsic alpha rhythms by non-predictive low-intensity quasi-periodic visual stimulation, we found none of these properties to show differences between stimulation frequency bands. In particular, alpha band generators did not show increased sensitivity to alpha band stimulation and Bayesian inference corroborated evidence against an influence of stimulation frequency. Our results set boundary conditions for when and how to expect effects of entrainment of alpha generators and suggest that the parieto-occipital alpha rhythm may be more inert to external influences than previously thought.
Collapse
Affiliation(s)
- Christian Keitel
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
| | | | - Gregor Thut
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Institut für Biomagnetismus und Biosignalanalyse, Westfälische Wilhelms-Universität, Münster, Germany
| |
Collapse
|
191
|
Nidiffer AR, Diederich A, Ramachandran R, Wallace MT. Multisensory perception reflects individual differences in processing temporal correlations. Sci Rep 2018; 8:14483. [PMID: 30262826 PMCID: PMC6160476 DOI: 10.1038/s41598-018-32673-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2018] [Accepted: 09/04/2018] [Indexed: 12/27/2022] Open
Abstract
Sensory signals originating from a single event, such as audiovisual speech, are temporally correlated. Correlated signals are known to facilitate multisensory integration and binding. We sought to further elucidate the nature of this relationship, hypothesizing that multisensory perception will vary with the strength of audiovisual correlation. Human participants detected near-threshold amplitude modulations in auditory and/or visual stimuli. During audiovisual trials, the frequency and phase of auditory modulations were varied, producing signals with a range of correlations. After accounting for individual differences which likely reflect relative unisensory temporal characteristics in participants, we found that multisensory perception varied linearly with strength of correlation. Diffusion modelling confirmed this and revealed that stimulus correlation is supplied to the decisional system as sensory evidence. These data implicate correlation as an important cue in audiovisual feature integration and binding and suggest correlational strength as an important factor for flexibility in these processes.
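A toy example of the stimulus manipulation described: amplitude-modulation envelopes whose frequency and phase offsets determine their temporal correlation. Parameter values below are arbitrary and not taken from the study.

```python
# Toy illustration of how frequency and phase offsets between auditory and
# visual amplitude modulations change their temporal correlation.
import numpy as np

def am_envelope(f_mod, phase, dur=1.0, fs=1000, depth=0.1):
    t = np.arange(0, dur, 1 / fs)
    return 1.0 + depth * np.sin(2 * np.pi * f_mod * t + phase)

def av_correlation(f_aud, f_vis, phase_offset):
    a = am_envelope(f_aud, phase_offset)
    v = am_envelope(f_vis, 0.0)
    return np.corrcoef(a, v)[0, 1]

# Same frequency, in phase -> r close to 1; growing phase or frequency
# offsets push r towards 0 or negative values.
print(av_correlation(2.0, 2.0, 0.0), av_correlation(2.0, 2.0, np.pi / 2))
```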
Collapse
Affiliation(s)
- Aaron R Nidiffer
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA.
| | - Adele Diederich
- Department of Health, Life Sciences & Chemistry, Jacobs University, Bremen, Germany
| | - Ramnarayan Ramachandran
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University, Nashville, TN, USA
| | - Mark T Wallace
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Department of Psychiatry, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University, Nashville, TN, USA
| |
Collapse
|
192
|
Park H, Thut G, Gross J. Predictive entrainment of natural speech through two fronto-motor top-down channels. LANGUAGE, COGNITION AND NEUROSCIENCE 2018; 35:739-751. [PMID: 32939354 PMCID: PMC7446042 DOI: 10.1080/23273798.2018.1506589] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Natural communication between interlocutors is enabled by the ability to predict upcoming speech in a given context. Previously we showed that these predictions rely on a fronto-motor top-down control of low-frequency oscillations in auditory-temporal brain areas that track intelligible speech. However, a comprehensive spatio-temporal characterisation of this effect is still missing. Here, we applied transfer entropy to source-localised MEG data during continuous speech perception. First, at low frequencies (1-4 Hz, brain delta phase to speech delta phase), predictive effects start in left fronto-motor regions and progress to right temporal regions. Second, at higher frequencies (14-18 Hz, brain beta power to speech delta phase), predictive patterns show a transition from left inferior frontal gyrus via left precentral gyrus to left primary auditory areas. Our results suggest a progression of prediction processes from higher-order to early sensory areas in at least two different frequency channels.
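Transfer entropy, the directed measure referred to here, asks how much the past of one signal improves prediction of another signal beyond that signal's own past. Below is a deliberately simplified, binned estimator for intuition only; the study applied more sophisticated estimators to source-localised MEG phase and power time series.

```python
# Highly simplified, binned transfer-entropy estimator (illustrative only).
import numpy as np

def transfer_entropy(x, y, n_bins=4, lag=1):
    """TE from x to y (in bits): how much past x reduces uncertainty about
    present y beyond what past y already explains."""
    def digitize(s):
        edges = np.quantile(s, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(s, edges)
    xd, yd = digitize(x), digitize(y)
    yt, yp, xp = yd[lag:], yd[:-lag], xd[:-lag]
    joint = np.zeros((n_bins, n_bins, n_bins))     # p(y_now, y_past, x_past)
    for a, b, c in zip(yt, yp, xp):
        joint[a, b, c] += 1
    joint /= joint.sum()
    p_yp_xp = joint.sum(axis=0)                    # p(y_past, x_past)
    p_yt_yp = joint.sum(axis=2)                    # p(y_now, y_past)
    p_yp = joint.sum(axis=(0, 2))                  # p(y_past)
    te = 0.0
    for a in range(n_bins):
        for b in range(n_bins):
            for c in range(n_bins):
                p = joint[a, b, c]
                if p > 0:
                    te += p * np.log2(p * p_yp[b] / (p_yp_xp[b, c] * p_yt_yp[a, b]))
    return te
```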
Collapse
Affiliation(s)
- Hyojin Park
- School of Psychology & Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, UK
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Hyojin Park https://www.facebook.com/hyojin.park.1223
| | - Gregor Thut
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
| | - Joachim Gross
- Institute for Biomagnetism and Biosignalanalysis, University of Muenster, Muenster, Germany
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
| |
Collapse
|
193
|
Park H, Ince RAA, Schyns PG, Thut G, Gross J. Representational interactions during audiovisual speech entrainment: Redundancy in left posterior superior temporal gyrus and synergy in left motor cortex. PLoS Biol 2018; 16:e2006558. [PMID: 30080855 PMCID: PMC6095613 DOI: 10.1371/journal.pbio.2006558] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Revised: 08/16/2018] [Accepted: 07/24/2018] [Indexed: 11/24/2022] Open
Abstract
Integration of multimodal sensory information is fundamental to many aspects of human behavior, but the neural mechanisms underlying these processes remain mysterious. For example, during face-to-face communication, we know that the brain integrates dynamic auditory and visual inputs, but we do not yet understand where and how such integration mechanisms support speech comprehension. Here, we quantify representational interactions between dynamic audio and visual speech signals and show that different brain regions exhibit different types of representational interaction. With a novel information theoretic measure, we found that theta (3-7 Hz) oscillations in the posterior superior temporal gyrus/sulcus (pSTG/S) represent auditory and visual inputs redundantly (i.e., represent common features of the two), whereas the same oscillations in left motor and inferior temporal cortex represent the inputs synergistically (i.e., the instantaneous relationship between audio and visual inputs is also represented). Importantly, redundant coding in the left pSTG/S and synergistic coding in the left motor cortex predict behavior, i.e., speech comprehension performance. Our findings therefore demonstrate that processes classically described as integration can have different statistical properties and may reflect distinct mechanisms that occur in different brain regions to support audiovisual speech comprehension.
Collapse
Affiliation(s)
- Hyojin Park
- School of Psychology, Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, United Kingdom
| | - Robin A. A. Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Philippe G. Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Gregor Thut
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignalanalysis, University of Muenster, Muenster, Germany
| |
Collapse
|
194
|
Dorn K, Weinert S, Falck-Ytter T. Watch and listen - A cross-cultural study of audio-visual-matching behavior in 4.5-month-old infants in German and Swedish talking faces. Infant Behav Dev 2018; 52:121-129. [PMID: 30007216 DOI: 10.1016/j.infbeh.2018.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2018] [Revised: 05/11/2018] [Accepted: 05/11/2018] [Indexed: 11/30/2022]
Abstract
Investigating infants' ability to match visual and auditory speech segments presented sequentially allows us to understand more about the type of information they encode in each domain, as well as their ability to relate the information. One previous study found that 4.5-month-old infants' preference for visual French or German speech depended on whether they had previously heard the respective language, suggesting a remarkable ability to encode and relate audio-visual speech cues and to use these to guide their looking behavior. However, French and German differ in their prosody, meaning that perhaps the infants did not base their matching on phonological or phonetic cues, but on prosody patterns. The present study aimed to address this issue by tracking the eye gaze of 4.5-month-old German and Swedish infants cross-culturally in an intersensory matching procedure, comparing German and Swedish, two same-rhythm-class languages differing in phonetic and phonological attributes but not in prosody. Looking times indicated that even when distinctive prosodic cues were eliminated, 4.5-month-olds were able to extract subtle language properties and sequentially match visual and heard fluent speech. This outcome was the same for different individual speakers for the two modalities, ruling out the possibility that the infants matched speech patterns specific to one individual. This study confirms a remarkably early emerging ability of infants to match auditory and visual information. The fact that the types of information were matched despite sequential presentation demonstrates that the information is retained in short-term memory, and thus goes beyond purely perceptual, here-and-now processing.
Collapse
Affiliation(s)
- Katharina Dorn
- Department of Developmental Psychology, Otto-Friedrich University, Bamberg, Germany.
| | - Sabine Weinert
- Department of Developmental Psychology, Otto-Friedrich University, Bamberg, Germany
| | | |
Collapse
|
195
|
Hernández-Gutiérrez D, Abdel Rahman R, Martín-Loeches M, Muñoz F, Schacht A, Sommer W. Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence. Cortex 2018; 104:12-25. [DOI: 10.1016/j.cortex.2018.03.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2017] [Revised: 11/08/2017] [Accepted: 03/31/2018] [Indexed: 11/26/2022]
|
196
|
Garrido-Vásquez P, Pell MD, Paulmann S, Kotz SA. Dynamic Facial Expressions Prime the Processing of Emotional Prosody. Front Hum Neurosci 2018; 12:244. [PMID: 29946247 PMCID: PMC6007283 DOI: 10.3389/fnhum.2018.00244] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2018] [Accepted: 05/28/2018] [Indexed: 11/29/2022] Open
Abstract
Evidence suggests that emotion is represented supramodally in the human brain. Emotional facial expressions, which often precede vocally expressed emotion in real life, can modulate event-related potentials (N100 and P200) during emotional prosody processing. To investigate these cross-modal emotional interactions, two lines of research have been put forward: cross-modal integration and cross-modal priming. In cross-modal integration studies, visual and auditory channels are temporally aligned, while in priming studies they are presented consecutively. Here we used cross-modal emotional priming to study the interaction of dynamic visual and auditory emotional information. Specifically, we presented dynamic facial expressions (angry, happy, neutral) as primes and emotionally-intoned pseudo-speech sentences (angry, happy) as targets. We were interested in how prime-target congruency would affect early auditory event-related potentials, i.e., N100 and P200, in order to shed more light on how dynamic facial information is used in cross-modal emotional prediction. Results showed enhanced N100 amplitudes for incongruently primed compared to congruently and neutrally primed emotional prosody, while the latter two conditions did not significantly differ. However, N100 peak latency was significantly delayed in the neutral condition compared to the other two conditions. Source reconstruction revealed that the right parahippocampal gyrus was activated in incongruent compared to congruent trials in the N100 time window. No significant ERP effects were observed in the P200 range. Our results indicate that dynamic facial expressions influence vocal emotion processing at an early point in time, and that an emotional mismatch between a facial expression and its ensuing vocal emotional signal induces additional processing costs in the brain, potentially because the cross-modal emotional prediction mechanism is violated in case of emotional prime-target incongruency.
Collapse
Affiliation(s)
- Patricia Garrido-Vásquez
- Department of Experimental Psychology and Cognitive Science, Justus Liebig University Giessen, Giessen, Germany; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Marc D Pell
- School of Communication Sciences and Disorders, McGill University, Montreal, QC, Canada
| | - Silke Paulmann
- Department of Psychology, University of Essex, Colchester, United Kingdom
| | - Sonja A Kotz
- Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Department of Neuropsychology and Psychopharmacology, University of Maastricht, Maastricht, Netherlands
| |
Collapse
|
197
|
Kikuchi Y, Sedley W, Griffiths TD, Petkov CI. Evolutionarily conserved neural signatures involved in sequencing predictions and their relevance for language. Curr Opin Behav Sci 2018; 21:145-153. [PMID: 30057937 PMCID: PMC6058086 DOI: 10.1016/j.cobeha.2018.05.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Predicting the occurrence of future events from prior ones is vital for animal perception and cognition. Although how such sequence learning (a form of relational knowledge) relates to particular operations in language remains controversial, recent evidence shows that sequence learning is disrupted in frontal lobe damage associated with aphasia. Also, neural sequencing predictions at different temporal scales resemble those involved in language operations occurring at similar scales. Furthermore, comparative work in humans and monkeys highlights evolutionarily conserved frontal substrates and predictive oscillatory signatures in the temporal lobe processing learned sequences of speech signals. Altogether this evidence supports a relational knowledge hypothesis of language evolution, proposing that language processes in humans are functionally integrated with an ancestral neural system for predictive sequence learning.
Collapse
Affiliation(s)
- Yukiko Kikuchi
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
- Centre for Behaviour and Evolution, Newcastle University, Newcastle Upon Tyne, UK
| | - William Sedley
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
| | - Timothy D Griffiths
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
- Wellcome Trust Centre for Neuroimaging, University College London, UK
- Department of Neurosurgery, University of Iowa, Iowa City, USA
| | - Christopher I Petkov
- Institute of Neuroscience, Newcastle University Medical School, Newcastle Upon Tyne, UK
- Centre for Behaviour and Evolution, Newcastle University, Newcastle Upon Tyne, UK
| |
Collapse
|
198
|
He L. Development of speech rhythm in first language: The role of syllable intensity variability. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:EL463. [PMID: 29960429 DOI: 10.1121/1.5042083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The opening-closing alternations of the mouth were viewed as the articulatory basis of speech rhythm. Such articulatory cycles have been observed to highly correlate with the intensity curve of the speech signal. Analysis of the intensity variability in English monolingual children and adults revealed that (1) adults showed significantly smaller intensity variability than children, and (2) intensity variability decreased from intermediate-aged children to older children. Maturation of articulatory motor control is likely to be the main reason for the reduced variability in articulatory cycles, and hence smaller intensity variability in adults and older children.
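One way to compute syllable-level intensity variability of the kind analysed here is sketched below: an RMS intensity curve, syllable-sized peaks, and global plus sequential variability of the peak intensities. The paper's exact metrics may differ; `audio` and `fs` are hypothetical inputs.

```python
# Rough sketch of a syllable-intensity variability measure in the spirit of
# the analysis above; illustrative, not the paper's exact procedure.
import numpy as np
from scipy.signal import find_peaks

def intensity_curve(audio, fs, win_s=0.02, hop_s=0.01):
    win, hop = int(win_s * fs), int(hop_s * fs)
    frames = range(0, len(audio) - win, hop)
    rms = np.array([np.sqrt(np.mean(audio[i:i + win] ** 2)) for i in frames])
    return 20 * np.log10(rms + 1e-12)                    # dB scale

def intensity_variability(audio, fs, hop_s=0.01, min_syll_gap_s=0.12):
    db = intensity_curve(audio, fs, hop_s=hop_s)
    peaks, _ = find_peaks(db, distance=max(1, int(min_syll_gap_s / hop_s)))
    peak_db = db[peaks]
    sd = peak_db.std(ddof=1)                             # global variability
    rpvi = np.mean(np.abs(np.diff(peak_db)))             # sequential (pairwise) variability
    return sd, rpvi
```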
Collapse
Affiliation(s)
- Lei He
- Department of Linguistics, University of Tübingen, Wilhelmstraße 19-23, DE-72074, Tübingen, Germany
| |
Collapse
|
199
|
Brooks CJ, Chan YM, Anderson AJ, McKendrick AM. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss. Front Hum Neurosci 2018; 12:192. [PMID: 29867415 PMCID: PMC5954093 DOI: 10.3389/fnhum.2018.00192] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2017] [Accepted: 04/20/2018] [Indexed: 11/26/2022] Open
Abstract
Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.
Collapse
Affiliation(s)
- Cassandra J Brooks
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
| | - Yu Man Chan
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
| | - Andrew J Anderson
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
| | - Allison M McKendrick
- Department of Optometry and Vision Sciences, The University of Melbourne, Melbourne, VIC, Australia
| |
Collapse
|
200
|
Hauswald A, Lithari C, Collignon O, Leonardelli E, Weisz N. A Visual Cortical Network for Deriving Phonological Information from Intelligible Lip Movements. Curr Biol 2018; 28:1453-1459.e3. [PMID: 29681475 PMCID: PMC5956463 DOI: 10.1016/j.cub.2018.03.044] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2018] [Revised: 02/25/2018] [Accepted: 03/20/2018] [Indexed: 11/26/2022]
Abstract
Successful lip-reading requires a mapping from visual to phonological information [1]. Recently, visual and motor cortices have been implicated in tracking lip movements (e.g., [2]). It remains unclear, however, whether visuo-phonological mapping occurs already at the level of the visual cortex, that is, whether this structure tracks the acoustic signal in a functionally relevant manner. To elucidate this, we investigated how the cortex tracks (i.e., entrains to) absent acoustic speech signals carried by silent lip movements. Crucially, we contrasted the entrainment to unheard forward (intelligible) and backward (unintelligible) acoustic speech. We observed that the visual cortex exhibited stronger entrainment to the unheard forward acoustic speech envelope compared to the unheard backward acoustic speech envelope. Supporting the notion of a visuo-phonological mapping process, this forward-backward difference of occipital entrainment was not present for actually observed lip movements. Importantly, the respective occipital region received more top-down input, especially from left premotor, primary motor, and somatosensory regions and, to a lesser extent, also from posterior temporal cortex. Strikingly, across participants, the extent of top-down modulation of the visual cortex stemming from these regions partially correlated with the strength of entrainment to absent acoustic forward speech envelope, but not to present forward lip movements. Our findings demonstrate that a distributed cortical network, including key dorsal stream auditory regions [3-5], influences how the visual cortex shows sensitivity to the intelligibility of speech while tracking silent lip movements.
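The forward-versus-backward entrainment contrast can be illustrated with a simple coherence computation between a brain signal and the unheard speech envelope. This is purely illustrative (the study used source-space MEG and different statistics); `visual_signal`, `envelope`, and `fs` are hypothetical.

```python
# Sketch of a low-frequency coherence contrast between an occipital time series
# and the (unheard) forward speech envelope vs. a time-reversed control.
import numpy as np
from scipy.signal import coherence

def low_freq_coherence(brain, envelope, fs, fmax=7.0, nperseg=None):
    nperseg = nperseg or int(4 * fs)              # ~4-s windows for low frequencies
    f, coh = coherence(brain, envelope, fs=fs, nperseg=nperseg)
    return coh[f <= fmax].mean()                  # summary over the 0-7 Hz range

# forward vs. backward (time-reversed) envelope as a crude intelligibility contrast:
# coh_fwd = low_freq_coherence(visual_signal, envelope, fs)
# coh_bwd = low_freq_coherence(visual_signal, envelope[::-1], fs)
```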
Collapse
Affiliation(s)
- Anne Hauswald
- Centre for Cognitive Neurosciences, University of Salzburg, Salzburg 5020, Austria; CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy.
| | - Chrysa Lithari
- Centre for Cognitive Neurosciences, University of Salzburg, Salzburg 5020, Austria; CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy
| | - Olivier Collignon
- CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy; Institute of Research in Psychology & Institute of NeuroScience, Université catholique de Louvain, Louvain 1348, Belgium
| | - Elisa Leonardelli
- CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy
| | - Nathan Weisz
- Centre for Cognitive Neurosciences, University of Salzburg, Salzburg 5020, Austria; CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy.
| |
Collapse
|