1. Sato M. Audiovisual speech asynchrony asymmetrically modulates neural binding. Neuropsychologia 2024; 198:108866. PMID: 38518889; DOI: 10.1016/j.neuropsychologia.2024.108866.
Abstract
Previous psychophysical and neurophysiological studies in young healthy adults have provided evidence that audiovisual speech integration occurs with a large degree of temporal tolerance around true simultaneity. To further determine whether audiovisual speech asynchrony modulates auditory cortical processing and neural binding in young healthy adults, N1/P2 auditory evoked responses were compared using an additive model during a syllable categorization task, with or without an audiovisual asynchrony ranging from 240 ms visual lead to 240 ms auditory lead. Consistent with previous psychophysical findings, the results converge in favor of an asymmetric temporal integration window. Three main findings emerged: 1) predictive temporal and phonetic cues from pre-phonatory visual movements before the acoustic onset appeared essential for neural binding to occur, 2) audiovisual synchrony, with visual pre-phonatory movements predictive of the onset of the acoustic signal, was a prerequisite for N1 latency facilitation, and 3) P2 amplitude suppression and latency facilitation occurred even when visual pre-phonatory movements were predictive not of the acoustic onset but only of the upcoming syllable. Taken together, these findings further clarify how audiovisual speech integration partly operates through two stages of visually based temporal and phonetic predictions.
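For readers unfamiliar with the additive-model logic used in several of the EEG studies listed here, the sketch below compares an audiovisual (AV) evoked response against the sum of the unisensory auditory (A) and visual (V) responses and extracts N1/P2 peaks in fixed windows. It is a minimal illustration only: the array names, electrode selection, and peak windows are assumptions, not the analysis pipeline of the paper above.

```python
# Minimal sketch of the additive model: compare the audiovisual (AV) evoked response
# with the sum of the auditory-only (A) and visual-only (V) responses. A smaller AV
# response than A + V around N1/P2 (sub-additivity) is taken as evidence of binding.
# erp_a, erp_v, erp_av: 1-D arrays (trial-averaged ERP at one electrode); sfreq in Hz;
# t0 = time of the first sample relative to acoustic onset (all assumed inputs).
import numpy as np

def peak(erp, times, tmin, tmax, polarity):
    """Latency and amplitude of the extreme value (polarity -1 = negative peak,
    +1 = positive peak) within the window [tmin, tmax] in seconds."""
    mask = (times >= tmin) & (times <= tmax)
    idx = np.argmax(erp[mask] * polarity)
    return times[mask][idx], erp[mask][idx]

def additive_model(erp_a, erp_v, erp_av, sfreq, t0=-0.1):
    times = t0 + np.arange(erp_a.size) / sfreq
    summed = erp_a + erp_v               # unisensory sum, the additive prediction
    residual = erp_av - summed           # negative residual around N1/P2 = sub-additivity
    return {
        "residual": residual,
        "N1_av": peak(erp_av, times, 0.08, 0.16, -1),
        "N1_sum": peak(summed, times, 0.08, 0.16, -1),
        "P2_av": peak(erp_av, times, 0.15, 0.30, +1),
        "P2_sum": peak(summed, times, 0.15, 0.30, +1),
    }
```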
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.
2. Tremblay P, Sato M. Movement-related cortical potential and speech-induced suppression during speech production in younger and older adults. Brain and Language 2024; 253:105415. PMID: 38692095; DOI: 10.1016/j.bandl.2024.105415.
Abstract
With age, the speech system undergoes important changes that render speech production more laborious, slower, and often less intelligible, yet the neural mechanisms underlying these age-related changes remain unclear. In this EEG study, we examined two important mechanisms of speech motor control in 20 healthy young and 20 healthy older adults: the pre-speech movement-related cortical potential (MRCP), which reflects speech motor planning, and speaking-induced suppression (SIS), which indexes auditory predictions of speech motor commands. Participants undertook a vowel production task followed by passive listening to their own recorded vowels. Our results revealed extensive differences in the MRCP of older compared to younger adults. Further, although N1 and P2 latencies were longer in older adults, SIS was preserved. The reduced MRCP appears to be a potential explanatory mechanism for the known age-related slowing of speech production, whereas the preserved SIS suggests intact motor-to-auditory integration.
Affiliation(s)
- Pascale Tremblay
- Université Laval, Faculté de Médecine, Département de Réadaptation, Quebec City G1V 0A6, Canada; CERVO Brain Research Center, Quebec City G1J 2G3, Canada.
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France
3. Asaadi AH, Amiri SH, Bosaghzadeh A, Ebrahimpour R. Effects and prediction of cognitive load on encoding model of brain response to auditory and linguistic stimuli in educational multimedia. Sci Rep 2024; 14:9133. PMID: 38644370; PMCID: PMC11033259; DOI: 10.1038/s41598-024-59411-x.
Abstract
Multimedia is extensively used for educational purposes. However, certain types of multimedia lack proper design, which can impose a cognitive load on the user. It is therefore essential to predict cognitive load and understand how it impairs brain functioning. In this study, participants watched a version of educational multimedia that applied Mayer's principles, followed by a version that did not, while their electroencephalography (EEG) was recorded. Subsequently, they completed a post-test and a self-reported cognitive load questionnaire. The audio envelope and word frequency were extracted from the multimedia, and temporal response functions (TRFs) were obtained using a linear encoding model. Behavioral data differed between the two multimedia versions, as did the TRFs, with changes in the amplitudes and latencies of both early and late components. In addition, behavioral data correlated with the amplitudes and latencies of TRF components. Cognitive load decreased participants' attention to the multimedia, and semantic processing of words occurred with a delay and with smaller amplitude. Hence, encoding models provide insight into the temporal and spatial mapping of cognitive load activity, which could help detect and reduce cognitive load in environments such as educational multimedia or simulators.
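As a concrete illustration of the linear encoding (TRF) approach described in this abstract, the sketch below estimates ridge-regularised temporal response functions from time-lagged stimulus features (e.g., the audio envelope and word-frequency impulses). Variable names, the lag window, and the ridge parameter are illustrative assumptions, not the authors' exact settings.

```python
# Sketch of a ridge-regularised linear encoding model (temporal response function).
# stim: (n_samples, n_features) stimulus features; eeg: (n_samples, n_channels) EEG,
# both sampled at sfreq Hz and assumed to be aligned in time.
import numpy as np

def lag_matrix(stim, sfreq, tmin=-0.1, tmax=0.6):
    """Stack time-lagged copies of the stimulus features into a design matrix."""
    lags = np.arange(int(tmin * sfreq), int(tmax * sfreq) + 1)
    n, f = stim.shape
    X = np.zeros((n, f * lags.size))
    for j, lag in enumerate(lags):
        shifted = np.roll(stim, lag, axis=0)    # stim[t - lag] ends up at row t
        if lag > 0:
            shifted[:lag] = 0                   # remove wrap-around samples
        elif lag < 0:
            shifted[lag:] = 0
        X[:, j * f:(j + 1) * f] = shifted
    return X, lags

def fit_trf(stim, eeg, sfreq, alpha=1.0):
    X, lags = lag_matrix(stim, sfreq)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])  # ridge-regularised normal equations
    weights = np.linalg.solve(XtX, X.T @ eeg)   # (n_lags * n_features, n_channels)
    trf = weights.reshape(lags.size, stim.shape[1], eeg.shape[1])
    return trf, lags / sfreq                    # TRF (lag x feature x channel) and time axis
```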
Affiliation(s)
- Amir Hosein Asaadi
- Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Institute for Research in Fundamental Sciences (IPM), School of Cognitive Sciences, Tehran, Iran
- S Hamid Amiri
- Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Alireza Bosaghzadeh
- Department of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Islamic Republic of Iran
- Reza Ebrahimpour
- Center for Cognitive Science, Institute for Convergence Science and Technology (ICST), Sharif University of Technology, P.O. Box:14588-89694, Tehran, Iran.
4. Sato M. Competing influence of visual speech on auditory neural adaptation. Brain and Language 2023; 247:105359. PMID: 37951157; DOI: 10.1016/j.bandl.2023.105359.
Abstract
Visual information from a speaker's face enhances auditory neural processing and speech recognition. To determine whether auditory memory can be influenced by visual speech, the degree of auditory neural adaptation to an auditory syllable preceded by an auditory, visual, or audiovisual syllable was examined using EEG. Consistent with previous findings and with additional adaptation of auditory neurons tuned to acoustic features, stronger adaptation of N1, P2 and N2 auditory evoked responses was observed when the auditory syllable was preceded by an auditory rather than a visual syllable. However, adaptation was weaker when the auditory syllable was preceded by an audiovisual rather than an auditory syllable, although still stronger than when it was preceded by a visual syllable, and N1 and P2 latencies were then longer. These results further demonstrate that visual speech acts on auditory memory but suggest competing visual influences in the case of audiovisual stimulation.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique (UMR 7309) & Aix-Marseille Université, 5 avenue Pasteur, Aix-en-Provence, France.
5. Hansmann D, Derrick D, Theys C. Hearing, seeing, and feeling speech: the neurophysiological correlates of trimodal speech perception. Front Hum Neurosci 2023; 17:1225976. PMID: 37706173; PMCID: PMC10495990; DOI: 10.3389/fnhum.2023.1225976.
Abstract
Introduction: To perceive speech, our brains process information from different sensory modalities. Previous electroencephalography (EEG) research has established that audio-visual information provides an advantage compared to auditory-only information during early auditory processing. In addition, behavioral research showed that auditory speech perception is not only enhanced by visual information but also by tactile information, transmitted by puffs of air arriving at the skin and aligned with speech. The current EEG study aimed to investigate whether the behavioral benefits of bimodal audio-aerotactile and trimodal audio-visual-aerotactile speech presentation are reflected in cortical auditory event-related neurophysiological responses.
Methods: To examine the influence of multimodal information on speech perception, 20 listeners conducted a two-alternative forced-choice syllable identification task at three different signal-to-noise levels.
Results: Behavioral results showed increased syllable identification accuracy when auditory information was complemented with visual information, but did not show the same effect for the addition of tactile information. Similarly, EEG results showed an amplitude suppression for the auditory N1 and P2 event-related potentials for the audio-visual and audio-visual-aerotactile modalities compared to auditory and audio-aerotactile presentations of the syllable /pa/. No statistically significant difference was present between audio-aerotactile and auditory-only modalities.
Discussion: Current findings are consistent with past EEG research showing a visually induced amplitude suppression during early auditory processing. In addition, the significant neurophysiological effect of audio-visual but not audio-aerotactile presentation is in line with the large benefit of visual information but comparatively much smaller effect of aerotactile information on auditory speech perception previously identified in behavioral research.
Affiliation(s)
- Doreen Hansmann
- School of Psychology, Speech and Hearing, University of Canterbury, Christchurch, New Zealand
- Donald Derrick
- New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Catherine Theys
- School of Psychology, Speech and Hearing, University of Canterbury, Christchurch, New Zealand
- New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
6. Caron CJ, Vilain C, Schwartz JL, Bayard C, Calcus A, Leybaert J, Colin C. The Effect of Cued-Speech (CS) Perception on Auditory Processing in Typically Hearing (TH) Individuals Who Are Either Naïve or Experienced CS Producers. Brain Sci 2023; 13:1036. PMID: 37508968; PMCID: PMC10377728; DOI: 10.3390/brainsci13071036.
Abstract
Cued Speech (CS) is a communication system that uses manual gestures to facilitate lipreading. In this study, we investigated how CS information interacts with natural speech using Event-Related Potential (ERP) analyses in French-speaking, typically hearing adults (TH) who were either naïve or experienced CS producers. The audiovisual (AV) presentation of lipreading information elicited an amplitude attenuation of the entire N1 and P2 complex in both groups, accompanied by N1 latency facilitation in the group of CS producers. Adding CS gestures to lipread information increased the magnitude of effects observed at the N1 time window, but did not enhance P2 amplitude attenuation. Interestingly, presenting CS gestures without lipreading information yielded distinct response patterns depending on participants' experience with the system. In the group of CS producers, AV perception of CS gestures facilitated the early stage of speech processing, while in the group of naïve participants, it elicited a latency delay at the P2 time window. These results suggest that, for experienced CS users, the perception of gestures facilitates early stages of speech processing, but when people are not familiar with the system, the perception of gestures impacts the efficiency of phonological decoding.
Affiliation(s)
- Cora Jirschik Caron
- Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
- Coriandre Vilain
- GIPSA-Lab, Université Grenoble Alpes, CNRS, Grenoble INP, 38402 Saint-Martin-d'Hères, France
- Jean-Luc Schwartz
- GIPSA-Lab, Université Grenoble Alpes, CNRS, Grenoble INP, 38402 Saint-Martin-d'Hères, France
- Clémence Bayard
- GIPSA-Lab, Université Grenoble Alpes, CNRS, Grenoble INP, 38402 Saint-Martin-d'Hères, France
- Axelle Calcus
- Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
- Jacqueline Leybaert
- Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
- Cécile Colin
- Center for Research Cognition and Neuroscience, Université Libre de Bruxelles, 1050 Bruxelles, Belgium
7. Sato M. The timing of visual speech modulates auditory neural processing. Brain and Language 2022; 235:105196. PMID: 36343508; DOI: 10.1016/j.bandl.2022.105196.
Abstract
In face-to-face communication, visual information from a speaker's face and the time-varying kinematics of articulatory movements have been shown to fine-tune auditory neural processing and improve speech recognition. To further determine whether the timing of visual gestures modulates auditory cortical processing, three sets of syllables differing only in the onset and duration of silent prephonatory movements preceding the acoustic speech signal were contrasted using EEG. Despite similar visual recognition rates, an increase in the amplitude of P2 auditory evoked responses was observed from the longest to the shortest movements. Taken together, these results clarify how audiovisual speech perception partly operates through visually based predictions and related processing time, with acoustic-phonetic neural processing paralleling the timing of visual prephonatory gestures.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.
8. Sato M. Motor and visual influences on auditory neural processing during speaking and listening. Cortex 2022; 152:21-35. DOI: 10.1016/j.cortex.2022.03.013.
9. Ihara AS, Matsumoto A, Ojima S, Katayama J, Nakamura K, Yokota Y, Watanabe H, Naruse Y. Prediction of Second Language Proficiency Based on Electroencephalographic Signals Measured While Listening to Natural Speech. Front Hum Neurosci 2021; 15:665809. PMID: 34335208; PMCID: PMC8322447; DOI: 10.3389/fnhum.2021.665809.
Abstract
This study had two goals: to clarify the relationship between electroencephalographic (EEG) features estimated while non-native speakers listened to a second language (L2) and their L2 proficiency as determined by a conventional paper test, and to provide a predictive model for L2 proficiency based on EEG features. We measured EEG signals from 205 native Japanese speakers who varied widely in English proficiency while they listened to natural speech in English. Following the EEG measurement, they completed a conventional English listening test for Japanese speakers. We estimated multivariate temporal response functions separately for word class, speech rate, word position, and parts of speech. We found significant negative correlations between listening score and 17 EEG features, which included the peak latency of early components (corresponding to N1 and P2) for both open and closed class words and the peak latency and amplitude of a late component (corresponding to N400) for open class words. On the basis of these EEG features, we generated a predictive model of Japanese speakers' English listening proficiency. The correlation coefficient between the true and predicted listening scores was 0.51. Our results suggest that L2 or foreign language ability can be assessed using neural signatures measured while listening to natural speech, without the need for a conventional paper test.
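The prediction step described here (mapping per-participant EEG features onto listening-test scores) can be sketched as a cross-validated regularised regression; the estimator, feature matrix, and cross-validation scheme below are placeholder assumptions, not the authors' exact model.

```python
# Illustrative sketch of predicting a listening-test score from per-participant EEG
# features (e.g., component latencies and amplitudes). features: (n_participants,
# n_features) array; scores: (n_participants,) vector of test scores (assumed inputs).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def predict_proficiency(features, scores, n_splits=10):
    # Standardise features, then fit a ridge regression inside cross-validation
    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    predicted = cross_val_predict(model, features, scores, cv=cv)
    r = np.corrcoef(scores, predicted)[0, 1]   # correlation between true and predicted scores
    return predicted, r
```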
Affiliation(s)
- Aya S Ihara
- National Institute of Information and Communications Technology, and Osaka University, Kobe, Japan
- Atsushi Matsumoto
- National Institute of Information and Communications Technology, and Osaka University, Kobe, Japan
- Shiro Ojima
- Department of English, College of Education, Yokohama National University, Yokohama, Japan
- Jun'ichi Katayama
- Department of Psychological Science, and Center for Applied Psychological Science (CAPS), Kwansei Gakuin University, Nishinomiya, Japan
- Yusuke Yokota
- National Institute of Information and Communications Technology, and Osaka University, Kobe, Japan
- Hiroki Watanabe
- National Institute of Information and Communications Technology, and Osaka University, Kobe, Japan
- Yasushi Naruse
- National Institute of Information and Communications Technology, and Osaka University, Kobe, Japan
10. Lindborg A, Andersen TS. Bayesian binding and fusion models explain illusion and enhancement effects in audiovisual speech perception. PLoS One 2021; 16:e0246986. PMID: 33606815; PMCID: PMC7895372; DOI: 10.1371/journal.pone.0246986.
Abstract
Speech is perceived with both the ears and the eyes. Adding congruent visual speech improves the perception of a faint auditory speech stimulus, whereas adding incongruent visual speech can alter the perception of the utterance. The latter phenomenon is exemplified by the McGurk illusion, where an auditory stimulus such as "ba" dubbed onto a visual stimulus such as "ga" produces the illusion of hearing "da". Bayesian models of multisensory perception suggest that both the enhancement and the illusion case can be described as a two-step process of binding (informed by prior knowledge) and fusion (informed by the information reliability of each sensory cue). However, no study to date has accounted for how each of these steps contributes to audiovisual speech perception. In this study, we expose subjects to both congruent and incongruent audiovisual speech, manipulating the binding and the fusion stages simultaneously by varying both the temporal offset (binding) and the auditory and visual signal-to-noise ratios (fusion). We fit two Bayesian models to the behavioural data and show that both can account for the enhancement effect in congruent audiovisual speech as well as for the McGurk illusion. This modelling approach allows us to disentangle the effects of binding and fusion on behavioural responses. Moreover, we find that these models have greater predictive power than a forced-fusion model. This study provides a systematic and quantitative approach to measuring audiovisual integration in the perception of the McGurk illusion as well as of congruent audiovisual speech, which we hope will inform future work on audiovisual speech perception.
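To make the binding-then-fusion idea concrete, the sketch below implements a generic Gaussian causal-inference scheme: a binding stage that computes the posterior probability of a common cause, followed by reliability-weighted fusion. It is a schematic stand-in under stated assumptions (Gaussian cues, Gaussian prior, model averaging), not the specific models fitted in the paper.

```python
# Two-stage sketch: binding (probability of a common cause) then fusion
# (precision-weighted combination) of an auditory and a visual phonetic cue,
# each represented as a Gaussian on a 1-D phonetic dimension. All parameter
# values are illustrative assumptions.
import numpy as np

def fuse(x_a, var_a, x_v, var_v):
    """Reliability-weighted (precision-weighted) fusion of two Gaussian cues."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    return w_a * x_a + (1 - w_a) * x_v

def p_common(x_a, var_a, x_v, var_v, prior_c=0.7, var_p=10.0, mu_p=0.0):
    """Posterior probability that the two cues share a common cause,
    assuming a Gaussian prior N(mu_p, var_p) over the phonetic dimension."""
    denom = var_a * var_v + var_a * var_p + var_v * var_p
    like_c1 = np.exp(-0.5 * ((x_a - x_v) ** 2 * var_p
                             + (x_a - mu_p) ** 2 * var_v
                             + (x_v - mu_p) ** 2 * var_a) / denom) / (2 * np.pi * np.sqrt(denom))
    like_c2 = (np.exp(-0.5 * ((x_a - mu_p) ** 2 / (var_a + var_p)
                              + (x_v - mu_p) ** 2 / (var_v + var_p)))
               / (2 * np.pi * np.sqrt((var_a + var_p) * (var_v + var_p))))
    return like_c1 * prior_c / (like_c1 * prior_c + like_c2 * (1 - prior_c))

def percept(x_a, var_a, x_v, var_v):
    """Model-averaged estimate: fused when bound, auditory-dominated otherwise."""
    pc = p_common(x_a, var_a, x_v, var_v)
    return pc * fuse(x_a, var_a, x_v, var_v) + (1 - pc) * x_a
```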
Affiliation(s)
- Alma Lindborg
- Department of Psychology, University of Potsdam, Potsdam, Germany
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Tobias S. Andersen
- Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
11. Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. PMID: 32727820; PMCID: PMC7470920; DOI: 10.1523/jneurosci.0279-20.2020.
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.
Significance statement: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
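The deconvolution logic (recovering overlapping responses from jittered auditory and visual onsets) can be illustrated with an ordinary least-squares finite-impulse-response model. The electrode data, onset variables, and window length below are assumed placeholders, and the interaction term estimated in the paper is omitted for brevity.

```python
# Sketch of regression-based deconvolution: because the relative timing of visual and
# auditory onsets is jittered across trials, overlapping responses can be separated by
# least squares on an impulse (FIR) design matrix. signal: continuous recording from
# one electrode; onsets_v / onsets_a: sample indices of visual and auditory onsets.
import numpy as np

def fir_design(n_samples, onsets, n_lags):
    """FIR design matrix: column k is 1 at sample (onset + k) for every onset."""
    X = np.zeros((n_samples, n_lags))
    for onset in onsets:
        for k in range(n_lags):
            if onset + k < n_samples:
                X[onset + k, k] = 1.0
    return X

def deconvolve(signal, onsets_v, onsets_a, sfreq, window_s=1.0):
    n_lags = int(window_s * sfreq)
    X = np.hstack([fir_design(signal.size, onsets_v, n_lags),
                   fir_design(signal.size, onsets_a, n_lags)])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)   # ordinary least squares
    visual_resp, auditory_resp = beta[:n_lags], beta[n_lags:]
    return visual_resp, auditory_resp   # estimated overlapping response time courses
```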
12. Carmona L, Diez PF, Laciar E, Mut V. Multisensory Stimulation and EEG Recording Below the Hair-Line: A New Paradigm on Brain Computer Interfaces. IEEE Trans Neural Syst Rehabil Eng 2020; 28:825-831. PMID: 32149649; DOI: 10.1109/tnsre.2020.2979684.
Abstract
This study tested the feasibility of combining multisensory (auditory and visual) stimulation with electrodes placed at non-hair positions in order to design more efficient and comfortable brain-computer interfaces (BCI). Fifteen volunteers participated in the experiments. They were stimulated by visual, auditory and multisensory stimuli set at 37, 38, 39 and 40 Hz and at different phases (0°, 90°, 180° and 270°). The electroencephalogram (EEG) was measured from the Oz, T7, T8, Tp9 and Tp10 positions. The amplitude of the visual and auditory evoked potentials was evaluated with the signal-to-noise ratio (SNR), and detection accuracy was calculated using canonical correlation analysis. Additionally, the volunteers were asked about the discomfort of each kind of stimulus. Multisensory stimulation attained a higher SNR at every electrode. Non-hair positions (Tp9 and Tp10) attained SNR and accuracy similar to those obtained from occipital positions under visual stimulation. No significant difference was found in the discomfort produced by each kind of stimulation. The results demonstrate that multisensory stimulation can help in obtaining high-amplitude steady-state evoked responses at a similar discomfort level. It is therefore possible to design a more efficient and comfortable hybrid BCI based on multisensory stimulation and electrodes at non-hair positions. The current article proposes a new paradigm for hybrid BCIs based on steady-state evoked potentials measured from the area behind the ears and elicited by multisensory stimulation, allowing subjects to achieve performance similar to that of a visual-occipital BCI while measuring the EEG at a more comfortable electrode location.
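The two measurements named in this abstract, the SNR of the steady-state response and CCA-based frequency detection, can be sketched as follows; the segment format, candidate frequencies, harmonic count, and neighbour window are assumptions for illustration, not the authors' exact pipeline.

```python
# Sketch of SSVEP analysis: spectral SNR at the stimulation frequency, and frequency
# detection by canonical correlation against sine/cosine references.
# eeg: (n_samples, n_channels) segment recorded during stimulation at sfreq Hz.
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_snr(eeg, sfreq, target_hz, n_neighbours=10):
    """Power at the target bin relative to neighbouring bins (assumes the target
    bin is not at the edge of the spectrum)."""
    spectrum = np.abs(np.fft.rfft(eeg, axis=0)) ** 2
    freqs = np.fft.rfftfreq(eeg.shape[0], 1 / sfreq)
    i = int(np.argmin(np.abs(freqs - target_hz)))
    neighbours = np.r_[i - n_neighbours:i, i + 1:i + n_neighbours + 1]
    return spectrum[i].mean() / spectrum[neighbours].mean()

def reference_signals(n_samples, sfreq, freq, n_harmonics=2):
    """Sine/cosine references at the stimulation frequency and its harmonics."""
    t = np.arange(n_samples) / sfreq
    return np.column_stack([f(2 * np.pi * h * freq * t)
                            for h in range(1, n_harmonics + 1)
                            for f in (np.sin, np.cos)])

def detect_frequency(eeg, sfreq, candidates=(37, 38, 39, 40)):
    """Pick the candidate frequency with the largest canonical correlation."""
    scores = []
    for freq in candidates:
        refs = reference_signals(eeg.shape[0], sfreq, freq)
        cca = CCA(n_components=1).fit(eeg, refs)
        u, v = cca.transform(eeg, refs)
        scores.append(np.corrcoef(u[:, 0], v[:, 0])[0, 1])
    return candidates[int(np.argmax(scores))], scores
```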
13. The impact of when, what and how predictions on auditory speech perception. Exp Brain Res 2019; 237:3143-3153. DOI: 10.1007/s00221-019-05661-5.
14. Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. PMID: 31310616; PMCID: PMC6634411; DOI: 10.1371/journal.pone.0219744.
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion, where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech-specific mismatch processing, and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve, subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity (MMN) paradigm, where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli, we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm violate both the general prediction of stimulus content and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
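As a rough illustration of what "induced" (as opposed to evoked) theta-band activity means computationally, the sketch below band-pass filters single trials, takes the Hilbert envelope, and subtracts the power of the phase-locked average; the filter settings and data layout are assumptions for illustration, not the analysis used in the study above.

```python
# Sketch of induced vs. evoked theta-band power for one channel.
# trials: (n_trials, n_times) single-trial EEG; sfreq in Hz; band in Hz.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_envelope_power(x, sfreq, band=(4.0, 8.0), order=4):
    """Band-pass filter and return instantaneous power via the Hilbert envelope."""
    b, a = butter(order, band, btype="bandpass", fs=sfreq)
    filtered = filtfilt(b, a, x, axis=-1)
    return np.abs(hilbert(filtered, axis=-1)) ** 2

def induced_power(trials, sfreq, band=(4.0, 8.0)):
    total = band_envelope_power(trials, sfreq, band).mean(axis=0)   # total power
    evoked = band_envelope_power(trials.mean(axis=0), sfreq, band)  # phase-locked part
    return total - evoked   # induced (non-phase-locked) power time course
```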
Affiliation(s)
- Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- BCBL. Basque Center on Cognition, Brain and Language, Donostia, Spain
- Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
15. Simon DM, Wallace MT. Integration and Temporal Processing of Asynchronous Audiovisual Speech. J Cogn Neurosci 2018; 30:319-337. DOI: 10.1162/jocn_a_01205.
Abstract
Multisensory integration of visual mouth movements with auditory speech is known to offer substantial perceptual benefits, particularly under challenging (i.e., noisy) acoustic conditions. Previous work characterizing this process has found that ERPs to auditory speech are of shorter latency and smaller magnitude in the presence of visual speech. We sought to determine the dependency of these effects on the temporal relationship between the auditory and visual speech streams using EEG. We found that reductions in ERP latency and suppression of ERP amplitude are maximal when the visual signal precedes the auditory signal by a small interval and that increasing amounts of asynchrony reduce these effects in a continuous manner. Time–frequency analysis revealed that these effects are found primarily in the theta (4–8 Hz) and alpha (8–12 Hz) bands, with a central topography consistent with auditory generators. Theta effects also persisted in the lower portion of the band (3.5–5 Hz), and this late activity was more frontally distributed. Importantly, the magnitude of these late theta oscillations not only differed with the temporal characteristics of the stimuli but also served to predict participants' task performance. Our analysis thus reveals that suppression of single-trial brain responses by visual speech depends strongly on the temporal concordance of the auditory and visual inputs. It further illustrates that processes in the lower theta band, which we suggest as an index of incongruity processing, might serve to reflect the neural correlates of individual differences in multisensory temporal perception.
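For readers who want to reproduce this kind of band-limited analysis, the sketch below computes power in the theta and alpha bands via convolution with complex Morlet wavelets; the cycle count, frequency grid, and input format are illustrative assumptions rather than the authors' exact time-frequency settings.

```python
# Sketch of a Morlet-wavelet band-power analysis for one channel.
# signal: 1-D single-trial time series; sfreq in Hz; returns a power time course per band.
import numpy as np

def morlet_power(signal, sfreq, freq, n_cycles=5):
    """Power time course at `freq` via convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2 * np.pi * freq)            # temporal width of the Gaussian
    t = np.arange(-3 * sigma_t, 3 * sigma_t, 1 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t ** 2 / (2 * sigma_t ** 2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit-energy normalisation
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2

def band_power(signal, sfreq, bands=None):
    """Average power within each frequency band, sampled in 1 Hz steps."""
    if bands is None:
        bands = {"theta": (4, 8), "alpha": (8, 12)}
    out = {}
    for name, (lo, hi) in bands.items():
        freqs = np.arange(lo, hi + 1)
        out[name] = np.mean([morlet_power(signal, sfreq, f) for f in freqs], axis=0)
    return out
```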
16. Shatzer H, Shen S, Kerlin JR, Pitt MA, Shahin AJ. Neurophysiology underlying influence of stimulus reliability on audiovisual integration. Eur J Neurosci 2018; 48:2836-2848. PMID: 29363844; DOI: 10.1111/ejn.13843.
Abstract
We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and in the asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than blurred visual speech and for less degraded than more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha-band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities.
Affiliation(s)
- Hannah Shatzer
- Department of Psychology, The Ohio State University, Columbus, OH, USA
- Stanley Shen
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Jess R Kerlin
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
- Mark A Pitt
- Department of Psychology, The Ohio State University, Columbus, OH, USA
- Antoine J Shahin
- Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA, 95618, USA
17. The effect of semantic congruence for visual-auditory bimodal stimuli. Annu Int Conf IEEE Eng Med Biol Soc 2017; 2017:998-1001. PMID: 29060042; DOI: 10.1109/embc.2017.8036994.
Abstract
Current neuropsychological research suggests that the brain reacts faster and more accurately to visual-auditory bimodal stimuli than to single-modality stimuli, yet visual-auditory bimodal stimuli (VABS) have not shown a corresponding advantage in brain-computer interface (BCI) systems. This paper investigates whether semantically congruent stimuli yield better performance than semantically incongruent stimuli in a BCI system. Two VABS-based paradigms (semantically congruent and incongruent) were compared in ten healthy subjects. The results indicated higher event-related potential (ERP) amplitudes in the semantically incongruent paradigm for both target and non-target stimuli. Nevertheless, no significant difference in classification accuracy was observed between the congruent and incongruent conditions. Most participants preferred the semantically congruent condition because it required less workload. These findings demonstrate that semantic congruence has a positive effect on behavioral outcomes (lower workload) and no significant effect on system efficiency.
18. Moradi S, Wahlin A, Hällgren M, Rönnberg J, Lidestam B. The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users. Front Psychol 2017; 8:368. PMID: 28348542; PMCID: PMC5346541; DOI: 10.3389/fpsyg.2017.00368.
Abstract
This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants' auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, either promptly after the training or at the one-month follow-up. However, neither a significant between-groups difference nor a group-by-session interaction was observed. Conclusion: Audiovisual training may be considered in the aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or of a group-by-session interaction calls for further research.
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Anna Wahlin
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Mathias Hällgren
- Department of Otorhinolaryngology and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Jerker Rönnberg
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
19. Yeh PW, Geangu E, Reid V. Coherent emotional perception from body expressions and the voice. Neuropsychologia 2016; 91:99-108. DOI: 10.1016/j.neuropsychologia.2016.07.038.
20. A dynamical framework to relate perceptual variability with multisensory information processing. Sci Rep 2016; 6:31280. PMID: 27502974; PMCID: PMC4977493; DOI: 10.1038/srep31280.
Abstract
Multisensory processing involves the participation of individual sensory streams, e.g., vision and audition, to facilitate the perception of environmental stimuli. An experimental realization of the underlying complexity is captured by the "McGurk effect": incongruent auditory and visual vocalization stimuli eliciting the perception of illusory speech sounds. Further studies have established that the time delay between the onsets of the auditory and visual signals (AV lag), as well as perturbations in the unisensory streams, are key variables that modulate perception. However, as of now only a few quantitative theoretical frameworks have been proposed to understand the interplay among these psychophysical variables or the neural systems level interactions that govern perceptual variability. Here, we propose a dynamic systems model consisting of the basic ingredients of any multisensory processing, two unisensory and one multisensory sub-system (nodes), as reported by several researchers. The nodes are connected such that biophysically inspired coupling parameters and time delays become key parameters of this network. We observed that zero AV lag results in maximum synchronization of the constituent nodes and that the degree of synchronization decreases for non-zero lags. The attractor states of this network can thus be interpreted as facilitators for stabilizing specific perceptual experiences. The dynamic model thereby presents a quantitative framework for understanding multisensory information processing.
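A toy version of such a network can be written as delayed, coupled phase oscillators, with synchrony read out via the Kuramoto order parameter as the AV lag is varied; the node frequencies, delays, coupling strength, and one-way coupling scheme below are illustrative assumptions, not the published model's equations.

```python
# Toy network in the spirit of the model described above: two unisensory nodes
# (auditory, visual) drive one multisensory node, implemented as phase oscillators
# with delayed coupling. Synchrony is summarised by the Kuramoto order parameter.
import numpy as np

def simulate(av_lag_ms=0.0, coupling=0.8, dt=1e-3, duration=5.0,
             freqs_hz=(4.0, 4.2, 4.1), base_delay_ms=10.0, seed=0):
    """Positive av_lag_ms delays the auditory input to the multisensory node,
    negative values delay the visual input; returns a 0-1 synchrony index."""
    rng = np.random.default_rng(seed)
    n_steps = int(duration / dt)
    base = base_delay_ms / 1000.0
    d_a = int(round((base + max(av_lag_ms, 0.0) / 1000.0) / dt))
    d_v = int(round((base + max(-av_lag_ms, 0.0) / 1000.0) / dt))
    omega = 2.0 * np.pi * np.asarray(freqs_hz)      # natural frequencies (A, V, AV node)
    theta = np.zeros((n_steps, 3))
    theta[0] = rng.uniform(0.0, 2.0 * np.pi, size=3)
    for t in range(1, n_steps):
        a_past = theta[max(t - 1 - d_a, 0), 0]      # delayed auditory phase
        v_past = theta[max(t - 1 - d_v, 0), 1]      # delayed visual phase
        dtheta = omega.copy()
        dtheta[2] += coupling * (np.sin(a_past - theta[t - 1, 2])
                                 + np.sin(v_past - theta[t - 1, 2]))
        theta[t] = theta[t - 1] + dt * dtheta       # Euler integration
    # Kuramoto order parameter over the second half of the simulation
    return np.abs(np.exp(1j * theta[n_steps // 2:]).mean(axis=1)).mean()

# Sweep the synchrony index over a range of audiovisual lags (in milliseconds).
lags_ms = [-240, -120, 0, 120, 240]
synchrony = {lag: simulate(av_lag_ms=lag) for lag in lags_ms}
```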
21. Baart M. Quantifying lip-read-induced suppression and facilitation of the auditory N1 and P2 reveals peak enhancements and delays. Psychophysiology 2016; 53:1295-1306. DOI: 10.1111/psyp.12683.
Affiliation(s)
- Martijn Baart
- BCBL. Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
22. Audio Visual Integration with Competing Sources in the Framework of Audio Visual Speech Scene Analysis. Adv Exp Med Biol 2016. DOI: 10.1007/978-3-319-25474-6_42.
23. Cross-modal orienting of visual attention. Neuropsychologia 2016; 83:170-178. DOI: 10.1016/j.neuropsychologia.2015.06.003.
24. Tiippana K, Möttönen R, Schwartz JL. Multisensory and sensorimotor interactions in speech perception. Front Psychol 2015; 6:458. PMID: 25941506; PMCID: PMC4403297; DOI: 10.3389/fpsyg.2015.00458.
Affiliation(s)
- Kaisa Tiippana
- Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
- Riikka Möttönen
- Department of Experimental Psychology, University of Oxford, Oxford, UK
- Jean-Luc Schwartz
- Grenoble Images Parole Signal Automatique-Lab, Speech and Cognition Department, Centre National de la Recherche Scientifique, Grenoble University, Grenoble, France