1. Sato M. Audiovisual speech asynchrony asymmetrically modulates neural binding. Neuropsychologia 2024; 198:108866. PMID: 38518889. DOI: 10.1016/j.neuropsychologia.2024.108866.
Abstract
Previous psychophysical and neurophysiological studies in young healthy adults have provided evidence that audiovisual speech integration occurs with a large degree of temporal tolerance around true simultaneity. To further determine whether audiovisual speech asynchrony modulates auditory cortical processing and neural binding in young healthy adults, N1/P2 auditory evoked responses were compared using an additive model during a syllable categorization task, with or without an audiovisual asynchrony ranging from a 240 ms visual lead to a 240 ms auditory lead. Consistent with previous psychophysical findings, the results converge in favor of an asymmetric temporal integration window. Three main findings were observed: 1) predictive temporal and phonetic cues from pre-phonatory visual movements before the acoustic onset appeared essential for neural binding to occur, 2) audiovisual synchrony, with visual pre-phonatory movements predictive of the onset of the acoustic signal, was a prerequisite for N1 latency facilitation, and 3) P2 amplitude suppression and latency facilitation occurred even when visual pre-phonatory movements were predictive not of the acoustic onset but of the syllable to come. Taken together, these findings help clarify how audiovisual speech integration partly operates through two stages of visually-based temporal and phonetic predictions.
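As an illustration of the additive-model logic used here (comparing the audiovisual evoked response against the sum of the unisensory responses), a minimal sketch in Python, assuming hypothetical epoch arrays of shape (trials, samples) on a common time base:

```python
import numpy as np

def evoked(epochs):
    """Average across trials to obtain the evoked response (ERP)."""
    return epochs.mean(axis=0)

def additive_difference(av, a, v):
    """Additive-model test: AV - (A + V).

    A nonzero difference wave in the N1/P2 window indicates an
    audiovisual interaction beyond the sum of the unisensory responses.
    """
    return evoked(av) - (evoked(a) + evoked(v))

# Hypothetical epochs: 100 trials x 600 samples (e.g., -100..500 ms at 1 kHz)
rng = np.random.default_rng(0)
av, a, v = (rng.normal(size=(100, 600)) for _ in range(3))
diff_wave = additive_difference(av, a, v)
print(diff_wave.shape)  # (600,)
```

In practice, N1/P2 amplitudes and latencies would then be measured as peaks of these waveforms within predefined time windows, per asynchrony condition.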
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.
2. Marsicano G, Bertini C, Ronconi L. Alpha-band sensory entrainment improves audiovisual temporal acuity. Psychon Bull Rev 2024; 31:874-885. PMID: 37783899. DOI: 10.3758/s13423-023-02388-x.
Abstract
Visual and auditory stimuli are transmitted from the environment to the sensory cortices with different timing, requiring the brain to encode when sensory inputs must be segregated or integrated into a single percept. The probability that different audiovisual (AV) stimuli are integrated into a single percept even when presented asynchronously is reflected in the construct of the temporal binding window (TBW). There is strong interest in testing whether it is possible to broaden or shrink the TBW using neuromodulatory approaches that can speed up or slow down ongoing alpha oscillations, which have repeatedly been hypothesized to be an important determinant of the TBW's size. Here, we employed a web-based sensory entrainment protocol combined with a simultaneity judgment task using simple flash-beep stimuli. The aim was to test whether AV temporal acuity could be modulated trial by trial by synchronizing ongoing neural oscillations in the prestimulus period to a rhythmic sensory stream presented in the upper (∼12 Hz) or lower (∼8.5 Hz) alpha range. As a control, we implemented a nonrhythmic condition where only the first and the last entrainers were employed. Results show that upper alpha entrainment shrinks the AV TBW and improves AV temporal acuity compared with the lower alpha and control conditions. Our findings represent a proof of concept of the efficacy of sensory entrainment for improving AV temporal acuity on a trial-by-trial basis, and they strengthen the idea that alpha oscillations may reflect the temporal unit of AV temporal binding.
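In such simultaneity-judgment designs, the TBW is typically estimated by fitting a Gaussian-shaped psychometric function to the proportion of "synchronous" responses across SOAs. A minimal sketch with hypothetical data (the exact fitting procedure and the TBW criterion vary across studies):

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(soa, amp, pss, sigma):
    """Proportion of 'synchronous' responses as a Gaussian of SOA."""
    return amp * np.exp(-(soa - pss) ** 2 / (2 * sigma ** 2))

# Hypothetical data: SOAs (ms, negative = audio first) and P('synchronous')
soas = np.array([-300, -200, -100, 0, 100, 200, 300])
p_sync = np.array([0.10, 0.35, 0.80, 0.95, 0.75, 0.30, 0.08])

(amp, pss, sigma), _ = curve_fit(gauss, soas, p_sync, p0=[1.0, 0.0, 100.0])
tbw = 2 * sigma  # one common convention; criteria differ between studies
print(f"PSS = {pss:.1f} ms, TBW ~ {tbw:.1f} ms")
```

A narrower fitted width after upper-alpha entrainment would correspond to the improved AV temporal acuity reported here.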
Affiliation(s)
- Gianluca Marsicano
- Department of Psychology, University of Bologna, Viale Berti Pichat 5, 40121, Bologna, Italy
- Centre for Studies and Research in Cognitive Neuroscience, University of Bologna, Via Rasi e Spinelli 176, 47023, Cesena, Italy
- Caterina Bertini
- Department of Psychology, University of Bologna, Viale Berti Pichat 5, 40121, Bologna, Italy
- Centre for Studies and Research in Cognitive Neuroscience, University of Bologna, Via Rasi e Spinelli 176, 47023, Cesena, Italy
- Luca Ronconi
- School of Psychology, Vita-Salute San Raffaele University, Via Olgettina 58, 20132, Milan, Italy.
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy.
3. Sato M. Competing influence of visual speech on auditory neural adaptation. Brain Lang 2023; 247:105359. PMID: 37951157. DOI: 10.1016/j.bandl.2023.105359.
Abstract
Visual information from a speaker's face enhances auditory neural processing and speech recognition. To determine whether auditory memory can be influenced by visual speech, the degree of auditory neural adaptation to an auditory syllable preceded by an auditory, visual, or audiovisual syllable was examined using EEG. Consistent with previous findings, and with additional adaptation of auditory neurons tuned to acoustic features, stronger adaptation of N1, P2 and N2 auditory evoked responses was observed when the auditory syllable was preceded by an auditory rather than a visual syllable. However, adaptation following an audiovisual syllable, while stronger than following a visual syllable, was weaker than following an auditory syllable, and longer N1 and P2 latencies were also observed. These results further demonstrate that visual speech acts on auditory memory but suggest competing visual influences in the case of audiovisual stimulation.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, UMR 7309 CNRS & Aix-Marseille Université, 5 avenue Pasteur, Aix-en-Provence, France.
4. Chalas N, Omigie D, Poeppel D, van Wassenhove V. Hierarchically nested networks optimize the analysis of audiovisual speech. iScience 2023; 26:106257. PMID: 36909667. PMCID: PMC9993032. DOI: 10.1016/j.isci.2023.106257.
Abstract
In conversational settings, seeing the speaker's face elicits internal predictions about the upcoming acoustic utterance. Understanding how the listener's cortical dynamics tune to the temporal statistics of audiovisual (AV) speech is thus essential. Using magnetoencephalography, we explored how large-scale frequency-specific dynamics of human brain activity adapt to AV speech delays. First, we show that the amplitude of phase-locked responses parametrically decreases with natural AV speech synchrony, a pattern that is consistent with predictive coding. Second, we show that the temporal statistics of AV speech affect large-scale oscillatory networks at multiple spatial and temporal resolutions. We demonstrate a spatial nestedness of oscillatory networks during the processing of AV speech: these oscillatory hierarchies are such that high-frequency activity (beta, gamma) is contingent on the phase response of low-frequency (delta, theta) networks. Our findings suggest that the endogenous temporal multiplexing of speech processing confers adaptability within the temporal regimes that are essential for speech comprehension.
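The nested relation described here — high-frequency activity contingent on low-frequency phase — is commonly quantified as phase-amplitude coupling. The study's network analysis is considerably more involved; this sketch only illustrates the core measure (a mean-vector modulation index) on synthetic data, with hypothetical band choices:

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def modulation_index(x, fs, phase_band=(4, 8), amp_band=(60, 120)):
    """|mean(A_high * exp(i*phi_low))|: strength of amplitude-phase coupling."""
    phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))
    amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))
    return np.abs(np.mean(amp * np.exp(1j * phase)))

fs = 1000
t = np.arange(0, 10, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)
gamma = (1 + theta) * np.sin(2 * np.pi * 80 * t)  # gamma amplitude rides theta phase
signal = theta + 0.3 * gamma + 0.1 * np.random.randn(t.size)
print(modulation_index(signal, fs))  # clearly above the value for uncoupled noise
```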
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, 48149 Münster, Germany
- CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit; CNRS; Université Paris-Saclay, 91191 Gif/Yvette, France
- School of Biology, Faculty of Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Corresponding author
- Diana Omigie
- Department of Psychology, Goldsmiths, University of London, London, UK
- David Poeppel
- Department of Psychology, New York University, New York, NY 10003, USA
- Ernst Struengmann Institute for Neuroscience, 60528 Frankfurt am Main, Germany
- Virginie van Wassenhove
- CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit; CNRS; Université Paris-Saclay, 91191 Gif/Yvette, France
- Corresponding author
5. Sato M. The timing of visual speech modulates auditory neural processing. Brain Lang 2022; 235:105196. PMID: 36343508. DOI: 10.1016/j.bandl.2022.105196.
Abstract
In face-to-face communication, visual information from a speaker's face and time-varying kinematics of articulatory movements have been shown to fine-tune auditory neural processing and improve speech recognition. To further determine whether the timing of visual gestures modulates auditory cortical processing, three sets of syllables only differing in the onset and duration of silent prephonatory movements, before the acoustic speech signal, were contrasted using EEG. Despite similar visual recognition rates, an increase in the amplitude of P2 auditory evoked responses was observed from the longest to the shortest movements. Taken together, these results clarify how audiovisual speech perception partly operates through visually-based predictions and related processing time, with acoustic-phonetic neural processing paralleling the timing of visual prephonatory gestures.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-Marseille Université, Aix-en-Provence, France.
6. Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. PMID: 36328272. DOI: 10.1016/j.neuroimage.2022.119724.
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in neural entrainment to the quasi-rhythmic properties of speech acoustics, while the latter is thought to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs articulatory patterns associated with speech production. We measured electroencephalographic (EEG) data while participants listened to sentences during the production of which the articulatory kinematics of the lips, jaw and tongue had also been recorded (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic and/or redundant information about the EEG signals. Interestingly, tongue movements carry both unique and synergistic information with the envelope that is encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, thus leveraging audio-motor maps that most likely arise from the acquisition of speech production during development.
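A minimal sketch of the first step of this pipeline — extracting articulatory coordination patterns from EMA channels with PCA — using hypothetical array shapes; the information-theoretic PID step is omitted here:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical EMA recording: samples x channels
# (x/y positions of lip, jaw and tongue sensors)
rng = np.random.default_rng(1)
ema = rng.normal(size=(5000, 12))

pca = PCA(n_components=3)
kinematic_components = pca.fit_transform(ema)  # samples x 3 coordination patterns
print(pca.explained_variance_ratio_)

# Each component time series would then enter the Partial Information
# Decomposition together with the speech envelope, asking whether it carries
# unique, redundant, or synergistic information about the EEG signals.
```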
Affiliation(s)
- A Pastore
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
- A Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- I Delis
- School of Biomedical Sciences, University of Leeds, Leeds, UK
- E Dolfini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- L Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
7. Clustering Algorithm in English Language Learning Pattern Matching under Big Data Framework. Comput Intell Neurosci 2022; 2022:1380046. PMID: 36110905. PMCID: PMC9470339. DOI: 10.1155/2022/1380046.
Abstract
The Internet era has brought new challenges and opportunities for English learning and teaching. At the same time, basic education is fully implementing quality education and respecting students' individual differences. The same teacher teaches the same content to the same class, yet some students perform well and others poorly, owing to intellectual and non-intellectual factors alike; this unevenness makes teaching difficult. In view of the current state of university English teaching and the trend toward respecting students' individual development, this study investigates the basic concept of English language learning pattern matching, its main features, and its practical application in university English teaching. A clustering algorithm based on a big data framework is proposed for English language learning pattern matching; it is fault-tolerant and can quickly acquire and process large volumes of English-teaching data. By analyzing the characteristics of data mining methods for students' English learning behavior, the study explores clustering-based mining of students' English learning data and methods for processing the resulting clusters. The method is highly adaptable, can be applied to actual English language learning pattern matching, and actively explores paths for change and innovation in English teaching.
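The abstract does not name a specific clustering algorithm; as a hypothetical illustration of the general approach — grouping students by learning pattern — a minimal k-means sketch over invented learner features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical learner features: [vocabulary score, listening score,
# weekly study hours, error rate] for 200 students
rng = np.random.default_rng(2)
features = rng.normal(size=(200, 4))

scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
# Each cluster groups students with a similar learning pattern, which a
# teacher could use to differentiate instruction.
print(np.bincount(labels))
```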
8. Pesnot Lerousseau J, Parise CV, Ernst MO, van Wassenhove V. Multisensory correlation computations in the human brain identified by a time-resolved encoding model. Nat Commun 2022; 13:2489. PMID: 35513362. PMCID: PMC9072402. DOI: 10.1038/s41467-022-29687-6.
Abstract
Neural mechanisms that arbitrate between integrating and segregating multisensory information are essential for complex scene analysis and for resolving the multisensory correspondence problem. However, these mechanisms and their dynamics remain largely unknown, partly because classical models of multisensory integration are static. Here, we used the Multisensory Correlation Detector, a model that provides good explanatory power for human behavior while incorporating dynamic computations. Participants judged whether sequences of auditory and visual signals originated from the same source (causal inference) or whether one modality was leading the other (temporal order), while being recorded with magnetoencephalography. First, we confirm that the Multisensory Correlation Detector explains causal inference and temporal order behavioral judgments well. Second, we found strong fits of brain activity to the two outputs of the Multisensory Correlation Detector in temporo-parietal cortices. Finally, we report an asymmetry in the goodness of fit, which was more reliable during the causal inference task than during the temporal order judgment task. Overall, our results suggest the existence of multisensory correlation detectors in the human brain, which explain why and how causal inference is strongly driven by the temporal correlation of multisensory signals.
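The actual Multisensory Correlation Detector combines temporally filtered auditory and visual inputs multiplicatively; this simplified sketch only approximates its two outputs — a correlation signal (driving causal inference) and a lag signal (driving temporal order) — via cross-correlation of smoothed pulse trains:

```python
import numpy as np
from scipy.signal import correlate, correlation_lags
from scipy.ndimage import gaussian_filter1d

fs = 1000
t = np.arange(0, 2, 1 / fs)
audio = np.zeros(t.size); audio[[200, 700, 1300]] = 1
video = np.zeros(t.size); video[[230, 730, 1330]] = 1  # video lags by 30 ms

# Gaussian smoothing stands in for the detector's temporal filters
a = gaussian_filter1d(audio, 20)
v = gaussian_filter1d(video, 20)

xc = correlate(a, v, mode="full")
lags = correlation_lags(a.size, v.size, mode="full")
correlation_output = xc.max()                  # high when signals share a source
lag_output = lags[xc.argmax()] / fs * 1000     # asynchrony at peak correlation (ms);
print(correlation_output, lag_output)          # the sign convention follows argument order
```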
Affiliation(s)
- Jacques Pesnot Lerousseau
- Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France.
- Applied Cognitive Psychology, Ulm University, Ulm, Germany.
- Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, CNRS, Université Paris-Saclay, NeuroSpin, 91191, Gif/Yvette, France.
- Marc O Ernst
- Applied Cognitive Psychology, Ulm University, Ulm, Germany
- Virginie van Wassenhove
- Cognitive Neuroimaging Unit, CEA DRF/Joliot, INSERM, CNRS, Université Paris-Saclay, NeuroSpin, 91191, Gif/Yvette, France
9. Sato M. Motor and visual influences on auditory neural processing during speaking and listening. Cortex 2022; 152:21-35. DOI: 10.1016/j.cortex.2022.03.013.
10. Wang L, Lin L, Sun Y, Hou S, Ren J. The effect of movement speed on audiovisual temporal integration in streaming-bouncing illusion. Exp Brain Res 2022; 240:1139-1149. PMID: 35147722. DOI: 10.1007/s00221-022-06312-y.
Abstract
Motion perception in real situations is often driven by multisensory information. Speed is an essential characteristic of moving objects; however, it is currently unclear whether speed affects audiovisual temporal integration in motion perception. This study therefore used a streaming-bouncing task (a bistable motion perception task; SB task) combined with a simultaneity judgment task (SJ task) to explore the effect of speed on audiovisual temporal integration from implicit and explicit perspectives. The experiment used a within-subjects design with two speed conditions (fast/slow), eleven audiovisual conditions [stimulus onset asynchrony (SOA): 0/±60/±120/±180/±240/±300 ms], and a visual-only condition. Thirty subjects were recruited and completed the SB task and the SJ task in succession. The results showed that: (1) the optimal times for inducing the "bouncing" illusion and the maximum audiovisual bounce-inducing effect (ABE) magnitude were much earlier than the optimal time for audiovisual synchrony, (2) speed, as a bottom-up factor, affected the proportion of "bouncing" percepts in the SB illusion but did not affect the ABE magnitude, (3) speed also affected audiovisual temporal integration in motion perception, chiefly in that the point of subjective simultaneity (PSS) in the SJ task was earlier in the fast than in the slow speed condition, and (4) performance on the SB and SJ tasks was unrelated. In conclusion, the time at which maximum audiovisual integration occurred differed from the optimal time for synchrony perception; moreover, speed affected audiovisual temporal integration in motion perception, but only in explicit temporal tasks.
Affiliation(s)
- Luning Wang
- School of Psychology, Shanghai University of Sport, Shanghai, 200438, China
- Liyue Lin
- School of Psychology, Shanghai University of Sport, Shanghai, 200438, China
- Yujia Sun
- China Table Tennis College, Shanghai University of Sport, Shanghai, 200438, China
- Shuang Hou
- School of Psychology, Shanghai University of Sport, Shanghai, 200438, China
- Jie Ren
- China Table Tennis College, Shanghai University of Sport, Shanghai, 200438, China.
11. Dias JW, McClaskey CM, Harris KC. Early auditory cortical processing predicts auditory speech in noise identification and lipreading. Neuropsychologia 2021; 161:108012. PMID: 34474065. DOI: 10.1016/j.neuropsychologia.2021.108012.
Abstract
Individuals typically exhibit better cross-sensory perception following unisensory loss, demonstrating improved perception of information available from the remaining senses and increased cross-sensory use of neural resources. Even individuals with no sensory loss exhibit such changes in cross-sensory processing following temporary sensory deprivation, suggesting that the brain's capacity for recruiting cross-sensory sources to compensate for degraded unisensory input is a general characteristic of the perceptual process. Many studies have investigated how auditory and visual neural structures respond to within- and cross-sensory input. However, little attention has been given to how general auditory and visual neural processing relates to within- and cross-sensory perception. The current investigation examines the extent to which individual differences in general auditory neural processing account for variability in auditory, visual, and audiovisual speech perception in a sample of young healthy adults. Auditory neural processing was assessed using a simple click stimulus. We found that individuals with a smaller P1 peak amplitude in their auditory evoked potential (AEP) had more difficulty identifying speech sounds in difficult listening conditions, but were better lipreaders. The results suggest that individual differences in the auditory neural processing of healthy adults can account for variability in the perception of information available from the auditory and visual modalities, similar to the cross-sensory perceptual compensation observed in individuals with sensory loss.
Affiliation(s)
- James W Dias
- Medical University of South Carolina, United States.
12. Tremblay P, Basirat A, Pinto S, Sato M. Visual prediction cues can facilitate behavioural and neural speech processing in young and older adults. Neuropsychologia 2021; 159:107949. PMID: 34228997. DOI: 10.1016/j.neuropsychologia.2021.107949.
Abstract
The ability to process speech evolves over the course of the lifespan. Understanding speech at low acoustic intensity and in the presence of background noise becomes harder, and the ability of older adults to benefit from audiovisual speech also appears to decline. These difficulties can have important consequences for quality of life. Yet a consensus on the cause of these difficulties is still lacking. The objective of this study was to examine the processing of speech in young and older adults under different modalities (auditory [A], visual [V], audiovisual [AV]) and in the presence of different visual prediction cues (no predictive cue (control), temporal predictive cue, phonetic predictive cue, and combined temporal and phonetic predictive cues). We focused on recognition accuracy and four auditory evoked potential (AEP) components: P1-N1-P2 and N2. Thirty-four right-handed French-speaking adults were recruited, including 17 younger adults (28 ± 2 years; 20-42 years) and 17 older adults (67 ± 3.77 years; 60-73 years). Participants completed a forced-choice speech identification task. The main findings are: (1) the facilitatory effect of visual information was reduced, but present, in older compared to younger adults; (2) visual predictive cues facilitated speech recognition in younger and older adults alike; (3) age differences in AEPs were localized to later components (P2 and N2), suggesting that aging predominantly affects higher-order cortical processes related to speech processing rather than lower-level auditory processes; (4) specifically, AV facilitation of P2 amplitude was lower in older adults, the effect of the temporal predictive cue on N2 amplitude was reduced in older compared to younger adults, and P2 and N2 latencies were longer in older adults; and finally, (5) behavioural performance was associated with P2 amplitude in older adults. Our results indicate that aging affects speech processing at multiple levels, including audiovisual integration (P2) and auditory attentional processes (N2). These findings have important implications for understanding barriers to communication at older ages, as well as for developing compensation strategies for those with speech processing difficulties.
Affiliation(s)
- Pascale Tremblay
- Département de Réadaptation, Faculté de Médecine, Université Laval, Quebec City, Canada; Cervo Brain Research Centre, Quebec City, Canada.
- Anahita Basirat
- Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France
- Serge Pinto
- Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
- Marc Sato
- Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
13. Hernández-Gutiérrez D, Muñoz F, Sánchez-García J, Sommer W, Abdel Rahman R, Casado P, Jiménez-Ortega L, Espuny J, Fondevila S, Martín-Loeches M. Situating language in a minimal social context: how seeing a picture of the speaker's face affects language comprehension. Soc Cogn Affect Neurosci 2021; 16:502-511. PMID: 33470410. PMCID: PMC8094999. DOI: 10.1093/scan/nsab009.
Abstract
Natural use of language involves at least two individuals. Some studies have focused on the interaction between speakers in communicative situations and on how knowledge about the speaker can bias language comprehension. However, the mere effect of a face as a social context on language processing remains unknown. In the present study, we used event-related potentials to investigate the semantic and morphosyntactic processing of speech in the presence of a photographic portrait of the speaker. In Experiment 1, we show that the N400, a component related to semantic comprehension, increased in amplitude when speech was processed within this minimal social context compared to a scrambled-face control condition. Hence, the semantic neural processing of speech is sensitive to the concomitant perception of a picture of the speaker's face, even if it is irrelevant to the content of the sentences. Moreover, a late posterior negativity effect was found in response to the presentation of the speaker's face compared to control stimuli. In contrast, in Experiment 2, we found that morphosyntactic processing, as reflected in left anterior negativity and P600 effects, is not notably affected by the presence of the speaker's portrait. Overall, the present findings suggest that the mere presence of the speaker's image triggers a minimal communicative context, increasing the processing resources for language comprehension at the semantic level.
Affiliation(s)
- David Hernández-Gutiérrez
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Francisco Muñoz
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Department of Psychobiology & Methods for the Behavioural Sciences, Complutense University of Madrid, Madrid 28040, Spain
- Jose Sánchez-García
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Werner Sommer
- Department of Psychology, Humboldt Universität zu Berlin, Berlin 10117, Germany
- Rasha Abdel Rahman
- Department of Psychology, Humboldt Universität zu Berlin, Berlin 10117, Germany
- Pilar Casado
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Department of Psychobiology & Methods for the Behavioural Sciences, Complutense University of Madrid, Madrid 28040, Spain
- Laura Jiménez-Ortega
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Department of Psychobiology & Methods for the Behavioural Sciences, Complutense University of Madrid, Madrid 28040, Spain
- Javier Espuny
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Sabela Fondevila
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Department of Psychobiology & Methods for the Behavioural Sciences, Complutense University of Madrid, Madrid 28040, Spain
- Manuel Martín-Loeches
- Cognitive Neuroscience Section, Center UCM-ISCIII for Human Evolution and Behaviour, Madrid 28029, Spain
- Department of Psychobiology & Methods for the Behavioural Sciences, Complutense University of Madrid, Madrid 28040, Spain
14. Tabas A, von Kriegstein K. Adjudicating Between Local and Global Architectures of Predictive Processing in the Subcortical Auditory Pathway. Front Neural Circuits 2021; 15:644743. PMID: 33776657. PMCID: PMC7994860. DOI: 10.3389/fncir.2021.644743.
Abstract
Predictive processing, a leading theoretical framework for sensory processing, suggests that the brain constantly generates predictions about the sensory world and that perception emerges from the comparison between these predictions and the actual sensory input. This requires two distinct neural elements: generative units, which encode the model of the sensory world, and prediction error units, which compare these predictions against the sensory input. Although predictive processing is generally portrayed as a theory of cerebral cortex function, animal and human studies over the last decade have robustly shown the ubiquitous presence of prediction error responses in several nuclei of the auditory, somatosensory, and visual subcortical pathways. In the auditory modality, prediction error is typically elicited using so-called oddball paradigms, in which sequences of repeated pure tones of the same pitch are substituted, at unpredictable intervals, by a tone of deviant frequency. Repeated sounds quickly become predictable and elicit decreasing prediction error; deviant tones break these predictions and elicit large prediction errors. The simplicity of the rules inducing predictability makes oddball paradigms agnostic about the origin of the predictions. Here, we introduce two possible models of the organizational topology of the predictive processing auditory network: (1) the global view, which assumes that predictions about the sensory input are generated at high-order levels of the cerebral cortex and transmitted in a cascade of generative models to the subcortical sensory pathways; and (2) the local view, which assumes that independent local models, computed using local information, perform predictions at each processing stage. In the global view, information encoding is optimized globally, but sensory representations across the entire brain are biased according to the subjective views of the observer. The local view results in diminished coding efficiency, but in return guarantees a robust encoding of the features of the sensory input at each processing stage. Although most experimental results to date are ambiguous in this respect, recent evidence favors the global model.
Affiliation(s)
- Alejandro Tabas
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
15. Li S, Ding Q, Yuan Y, Yue Z. Audio-Visual Causality and Stimulus Reliability Affect Audio-Visual Synchrony Perception. Front Psychol 2021; 12:629996. PMID: 33679553. PMCID: PMC7930005. DOI: 10.3389/fpsyg.2021.629996.
Abstract
People can discriminate the synchrony between audio-visual scenes. However, the sensitivity of audio-visual synchrony perception can be affected by many factors. Using a simultaneity judgment task, the present study investigated whether the synchrony perception of complex audio-visual stimuli was affected by audio-visual causality and stimulus reliability. In Experiment 1, the results showed that audio-visual causality could increase one's sensitivity to audio-visual onset asynchrony (AVOA) of both action stimuli and speech stimuli. Moreover, participants were more tolerant of AVOA of speech stimuli than that of action stimuli in the high causality condition, whereas no significant difference between these two kinds of stimuli was found in the low causality condition. In Experiment 2, the speech stimuli were manipulated with either high or low stimulus reliability. The results revealed a significant interaction between audio-visual causality and stimulus reliability. Under the low causality condition, the percentage of “synchronous” responses of audio-visual intact stimuli was significantly higher than that of visual_intact/auditory_blurred stimuli and audio-visual blurred stimuli. In contrast, no significant difference among all levels of stimulus reliability was observed under the high causality condition. Our study supported the synergistic effect of top-down processing and bottom-up processing in audio-visual synchrony perception.
Affiliation(s)
- Shao Li
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Qi Ding
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Yichen Yuan
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Zhenzhu Yue
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
16. Lee W, Seong JJ, Ozlu B, Shim BS, Marakhimov A, Lee S. Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review. Sensors (Basel) 2021; 21:1399. PMID: 33671282. PMCID: PMC7922488. DOI: 10.3390/s21041399.
Abstract
Voice is one of the essential mechanisms for communicating and expressing one's intentions as a human being. There are several causes of voice loss, including disease, accident, vocal abuse, medical surgery, ageing, and environmental pollution, and the risk of voice loss continues to increase. Because voice loss can seriously undermine quality of life and sometimes leads to isolation from society, novel approaches to speech recognition and production are needed. In this review, we survey mouth interface technologies, which are mouth-mounted devices for speech recognition, production, and volitional control, and the corresponding research to develop artificial mouth technologies based on various sensors, including electromyography (EMG), electroencephalography (EEG), electropalatography (EPG), electromagnetic articulography (EMA), permanent magnet articulography (PMA), gyros, images and 3-axial magnetic sensors, especially with deep learning techniques. We review deep learning technologies related to voice recognition, including visual speech recognition and silent speech interfaces, analyze their workflow, and organize them into a taxonomy. Finally, we discuss methods to solve the communication problems of people with speaking disabilities and future research with respect to deep learning components.
Affiliation(s)
- Wookey Lee
- Biomedical Science and Engineering & Dept. of Industrial Security Governance & IE, Inha University, 100 Inharo, Incheon 22212, Korea
- Jessica Jiwon Seong
- Department of Industrial Security Governance, Inha University, 100 Inharo, Incheon 22212, Korea
- Busra Ozlu
- Biomedical Science and Engineering & Department of Chemical Engineering, Inha University, 100 Inharo, Incheon 22212, Korea
- Bong Sup Shim
- Biomedical Science and Engineering & Department of Chemical Engineering, Inha University, 100 Inharo, Incheon 22212, Korea
- Suan Lee
- School of Computer Science, Semyung University, Jecheon 27136, Korea
17. Sorati M, Behne DM. Considerations in Audio-Visual Interaction Models: An ERP Study of Music Perception by Musicians and Non-musicians. Front Psychol 2021; 11:594434. PMID: 33551911. PMCID: PMC7854916. DOI: 10.3389/fpsyg.2020.594434.
Abstract
Previous research with speech and non-speech stimuli suggested that in audiovisual perception, visual information starting prior to the onset of the corresponding sound can provide visual cues and form a prediction about the upcoming auditory sound. This prediction leads to audiovisual (AV) interaction: auditory and visual perception interact, inducing suppression and speeding up of the early auditory event-related potentials (ERPs) N1 and P2. To investigate AV interaction, previous research examined N1 and P2 amplitudes and latencies in response to audio only (AO), video only (VO), audiovisual, and control (CO) stimuli, and compared AV with auditory perception based on four AV interaction models (AV vs. AO+VO, AV-VO vs. AO, AV-VO vs. AO-CO, AV vs. AO). The current study addresses how different models of AV interaction express N1 and P2 suppression in music perception. Furthermore, the current study took one step further and examined whether previous musical experience, which can potentially lead to higher N1 and P2 amplitudes in auditory perception, influenced AV interaction in the different models. Musicians and non-musicians were presented with recordings (AO, AV, VO) of a keyboard /C4/ key being played, as well as CO stimuli. Results showed that the AV interaction models differ in their expression of N1 and P2 amplitude and latency suppression. The calculations underlying the (AV-VO vs. AO) and (AV-VO vs. AO-CO) models have consequences for the resulting N1 and P2 difference waves. Furthermore, while musicians, compared to non-musicians, showed higher N1 amplitude in auditory perception, suppression of amplitudes and latencies for N1 and P2 was similar for the two groups across the AV models. Collectively, these results suggest that when visual cues from finger and hand movements predict the upcoming sound in AV music perception, suppression of early ERPs is similar for musicians and non-musicians. Notably, the calculation differences across models do not lead to the same pattern of results for N1 and P2, demonstrating that the four models are not interchangeable and are not directly comparable.
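The four interaction models reduce to simple arithmetic over condition-averaged evoked responses; a sketch with hypothetical waveforms (note that, as raw difference waves, the first two models coincide — the models diverge in which waveforms the N1/P2 peaks are measured on and in how the control condition enters):

```python
import numpy as np

# Hypothetical condition-averaged evoked responses, 600 samples each
rng = np.random.default_rng(3)
AO, VO, AV, CO = (rng.normal(size=600) for _ in range(4))

# The four AV interaction models compared in the study:
model_1 = AV - (AO + VO)          # AV vs. AO+VO
model_2 = (AV - VO) - AO          # AV-VO vs. AO
model_3 = (AV - VO) - (AO - CO)   # AV-VO vs. AO-CO
model_4 = AV - AO                 # AV vs. AO

# model_1 and model_2 are algebraically identical as difference waves;
# in practice they differ in which condition waveforms peak amplitudes
# and latencies are extracted from before statistical comparison.
print(np.allclose(model_1, model_2))  # True
```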
Affiliation(s)
- Marzieh Sorati
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn M Behne
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
18. Michon M, Boncompte G, López V. Electrophysiological Dynamics of Visual Speech Processing and the Role of Orofacial Effectors for Cross-Modal Predictions. Front Hum Neurosci 2020; 14:538619. PMID: 33192386. PMCID: PMC7653187. DOI: 10.3389/fnhum.2020.538619.
Abstract
The human brain generates predictions about future events. During face-to-face conversations, visemic information is used to predict upcoming auditory input. Recent studies suggest that the speech motor system plays a role in these cross-modal predictions; however, usually only audio-visual paradigms are employed. Here we tested whether speech sounds can be predicted on the basis of visemic information alone, and to what extent interfering with the orofacial articulatory effectors can affect these predictions. We recorded EEG and employed the N400 as an index of such predictions. Our results show that N400 amplitude was strongly modulated by visemic salience, consistent with cross-modal speech predictions. Additionally, the N400 ceased to be evoked when syllables' visemes were presented backwards, suggesting that predictions occur only when the observed viseme matches an existing articuleme in the observer's speech motor system (i.e., the articulatory neural sequence required to produce a particular phoneme/viseme). Importantly, we found that interfering with the motor articulatory system strongly disrupted cross-modal predictions. We also observed a late P1000 that was evoked only for syllable-related visual stimuli, but whose amplitude was not modulated by interfering with the motor system. The present study provides further evidence of the importance of the speech production system for predicting speech sounds from visemic information at the pre-lexical level. The implications of these results are discussed in the context of a hypothesized trimodal repertoire for speech, in which speech perception is conceived as a highly interactive process that involves not only your ears but also your eyes, lips and tongue.
Affiliation(s)
- Maëva Michon
- Laboratorio de Neurociencia Cognitiva y Evolutiva, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Laboratorio de Neurociencia Cognitiva y Social, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
- Gonzalo Boncompte
- Laboratorio de Neurodinámicas de la Cognición, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Vladimir López
- Laboratorio de Psicología Experimental, Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago, Chile
19. Sorati M, Behne DM. Audiovisual Modulation in Music Perception for Musicians and Non-musicians. Front Psychol 2020; 11:1094. PMID: 32547458. PMCID: PMC7273518. DOI: 10.3389/fpsyg.2020.01094.
Abstract
In audiovisual music perception, visual information from a musical instrument being played is available prior to the onset of the corresponding musical sound and consequently allows a perceiver to form a prediction about the upcoming audio music. This prediction in audiovisual music perception, compared to auditory music perception, leads to lower N1 and P2 amplitudes and latencies. Although previous research suggests that audiovisual experience, such as previous musical experience, may enhance this prediction, a remaining question is to what extent musical experience modifies N1 and P2 amplitudes and latencies. Furthermore, corresponding event-related phase modulations, quantified as inter-trial phase coherence (ITPC), have not previously been reported for audiovisual music perception. In the current study, audio video recordings of a keyboard key being played were presented to musicians and non-musicians in audio only (AO), video only (VO), and audiovisual (AV) conditions. With predictive movements from playing the keyboard isolated from AV music perception (AV-VO), the current findings demonstrated that, compared to the AO condition, both groups had a similar decrease in N1 amplitude and latency and in P2 amplitude, along with correspondingly lower ITPC values in the delta, theta, and alpha frequency bands. However, while musicians showed lower ITPC values in the beta band in AV-VO compared to AO, non-musicians did not show this pattern. Findings indicate that AV perception may be broadly correlated with auditory perception, and differences between musicians and non-musicians further indicate musical experience to be a specific factor influencing AV perception. Predicting an upcoming sound in AV music perception may involve visual predictive processes, as well as beta-band oscillations, which may be influenced by years of musical training. This study highlights possible interconnectivity in AV perception as well as potential modulation with experience.
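ITPC is the length of the mean unit phase vector across trials; a minimal sketch, assuming epochs as a (trials × samples) array and a band-pass plus Hilbert phase estimate:

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def itpc(epochs, fs, band=(4, 8)):
    """Inter-trial phase coherence per time point: length of the mean unit
    phase vector across trials (0 = random phase, 1 = perfect locking)."""
    sos = butter(4, band, btype="band", fs=fs, output="sos")
    phase = np.angle(hilbert(sosfiltfilt(sos, epochs, axis=1), axis=1))
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

# Hypothetical epochs: 80 trials x 1 s at 1 kHz (theta band shown here)
rng = np.random.default_rng(4)
epochs = rng.normal(size=(80, 1000))
print(itpc(epochs, fs=1000).mean())  # near 0 for phase-random noise
```

In a study like this one, ITPC would be computed per condition and frequency band, and AV-VO compared against AO.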
Affiliation(s)
- Marzieh Sorati
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn Marie Behne
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
20. Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:328. PMID: 32481538. PMCID: PMC7348766. DOI: 10.3390/brainsci10060328.
Abstract
The McGurk effect, an incongruent pairing of visual /ga/ and acoustic /ba/ that creates the fusion illusion /da/, is the cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed—auditory /ga/ and visual /ba/—yielding the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of the visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions—fusion, with the visual aspect of the stimulus changing, and combination, with the auditory aspect of the stimulus changing—as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created; but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
Affiliation(s)
- Melissa Randazzo
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Correspondence: Tel.: +1-516-877-4769
- Ryan Priefer
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
21. Gori M, Ober KM, Tinelli F, Coubard OA. Temporal representation impairment in developmental dyslexia for unisensory and multisensory stimuli. Dev Sci 2020; 23:e12977. PMID: 32333455. PMCID: PMC7507191. DOI: 10.1111/desc.12977.
Abstract
Dyslexia has been associated with a problem in visual-audio integration mechanisms. Here, we investigate for the first time the contribution of unisensory cues to multisensory audiovisual integration in 32 dyslexic children, modelling the results with a Bayesian approach. Non-linguistic stimuli were used. Children performed a temporal task: they had to report whether the middle of three stimuli was closer in time to the first one or to the last one presented. Children with dyslexia, compared with typical children, exhibited poorer unimodal thresholds, requiring a greater temporal distance between items for correct judgements, while multisensory thresholds were well predicted by the Bayesian model. This result suggests that the multisensory deficit in dyslexia is due to impaired audio and visual inputs rather than to impaired multisensory processing per se. We also observed that poorer temporal skills correlated with lower reading skills in dyslexic children, suggesting that this temporal capability is linked to reading abilities.
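The Bayesian (maximum-likelihood) model referenced here predicts the multisensory threshold from the unisensory thresholds; a one-line version of that standard prediction, with hypothetical numbers:

```python
def predicted_multisensory_sigma(sigma_a, sigma_v):
    """Optimal (maximum-likelihood) integration: predicted audiovisual
    threshold from the unisensory thresholds."""
    return (sigma_a**2 * sigma_v**2 / (sigma_a**2 + sigma_v**2)) ** 0.5

# Hypothetical temporal thresholds (ms): poorer unisensory precision
# propagates directly into the multisensory prediction.
print(predicted_multisensory_sigma(120.0, 150.0))  # ~93.7 ms
```

The finding that dyslexic children's multisensory thresholds match this prediction is what locates the deficit in the unisensory inputs rather than in integration itself.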
Affiliation(s)
- Monica Gori
- U-VIP Unit for Visually Impaired People, Istituto Italiano di Tecnologia, Genoa, Italy
- Kinga M Ober
- Faculty of Educational Studies, Adam Mickiewicz University, Poznan, Poland
- Francesca Tinelli
- Department of Developmental Neuroscience, Stella Maris Scientific Institute, Pisa, Italy
22. Kramer A, Röder B, Bruns P. Feedback Modulates Audio-Visual Spatial Recalibration. Front Integr Neurosci 2020; 13:74. PMID: 32009913. PMCID: PMC6979315. DOI: 10.3389/fnint.2019.00074.
Abstract
In an ever-changing environment, crossmodal recalibration is crucial to maintain precise and coherent spatial estimates across different sensory modalities. Accordingly, it has been found that perceived auditory space is recalibrated toward vision after consistent exposure to spatially misaligned audio-visual stimuli. While this so-called ventriloquism aftereffect (VAE) yields internal consistency between vision and audition, it does not necessarily lead to consistency between the perceptual representation of space and the actual environment. For this purpose, feedback about the true state of the external world might be necessary. Here, we tested whether the size of the VAE is modulated by external feedback and reward. During adaptation, audio-visual stimulus pairs with a fixed spatial discrepancy were presented. Participants had to localize the sound and received feedback about the magnitude of their localization error. In half of the sessions the feedback was based on the position of the visual stimulus (VS), and in the other half it was based on the position of the auditory stimulus. An additional monetary reward was given if the localization error fell below a threshold based on participants' performance in the pretest. As expected, when error feedback was based on the position of the VS, auditory localization during adaptation trials shifted toward the position of the VS. Conversely, feedback based on the position of the auditory stimuli reduced the visual influence on auditory localization (i.e., the ventriloquism effect) and improved sound localization accuracy. After adaptation with error feedback based on the VS position, a typical auditory VAE (but no visual aftereffect) was observed in subsequent unimodal localization tests. By contrast, when feedback was based on the position of the auditory stimuli during adaptation, no auditory VAE was observed in subsequent unimodal auditory trials. Importantly, in this situation no visual aftereffect was found either. As feedback did not change the physical attributes of the audio-visual stimulation during adaptation, the present findings suggest that crossmodal recalibration is subject to top-down influences. Such top-down influences might help prevent miscalibration of audition toward conflicting visual stimulation in situations in which external feedback indicates that the visual information is inaccurate.
Affiliation(s)
- Alexander Kramer
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
- Brigitte Röder
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
- Patrick Bruns
- Biological Psychology and Neuropsychology, University of Hamburg, Hamburg, Germany
23. La Rocca D, Ciuciu P, Engemann DA, van Wassenhove V. Emergence of β and γ networks following multisensory training. Neuroimage 2020; 206:116313. PMID: 31676416. PMCID: PMC7355235. DOI: 10.1016/j.neuroimage.2019.116313.
Abstract
Our perceptual reality relies on inferences about the causal structure of the world given by multiple sensory inputs. In ecological settings, multisensory events that cohere in time and space benefit inferential processes: hearing and seeing a speaker enhances speech comprehension, and the acoustic changes of flapping wings naturally pace the motion of a flock of birds. Here, we asked how a few minutes of (multi)sensory training could shape cortical interactions in a subsequent unisensory perceptual task. For this, we investigated oscillatory activity and functional connectivity as a function of individuals' sensory history during training. Human participants performed a visual motion coherence discrimination task while being recorded with magnetoencephalography. Three groups of participants performed the same task with visual stimuli only, while listening to acoustic textures temporally comodulated with the strength of visual motion coherence, or with auditory noise uncorrelated with visual motion. The functional connectivity patterns before and after training were contrasted to resting-state networks to assess the variability of common task-relevant networks, and the emergence of new functional interactions as a function of sensory history. One major finding is the emergence of a large-scale synchronization in the high γ (gamma: 60-120 Hz) and β (beta: 15-30 Hz) bands for individuals who underwent comodulated multisensory training. The post-training network involved prefrontal, parietal, and visual cortices. Our results suggest that the integration of evidence and decision-making strategies become more efficient following congruent multisensory training through plasticity in network routing and oscillatory regimes.
Affiliation(s)
- Daria La Rocca, CEA/DRF/Joliot, Université Paris-Saclay, 91191, Gif-sur-Yvette, France; Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France
- Philippe Ciuciu, CEA/DRF/Joliot, Université Paris-Saclay, 91191, Gif-sur-Yvette, France; Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France
- Denis-Alexander Engemann, CEA/DRF/Joliot, Université Paris-Saclay, 91191, Gif-sur-Yvette, France; Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France
- Virginie van Wassenhove, CEA/DRF/Joliot, Université Paris-Saclay, 91191, Gif-sur-Yvette, France; Cognitive Neuroimaging Unit, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin Center, 91191, Gif-sur-Yvette, France

24
Migliorati D, Zappasodi F, Perrucci MG, Donno B, Northoff G, Romei V, Costantini M. Individual Alpha Frequency Predicts Perceived Visuotactile Simultaneity. J Cogn Neurosci 2020; 32:1-11. [DOI: 10.1162/jocn_a_01464] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
Temporal encoding is a key feature in multisensory processing that leads to the integration versus segregation of perceived events over time. Whether or not two events presented at different offsets are perceived as simultaneous varies widely across the general population. Such tolerance to temporal delays is known as the temporal binding window (TBW). It has been recently suggested that the individual alpha frequency (IAF) peak may represent the electrophysiological correlate of the TBW, with IAF also showing a wide variability in the general population (8–12 Hz). In our work, we directly tested this hypothesis by measuring each individual's TBW during a visuotactile simultaneity judgment task while concurrently recording their electrophysiological activity. We found that the individual's TBW significantly correlated with their left parietal IAF, such that faster IAF accounted for narrower TBW. Furthermore, we found that higher prestimulus alpha power measured over the same left parietal regions accounted for more veridical reports of non-simultaneity, which may be explained either by greater accuracy in perceiving simultaneity or, alternatively, in line with recent proposals, by a shift in response bias from more conservative (high alpha power) to more liberal (low alpha power). We propose that the length of an alpha cycle constrains the temporal resolution within which perceptual processes take place.
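The two measures related above can be sketched as follows: the TBW as the width of a Gaussian fitted to simultaneity judgments, and the IAF as the alpha-range peak of the power spectrum. Data, sensors, and parameter choices are illustrative assumptions, not the study's pipeline.
```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import welch

# --- TBW: fit a Gaussian to the proportion of "simultaneous" responses ---
soas = np.array([-300, -200, -100, 0, 100, 200, 300])          # ms
p_simult = np.array([0.10, 0.35, 0.80, 0.95, 0.75, 0.30, 0.08])

def gauss(soa, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

(amp, mu, sigma), _ = curve_fit(gauss, soas, p_simult, p0=[1, 0, 100])
tbw = 2 * sigma                                                # one common TBW definition

# --- IAF: peak of the power spectrum within the alpha range ---
fs = 250.0
t = np.arange(0, 60, 1 / fs)
eeg = np.sin(2 * np.pi * 10.2 * t) + np.random.default_rng(2).normal(0, 1, t.size)
f, pxx = welch(eeg, fs=fs, nperseg=int(4 * fs))
alpha = (f >= 8) & (f <= 12)
iaf = f[alpha][np.argmax(pxx[alpha])]

print(f"TBW ~ {tbw:.0f} ms, IAF ~ {iaf:.1f} Hz")               # faster IAF, narrower TBW
```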
25
Derrick D, Hansmann D, Theys C. Tri-modal speech: Audio-visual-tactile integration in speech perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3495. [PMID: 31795693 DOI: 10.1121/1.5134064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Accepted: 10/25/2019] [Indexed: 06/10/2023]
Abstract
Speech perception is a multi-sensory experience. Visual information enhances [Sumby and Pollack (1954). J. Acoust. Soc. Am. 25, 212-215] and interferes [McGurk and MacDonald (1976). Nature 264, 746-748] with speech perception. Similarly, tactile information, transmitted by puffs of air arriving at the skin and aligned with speech audio, alters [Gick and Derrick (2009). Nature 462, 502-504] auditory speech perception in noise. It has also been shown that aero-tactile information influences visual speech perception when an auditory signal is absent [Derrick, Bicevskis, and Gick (2019a). Front. Commun. Lang. Sci. 3(61), 1-11]. However, researchers have not yet identified the combined influence of aero-tactile, visual, and auditory information on speech perception. The effects of matching and mismatching visual and tactile speech on two-way forced-choice auditory syllable-in-noise classification tasks were therefore tested. The results showed that both visual and tactile information altered the signal-to-noise ratio (SNR) threshold for accurate identification of auditory signals. Similar to previous studies, the visual component had a strong influence on auditory syllable-in-noise identification, as evidenced by a 28.04 dB improvement in SNR between matching and mismatching visual stimulus presentations. In comparison, the tactile component had a small influence, resulting in a 1.58 dB SNR match-mismatch range. The effects of both the audio and tactile information were shown to be additive.
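SNR thresholds of this kind are usually read off a psychometric fit; below is a hedged sketch using a logistic function scaled from the 50% chance level of a two-way forced choice, with invented data points rather than the study's method or data.
```python
import numpy as np
from scipy.optimize import curve_fit

snr = np.array([-18, -15, -12, -9, -6, -3])                # dB
p_correct = np.array([0.52, 0.58, 0.70, 0.84, 0.93, 0.98])

def psychometric(x, mid, slope):
    # logistic scaled from 50% (chance) to 100% correct
    return 0.5 + 0.5 / (1 + np.exp(-slope * (x - mid)))

(mid, slope), _ = curve_fit(psychometric, snr, p_correct, p0=[-10, 1])
print(f"75%-correct threshold: {mid:.1f} dB SNR")
# Comparing thresholds between matching and mismatching visual (or tactile)
# conditions yields dB benefit figures like those reported above.
```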
Affiliation(s)
- Donald Derrick, New Zealand Institute of Language, Brain, and Behaviour, University of Canterbury, 20 Kirkwood Avenue, Upper Riccarton, Christchurch 8041, New Zealand
- Doreen Hansmann, School of Psychology, Speech and Hearing, University of Canterbury, 20 Kirkwood Avenue, Upper Riccarton, Christchurch 8041, New Zealand
- Catherine Theys, School of Psychology, Speech and Hearing, University of Canterbury, 20 Kirkwood Avenue, Upper Riccarton, Christchurch 8041, New Zealand

26
The impact of when, what and how predictions on auditory speech perception. Exp Brain Res 2019; 237:3143-3153. [DOI: 10.1007/s00221-019-05661-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 09/24/2019] [Indexed: 11/26/2022]
27
Modelska M, Pourquié M, Baart M. No "Self" Advantage for Audiovisual Speech Aftereffects. Front Psychol 2019; 10:658. [PMID: 30967827 PMCID: PMC6440388 DOI: 10.3389/fpsyg.2019.00658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2018] [Accepted: 03/08/2019] [Indexed: 11/13/2022] Open
Abstract
Although the default state of the world is that we see and hear other people talking, there is evidence that seeing and hearing ourselves rather than someone else may lead to visual (i.e., lip-read) or auditory "self" advantages. We assessed whether there is a "self" advantage for phonetic recalibration (a lip-read driven cross-modal learning effect) and selective adaptation (a contrastive effect in the opposite direction of recalibration). We observed both aftereffects as well as an on-line effect of lip-read information on auditory perception (i.e., immediate capture), but there was no evidence for a "self" advantage in any of the tasks (as additionally supported by Bayesian statistics). These findings strengthen the emerging notion that recalibration reflects a general learning mechanism, and bolster the argument that adaptation depends on rather low-level auditory/acoustic features of the speech signal.
Affiliation(s)
- Maria Modelska, BCBL – Basque Center on Cognition, Brain and Language, Donostia, Spain
- Marie Pourquié, BCBL – Basque Center on Cognition, Brain and Language, Donostia, Spain; UPPA, IKER (UMR5478), Bayonne, France
- Martijn Baart, BCBL – Basque Center on Cognition, Brain and Language, Donostia, Spain; Department of Cognitive Neuropsychology, Tilburg University, Tilburg, Netherlands

28
Devaraju DS, U AK, Maruthy S. Comparison of McGurk Effect across Three Consonant-Vowel Combinations in Kannada. J Audiol Otol 2019; 23:39-43. [PMID: 30518196 PMCID: PMC6348306 DOI: 10.7874/jao.2018.00234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Revised: 06/16/2018] [Accepted: 07/16/2018] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND AND OBJECTIVES The influence of the visual stimulus on the auditory component in the perception of auditory-visual (AV) consonant-vowel syllables has been demonstrated in different languages. Inherent properties of unimodal stimuli are known to modulate AV integration. The present study investigated how the amount of McGurk effect (an outcome of AV integration) varies across three different consonant combinations in the Kannada language, and also examined the contribution of unimodal syllable identification to the amount of McGurk effect. SUBJECTS AND METHODS Twenty-eight individuals performed an AV identification task with ba/ga, pa/ka and ma/ṇa consonant combinations in AV congruent, AV incongruent (McGurk combination), audio-alone and visual-alone conditions. Cluster analysis was performed on the identification scores for the incongruent stimuli to classify the individuals into two groups, one with high and the other with low McGurk scores. The differences in the audio-alone and visual-alone scores between these groups were compared. RESULTS The results showed significantly higher McGurk scores for ma/ṇa compared to the ba/ga and pa/ka combinations in both the high and low McGurk score groups. No significant difference was noted between the ba/ga and pa/ka combinations in either group. Identification of /ṇa/ presented in the visual-alone condition correlated negatively with higher McGurk scores. CONCLUSIONS The results suggest that the final percept following AV integration is not exclusively explained by unimodal identification of the syllables; other factors may also contribute to inferences about the final percept.
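A sketch of the grouping step described above: splitting participants into high- and low-McGurk groups and comparing unimodal scores between them. The abstract does not name the clustering algorithm, so k-means and all numbers below are assumptions.
```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Simulated per-participant McGurk scores and visual-alone identification
mcgurk = np.concatenate([rng.normal(0.7, 0.1, 14), rng.normal(0.2, 0.1, 14)])
visual_alone = rng.normal(0.8, 0.1, 28)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(mcgurk.reshape(-1, 1))
high_label = 0 if mcgurk[labels == 0].mean() > mcgurk[labels == 1].mean() else 1
high = labels == high_label

t, p = ttest_ind(visual_alone[high], visual_alone[~high])
print(f"high group n={high.sum()}, visual-alone difference: t={t:.2f}, p={p:.3f}")
```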
Affiliation(s)
- Dhatri S Devaraju, Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Ajith Kumar U, Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India
- Santosh Maruthy, Department of Speech-Language Sciences, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, Karnataka, India

29
Abstract
Speech research during recent years has moved progressively away from its traditional focus on audition toward a more multisensory approach. In addition to audition and vision, many somatosenses including proprioception, pressure, vibration and aerotactile sensation are all highly relevant modalities for experiencing and/or conveying speech. In this article, we review both long-standing cross-modal effects stemming from decades of audiovisual speech research as well as new findings related to somatosensory effects. Cross-modal effects in speech perception to date are found to be constrained by temporal congruence and signal relevance, but appear to be unconstrained by spatial congruence. Far from taking place in a one-, two- or even three-dimensional space, the literature reveals that speech occupies a highly multidimensional sensory space. We argue that future research in cross-modal effects should expand to consider each of these modalities both separately and in combination with other modalities in speech.
Affiliation(s)
- Megan Keough, Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Donald Derrick, New Zealand Institute of Language, Brain, and Behaviour, University of Canterbury, Christchurch 8140, New Zealand; MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Bryan Gick, Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada; Haskins Laboratories, Yale University, New Haven, CT 06511, USA

30
Hernández-Gutiérrez D, Abdel Rahman R, Martín-Loeches M, Muñoz F, Schacht A, Sommer W. Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence. Cortex 2018; 104:12-25. [DOI: 10.1016/j.cortex.2018.03.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2017] [Revised: 11/08/2017] [Accepted: 03/31/2018] [Indexed: 11/26/2022]
31
Van Ackeren MJ, Barbero FM, Mattioni S, Bottini R, Collignon O. Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech. eLife 2018; 7:e31640. [PMID: 29338838 PMCID: PMC5790372 DOI: 10.7554/elife.31640] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2017] [Accepted: 01/16/2018] [Indexed: 11/13/2022] Open
Abstract
The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives.
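A minimal sketch of the speech-tracking measure behind results like these: coherence between the acoustic envelope (recovered with a Hilbert transform) and a cortical signal in the theta band. Signals are simulated stand-ins for speech audio and MEG source time courses.
```python
import numpy as np
from scipy.signal import hilbert, coherence

fs = 200.0
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(4)

envelope = 1 + 0.5 * np.sin(2 * np.pi * 5 * t)          # ~5 Hz syllabic rhythm
speech = envelope * rng.normal(0, 1, t.size)            # envelope-modulated carrier
brain = envelope + rng.normal(0, 1, t.size)             # cortical signal tracking it

speech_env = np.abs(hilbert(speech))                    # recover the acoustic envelope
f, cxy = coherence(speech_env, brain, fs=fs, nperseg=512)
theta = cxy[(f >= 4) & (f <= 8)].mean()
print(f"theta-band cerebro-acoustic coherence: {theta:.2f}")
```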
Affiliation(s)
- Francesca M Barbero, Institute of Research in Psychology, University of Louvain, Louvain, Belgium; Institute of Neuroscience, University of Louvain, Louvain, Belgium
- Roberto Bottini, Center for Mind/Brain Studies, University of Trento, Trento, Italy
- Olivier Collignon, Center for Mind/Brain Studies, University of Trento, Trento, Italy; Institute of Research in Psychology, University of Louvain, Louvain, Belgium; Institute of Neuroscience, University of Louvain, Louvain, Belgium

32
Treille A, Vilain C, Schwartz JL, Hueber T, Sato M. Electrophysiological evidence for Audio-visuo-lingual speech integration. Neuropsychologia 2018; 109:126-133. [DOI: 10.1016/j.neuropsychologia.2017.12.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 11/21/2017] [Accepted: 12/13/2017] [Indexed: 01/25/2023]
33
Dormal G, Pelland M, Rezk M, Yakobov E, Lepore F, Collignon O. Functional Preference for Object Sounds and Voices in the Brain of Early Blind and Sighted Individuals. J Cogn Neurosci 2018; 30:86-106. [DOI: 10.1162/jocn_a_01186] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Sounds activate occipital regions in early blind individuals. However, how different sound categories map onto specific regions of the occipital cortex remains a matter of debate. We used fMRI to characterize brain responses of early blind and sighted individuals to familiar object sounds, human voices, and their respective low-level control sounds. In addition, sighted participants were tested while viewing pictures of faces, objects, and phase-scrambled control pictures. In both early blind and sighted, a double dissociation was evidenced in bilateral auditory cortices between responses to voices and object sounds: Voices elicited categorical responses in bilateral superior temporal sulci, whereas object sounds elicited categorical responses along the lateral fissure bilaterally, including the primary auditory cortex and planum temporale. Outside the auditory regions, object sounds also elicited categorical responses in the left lateral and in the ventral occipitotemporal regions in both groups. These regions also showed response preference for images of objects in the sighted group, thus suggesting a functional specialization that is independent of sensory input and visual experience. Between-group comparisons revealed that, only in the blind group, categorical responses to object sounds extended more posteriorly into the occipital cortex. Functional connectivity analyses evidenced a selective increase in the functional coupling between these reorganized regions and regions of the ventral occipitotemporal cortex in the blind group. In contrast, vocal sounds did not elicit preferential responses in the occipital cortex in either group. Nevertheless, enhanced voice-selective connectivity between the left temporal voice area and the right fusiform gyrus was found in the blind group. Altogether, these findings suggest that, in the absence of developmental vision, separate auditory categories are not equipotent in driving selective auditory recruitment of occipitotemporal regions and highlight the presence of domain-selective constraints on the expression of cross-modal plasticity.
Affiliation(s)
- Olivier Collignon, University of Montreal; University of Louvain; McGill University, Montreal, Canada

34
Baart M, Lindborg A, Andersen TS. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception. Eur J Neurosci 2017; 46:2578-2583. [PMID: 28976045 PMCID: PMC5725699 DOI: 10.1111/ejn.13734] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Revised: 09/27/2017] [Accepted: 09/27/2017] [Indexed: 11/30/2022]
Abstract
Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech‐induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was similar to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli.
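The P2 suppression measure can be sketched as the difference in mean amplitude within a P2 window between auditory-only and audiovisual ERPs; the waveforms, window, and amplitudes below are simulated assumptions, not the study's data.
```python
import numpy as np

fs = 500.0
times = np.arange(-0.1, 0.5, 1 / fs)
p2_win = (times >= 0.15) & (times <= 0.25)      # assumed P2 window

def erp_with_p2(amp):
    """Toy ERP: a Gaussian P2 component peaking at 200 ms."""
    return amp * np.exp(-0.5 * ((times - 0.2) / 0.03) ** 2)

erp_a = erp_with_p2(6.0)          # auditory-only P2 (microvolts)
erp_av_fusion = erp_with_p2(4.5)  # suppression comparable to congruent AV
erp_av_combi = erp_with_p2(3.0)   # larger suppression for combinations

for name, erp in [("fusion", erp_av_fusion), ("combination", erp_av_combi)]:
    suppression = erp_a[p2_win].mean() - erp[p2_win].mean()
    print(f"{name}: P2 suppression = {suppression:.2f} uV")
```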
Affiliation(s)
- Martijn Baart, Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, Tilburg, 5000 LE, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Alma Lindborg, Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Tobias S Andersen, Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark

35
Giordano BL, Ince RAA, Gross J, Schyns PG, Panzeri S, Kayser C. Contributions of local speech encoding and functional connectivity to audio-visual speech perception. eLife 2017; 6. [PMID: 28590903 PMCID: PMC5462535 DOI: 10.7554/elife.24763] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Accepted: 05/07/2017] [Indexed: 11/13/2022] Open
Abstract
Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.
When listening to someone in a noisy environment, such as a cocktail party, we can understand the speaker more easily if we can also see his or her face. Movements of the lips and tongue convey additional information that helps the listener’s brain separate out syllables, words and sentences. However, exactly where in the brain this effect occurs and how it works remain unclear. To find out, Giordano et al. scanned the brains of healthy volunteers as they watched clips of people speaking. The clarity of the speech varied between clips. Furthermore, in some of the clips the lip movements of the speaker corresponded to the speech in question, whereas in others the lip movements were nonsense babble. As expected, the volunteers performed better on a word recognition task when the speech was clear and when the lip movements agreed with the spoken dialogue. Watching the video clips stimulated rhythmic activity in multiple regions of the volunteers’ brains, including areas that process sound and areas that plan movements. Speech is itself rhythmic, and the volunteers’ brain activity synchronized with the rhythms of the speech they were listening to. Seeing the speaker’s face increased this degree of synchrony. However, it also made it easier for sound-processing regions within the listeners’ brains to transfer information to one another. Notably, only the latter effect predicted improved performance on the word recognition task. This suggests that seeing a person’s face makes it easier to understand his or her speech by boosting communication between brain regions, rather than through effects on individual areas. Further work is required to determine where and how the brain encodes lip movements and speech sounds. The next challenge will be to identify where these two sets of information interact, and how the brain merges them together to generate the impression of specific words.
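Entrainment in this line of work is often quantified with information-theoretic measures; below is a toy, binned mutual-information sketch between an envelope and a brain signal. The study used more sophisticated estimators, so treat this only as a conceptual illustration with invented data.
```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(5)
n = 5000
envelope = rng.normal(0, 1, n)
brain = 0.6 * envelope + 0.8 * rng.normal(0, 1, n)     # partially entrained signal

def binned_mi(x, y, bins=16):
    """Mutual information between two signals after quantile binning."""
    cx = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    cy = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    return mutual_info_score(cx, cy)

print(f"speech-brain MI: {binned_mi(envelope, brain):.3f} nats")
```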
Affiliation(s)
- Bruno L Giordano, Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université - Centre National de la Recherche Scientifique, Marseille, France; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Robin A A Ince, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Philippe G Schyns, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Stefano Panzeri, Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, Rovereto, Italy
- Christoph Kayser, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom

36
Being First Matters: Topographical Representational Similarity Analysis of ERP Signals Reveals Separate Networks for Audiovisual Temporal Binding Depending on the Leading Sense. J Neurosci 2017; 37:5274-5287. [PMID: 28450537 PMCID: PMC5456109 DOI: 10.1523/jneurosci.2926-16.2017] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2016] [Revised: 02/20/2017] [Accepted: 02/25/2017] [Indexed: 11/30/2022] Open
Abstract
In multisensory integration, processing in one sensory modality is enhanced by complementary information from other modalities. Intersensory timing is crucial in this process because only inputs reaching the brain within a restricted temporal window are perceptually bound. Previous research in the audiovisual field has investigated various features of the temporal binding window, revealing asymmetries in its size and plasticity depending on the leading input: auditory–visual (AV) or visual–auditory (VA). Here, we tested whether separate neuronal mechanisms underlie this AV–VA dichotomy in humans. We recorded high-density EEG while participants performed an audiovisual simultaneity judgment task including various AV–VA asynchronies and unisensory control conditions (visual-only, auditory-only) and tested whether AV and VA processing generate different patterns of brain activity. After isolating the multisensory components of AV–VA event-related potentials (ERPs) from the sum of their unisensory constituents, we ran a time-resolved topographical representational similarity analysis (tRSA) comparing the AV and VA ERP maps. Spatial cross-correlation matrices were built from real data to index the similarity between the AV and VA maps at each time point (500 ms window after stimulus) and then correlated with two alternative similarity model matrices: AVmaps = VAmaps versus AVmaps ≠ VAmaps. The tRSA results favored the AVmaps ≠ VAmaps model across all time points, suggesting that audiovisual temporal binding (indexed by synchrony perception) engages different neural pathways depending on the leading sense. The existence of such a dual route supports recent theoretical accounts proposing that multiple binding mechanisms are implemented in the brain to accommodate different information parsing strategies in auditory and visual sensory systems. SIGNIFICANCE STATEMENT Intersensory timing is a crucial aspect of multisensory integration, determining whether and how inputs in one modality enhance stimulus processing in another modality. Our research demonstrates that evaluating synchrony of auditory-leading (AV) versus visual-leading (VA) audiovisual stimulus pairs is characterized by two distinct patterns of brain activity. This suggests that audiovisual integration is not a unitary process and that different binding mechanisms are recruited in the brain based on the leading sense. These mechanisms may be relevant for supporting different classes of multisensory operations, for example, auditory enhancement of visual attention (AV) and visual enhancement of auditory speech (VA).
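The tRSA logic can be reduced to a sketch: correlate the AV and VA scalp maps at each time point and ask which similarity model better matches the resulting time course. This simplification drops the full cross-correlation-matrix machinery of the actual analysis, and all data are simulated.
```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(6)
n_chan, n_times = 64, 250                      # e.g., 500 ms at 500 Hz

av = rng.normal(0, 1, (n_chan, n_times))       # AV multisensory component maps
va = rng.normal(0, 1, (n_chan, n_times))       # VA multisensory component maps

# Observed topographical similarity at each time point
similarity = np.array([pearsonr(av[:, i], va[:, i])[0] for i in range(n_times)])

# Two alternative models for the similarity time course
model_same = np.ones(n_times)                  # AV maps = VA maps
model_diff = np.zeros(n_times)                 # AV maps independent of VA maps

fit_same = -np.abs(similarity - model_same).mean()
fit_diff = -np.abs(similarity - model_diff).mean()
print("favored model:", "same maps" if fit_same > fit_diff else "different maps")
```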
37
Gori M, Chilosi A, Forli F, Burr D. Audio-visual temporal perception in children with restored hearing. Neuropsychologia 2017; 99:350-359. [PMID: 28365363 DOI: 10.1016/j.neuropsychologia.2017.03.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2016] [Revised: 03/20/2017] [Accepted: 03/22/2017] [Indexed: 11/27/2022]
Abstract
It is not clear how audio-visual temporal perception develops in children with restored hearing. In this study we measured temporal discrimination thresholds with an audio-visual temporal bisection task in 9 deaf children with restored audition and 22 typically hearing children. In typically hearing children, audition was more precise than vision, with no gain in multisensory conditions (as previously reported in Gori et al. (2012b)). However, deaf children with restored audition showed similar auditory and visual thresholds, and some evidence of gain in audio-visual temporal multisensory conditions. Interestingly, we found a strong correlation between auditory weighting of multisensory signals and quality of language: patients who gave more weight to audition had better language skills. Similarly, auditory thresholds in the temporal bisection task were also a good predictor of language skills. This result supports the idea that temporal auditory processing is associated with language development.
Affiliation(s)
- Monica Gori, U-VIP Unit for Visually Impaired People, Center for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, Genoa, Italy
- Anna Chilosi, Department of Developmental Neuroscience, Stella Maris Scientific Institute, Pisa, Italy
- Francesca Forli, Azienda Ospedaliero-Universitaria Pisana, Pisa, Italy; Dipartimento di Psicologia, Università Degli Studi di Firenze, Via S. Salvi 12, 50125, Florence, Italy
- David Burr, School of Psychology, University of Western Australia, Perth, WA, Australia

38
Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech. LANGUAGE, COGNITION AND NEUROSCIENCE 2017; 32:1102-1118. [PMID: 28966930 PMCID: PMC5617130 DOI: 10.1080/23273798.2017.1283428] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/09/2016] [Accepted: 01/07/2017] [Indexed: 06/07/2023]
Abstract
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, with visual speech leading (exp 1) or lagging (exp 2) the acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise-vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when the visual speech was intact rather than blurred (exp 1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
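Noise vocoding, used above to vary the spectrotemporal fidelity of the acoustic speech, can be sketched as band-splitting, envelope extraction, and envelope-modulated noise; band edges, filter orders, and the test signal below are illustrative assumptions.
```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=8000.0):
    """N-channel noise vocoder: per-band envelopes modulate band-limited noise."""
    edges = np.geomspace(lo, hi, n_channels + 1)    # log-spaced band edges
    rng = np.random.default_rng(7)
    out = np.zeros_like(x)
    for i in range(n_channels):
        sos = butter(4, [edges[i], edges[i + 1]], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))              # band envelope
        carrier = sosfiltfilt(sos, rng.normal(0, 1, x.size))    # band-limited noise
        out += env * carrier
    return out

fs = 22050
t = np.arange(0, 1.0, 1 / fs)
speechlike = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(speechlike, fs, n_channels=4)
print(vocoded.shape)
```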
Affiliation(s)
- Antoine J Shahin, Center for Mind and Brain, University of California, Davis, CA, 95618
- Stanley Shen, Center for Mind and Brain, University of California, Davis, CA, 95618
- Jess R Kerlin, Center for Mind and Brain, University of California, Davis, CA, 95618

39
Su YH. Visual Enhancement of Illusory Phenomenal Accents in Non-Isochronous Auditory Rhythms. PLoS One 2016; 11:e0166880. [PMID: 27880850 PMCID: PMC5120798 DOI: 10.1371/journal.pone.0166880] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2016] [Accepted: 11/04/2016] [Indexed: 11/19/2022] Open
Abstract
Musical rhythms encompass temporal patterns that often yield regular metrical accents (e.g., a beat). There have been mixed results regarding perception as a function of metrical saliency, namely, whether sensitivity to a deviant is greater in metrically stronger or weaker positions. Moreover, effects of metrical position have not been examined in non-isochronous rhythms, or with respect to multisensory influences. This study addressed two main issues: (1) In non-isochronous auditory rhythms with clear metrical accents, how is sensitivity to a deviant modulated by metrical position? (2) Are these effects enhanced by multisensory information? Participants listened to strongly metrical rhythms with or without watching a point-light figure dance to the rhythm in the same meter, and detected a slight loudness increment. Both conditions were presented with or without an auditory interference that served to impair auditory metrical perception. Sensitivity to a deviant was found to be greater in weak-beat than in strong-beat positions, consistent with the predictive coding hypothesis and the idea of metrically induced illusory phenomenal accents. The visual rhythm of dance hindered auditory detection, but more so when the latter was itself less impaired. This pattern suggests that the visual and auditory rhythms were perceptually integrated to reinforce metrical accentuation, yielding more illusory phenomenal accents and thus lower sensitivity to deviants, in a manner consistent with the principle of inverse effectiveness. Results are discussed within the predictive framework for multisensory rhythms involving observed movements and possible mediation of the motor system.
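Sensitivity to the loudness increment in strong versus weak metrical positions is naturally expressed as d'; a minimal sketch with invented response counts follows.
```python
from scipy.stats import norm

def dprime(hits, misses, fas, crs):
    """d' from hit/false-alarm counts, with a log-linear correction
    to guard against rates of exactly 0 or 1."""
    hr = (hits + 0.5) / (hits + misses + 1)
    far = (fas + 0.5) / (fas + crs + 1)
    return norm.ppf(hr) - norm.ppf(far)

print(f"strong beat: d' = {dprime(30, 18, 6, 42):.2f}")
print(f"weak beat:   d' = {dprime(38, 10, 6, 42):.2f}")
```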
Affiliation(s)
- Yi-Huang Su, Department of Movement Science, Faculty of Sport and Health Sciences, Technical University of Munich, Munich, Germany

40
Rosenblum LD, Dorsi J, Dias JW. The Impact and Status of Carol Fowler's Supramodal Theory of Multisensory Speech Perception. ECOLOGICAL PSYCHOLOGY 2016. [DOI: 10.1080/10407413.2016.1230373] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
41
The McGurk effect: An investigation of attentional capacity employing response times. Atten Percept Psychophys 2016; 78:1712-27. [DOI: 10.3758/s13414-016-1133-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
42
Rosenblum LD, Dias JW, Dorsi J. The supramodal brain: implications for auditory perception. JOURNAL OF COGNITIVE PSYCHOLOGY 2016. [DOI: 10.1080/20445911.2016.1181691] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
43
Cecere R, Gross J, Thut G. Behavioural evidence for separate mechanisms of audiovisual temporal binding as a function of leading sensory modality. Eur J Neurosci 2016; 43:1561-8. [PMID: 27003546 PMCID: PMC4915493 DOI: 10.1111/ejn.13242] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2015] [Revised: 02/09/2016] [Accepted: 03/17/2016] [Indexed: 11/30/2022]
Abstract
The ability to integrate auditory and visual information is critical for effective perception and interaction with the environment, and is thought to be abnormal in some clinical populations. Several studies have investigated the time window over which audiovisual events are integrated, also called the temporal binding window, and revealed asymmetries depending on the order of audiovisual input (i.e. the leading sense). When judging audiovisual simultaneity, the binding window appears narrower and non-malleable for auditory-leading stimulus pairs and wider and trainable for visual-leading pairs. Here we specifically examined the level of independence of binding mechanisms when auditory-before-visual vs. visual-before-auditory input is bound. Three groups of healthy participants practiced audiovisual simultaneity detection with feedback, selectively training on auditory-leading stimulus pairs (group 1), visual-leading stimulus pairs (group 2) or both (group 3). Subsequently, we tested for learning transfer (crossover) from trained stimulus pairs to non-trained pairs with opposite audiovisual input. Our data confirmed the known asymmetry in size and trainability for auditory-visual vs. visual-auditory binding windows. More importantly, practicing one type of audiovisual integration (e.g. auditory-visual) did not affect the other type (e.g. visual-auditory), even if trainable by within-condition practice. Together, these results provide crucial evidence that audiovisual temporal binding for auditory-leading vs. visual-leading stimulus pairs are independent, possibly tapping into different circuits for audiovisual integration due to engagement of different multisensory sampling mechanisms depending on leading sense. Our results have implications for informing the study of multisensory interactions in healthy participants and clinical populations with dysfunctional multisensory integration.
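The leading-sense asymmetry above can be made concrete by fitting separate widths to the auditory-leading and visual-leading sides of the simultaneity-judgment curve; the asymmetric Gaussian and the data below are illustrative assumptions, not the study's model.
```python
import numpy as np
from scipy.optimize import curve_fit

soas = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400])  # ms, auditory-leading < 0
p_sim = np.array([0.05, 0.15, 0.45, 0.85, 0.97, 0.90, 0.70, 0.45, 0.25])

def asym_gauss(soa, amp, mu, s_left, s_right):
    """Gaussian with separate widths on each side of its peak."""
    sigma = np.where(soa < mu, s_left, s_right)
    return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

p, _ = curve_fit(asym_gauss, soas, p_sim, p0=[1, 0, 100, 200])
print(f"auditory-leading width ~ {p[2]:.0f} ms, visual-leading width ~ {p[3]:.0f} ms")
```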
Affiliation(s)
- Roberto Cecere, Centre for Cognitive Neuroimaging (CCNi), Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, G12 8QB, Glasgow, UK
- Joachim Gross, Centre for Cognitive Neuroimaging (CCNi), Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, G12 8QB, Glasgow, UK
- Gregor Thut, Centre for Cognitive Neuroimaging (CCNi), Institute of Neuroscience and Psychology, University of Glasgow, 58 Hillhead Street, G12 8QB, Glasgow, UK

44
Using EEG and stimulus context to probe the modelling of auditory-visual speech. Cortex 2016; 75:220-230. [DOI: 10.1016/j.cortex.2015.03.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2015] [Revised: 03/20/2015] [Accepted: 03/20/2015] [Indexed: 01/22/2023]
45
Gau R, Noppeney U. How prior expectations shape multisensory perception. Neuroimage 2016; 124:876-886. [DOI: 10.1016/j.neuroimage.2015.09.045] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2015] [Accepted: 09/20/2015] [Indexed: 11/24/2022] Open
46
Rhone AE, Nourski KV, Oya H, Kawasaki H, Howard MA, McMurray B. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex. LANGUAGE, COGNITION AND NEUROSCIENCE 2015; 31:284-302. [PMID: 27182530 PMCID: PMC4865257 DOI: 10.1080/23273798.2015.1101145] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
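Event-related band power (ERBP) as used above can be sketched as the Hilbert envelope of a band-passed signal, expressed in dB relative to a prestimulus baseline; the simulated trace and band edges below are assumptions.
```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 1000.0
times = np.arange(-0.5, 1.0, 1 / fs)
rng = np.random.default_rng(8)
lfp = rng.normal(0, 1, times.size)
burst = (times > 0.1) & (times < 0.4)                 # simulated high-gamma burst
lfp[burst] += 2 * np.sin(2 * np.pi * 90 * times[burst])

sos = butter(4, [70, 150], btype="bandpass", fs=fs, output="sos")
power = np.abs(hilbert(sosfiltfilt(sos, lfp))) ** 2   # band-limited power envelope

baseline = power[(times >= -0.4) & (times <= -0.1)].mean()
erbp_db = 10 * np.log10(power / baseline)             # ERBP relative to baseline
print(f"peak high-gamma ERBP: {erbp_db.max():.1f} dB")
```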
47
Abstract
From an ecological point of view, approaching objects are potentially more harmful than receding objects. A predator, a dominant conspecific, or a mere branch coming up at high speed can all be dangerous if one does not detect them and produce the appropriate escape behavior fast enough. And indeed, looming stimuli trigger stereotyped defensive responses in both monkeys and human infants. However, while the heteromodal somatosensory consequences of visual looming stimuli can be fully predicted by their spatiotemporal dynamics, few studies if any have explored whether visual stimuli looming toward the face predictively enhance heteromodal tactile sensitivity around the expected time of impact and at its expected location on the body. In the present study, we report that, in addition to triggering a defensive motor repertoire, looming stimuli toward the face provide the nervous system with predictive cues that enhance tactile sensitivity on the face. Specifically, we describe an enhancement of tactile processes at the expected time and location of impact of the stimulus on the face. We additionally show that a looming stimulus that brushes past the face also enhances tactile sensitivity on the nearby cheek, suggesting that the space close to the face is incorporated into the subjects' body schema. We propose that this cross-modal predictive facilitation involves multisensory convergence areas subserving the representation of a peripersonal space and a safety boundary of self.
48
Prediction across sensory modalities: A neurocomputational model of the McGurk effect. Cortex 2015; 68:61-75. [PMID: 26009260 DOI: 10.1016/j.cortex.2015.04.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2014] [Revised: 02/17/2015] [Accepted: 04/14/2015] [Indexed: 01/22/2023]
Abstract
The McGurk effect is a textbook illustration of the automaticity with which the human brain integrates audio-visual speech. It shows that even incongruent audiovisual (AV) speech stimuli can be combined into percepts that correspond neither to the auditory nor to the visual input, but to a mix of both. Typically, when presented with, e.g., visual /aga/ and acoustic /aba/ we perceive an illusory /ada/. In the inverse situation, however, when acoustic /aga/ is paired with visual /aba/, we perceive a combination of both stimuli, i.e., /abga/ or /agba/. Here we assessed the role of dynamic cross-modal predictions in the outcome of AV speech integration using a computational model that processes continuous audiovisual speech sensory inputs in a predictive coding framework. The model involves three processing levels: sensory units, units that encode the dynamics of stimuli, and multimodal recognition/identity units. The model exhibits a dynamic prediction behavior because evidence about speech tokens can be asynchronous across sensory modality, allowing for updating the activity of the recognition units from one modality while sending top-down predictions to the other modality. We explored the model's response to congruent and incongruent AV stimuli and found that, in the two-dimensional feature space spanned by the speech second formant and lip aperture, fusion stimuli are located in the neighborhood of congruent /ada/, which therefore provides a valid match. Conversely, stimuli that lead to combination percepts do not have a unique valid neighbor. In that case, acoustic and visual cues are both highly salient and generate conflicting predictions in the other modality that cannot be fused, forcing the elaboration of a combinatorial solution. We propose that dynamic predictive mechanisms play a decisive role in the dichotomous perception of incongruent audiovisual inputs.
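The model's core idea, recognition units sending top-down predictions to each modality and being updated by the resulting prediction errors, can be caricatured in a few lines. The feature space (second formant and lip closure, echoing the abstract), the prototypes, and the update rule below are didactic assumptions, not the published implementation.
```python
import numpy as np

tokens = ["aba", "ada", "aga"]
# Assumed prototype features per token: [acoustic F2 cue, visual lip-closure cue]
prototypes = np.array([[0.2, 0.9],   # /aba/: low F2, strong labial closure
                       [0.5, 0.5],   # /ada/: intermediate on both cues
                       [0.8, 0.2]])  # /aga/: high F2, open lips

def recognize(acoustic_cue, visual_cue, steps=50, lr=0.5):
    """Gradient descent on the squared prediction error across both modalities."""
    belief = np.ones(len(tokens)) / len(tokens)   # recognition/identity units
    obs = np.array([acoustic_cue, visual_cue])
    for _ in range(steps):
        prediction = belief @ prototypes          # top-down sensory prediction
        error = obs - prediction                  # per-modality prediction error
        belief = np.clip(belief + lr * (prototypes @ error), 1e-6, None)
        belief /= belief.sum()                    # keep beliefs on the simplex
    return belief

# Acoustic /aba/ (low F2) paired with visual /aga/ (open lips): the nearest
# valid prototype is /ada/, i.e., the classic fusion percept.
belief = recognize(acoustic_cue=0.2, visual_cue=0.2)
print(dict(zip(tokens, np.round(belief, 2))))
```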
49
van Wassenhove V, Grzeczkowski L. Visual-induced expectations modulate auditory cortical responses. Front Neurosci 2015; 9:11. [PMID: 25705174 PMCID: PMC4319385 DOI: 10.3389/fnins.2015.00011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2014] [Accepted: 01/11/2015] [Indexed: 11/13/2022] Open
Abstract
Active sensing has important consequences on multisensory processing (Schroeder et al., 2010). Here, we asked whether, in the absence of saccades, the position of the eyes and the timing of transient color changes of visual stimuli could selectively affect the excitability of auditory cortex by predicting the “where” and the “when” of a sound, respectively. Human participants were recorded with magnetoencephalography (MEG) while maintaining the position of their eyes on the left, right, or center of the screen. Participants counted color changes of the fixation cross while neglecting sounds which could be presented to the left, right, or both ears. First, clear alpha power increases were observed in auditory cortices, consistent with participants' attention being directed to visual inputs. Second, color changes elicited robust modulations of auditory cortex responses (“when” prediction) seen as ramping activity, early alpha phase-locked responses, and enhanced high-gamma band responses in the contralateral side of sound presentation. Third, no modulations of auditory evoked or oscillatory activity were found to be specific to eye position. Altogether, our results suggest that visual transience can automatically elicit a prediction of “when” a sound will occur by changing the excitability of auditory cortices irrespective of the attended modality, eye position or spatial congruency of auditory and visual events. By contrast, auditory cortical responses were not significantly affected by eye position, suggesting that “where” predictions may require active sensing or saccadic reset to modulate auditory cortex responses, notably in the absence of spatial orientation to sounds.
Affiliation(s)
- Virginie van Wassenhove, CEA, DSV/I2BM, NeuroSpin; INSERM, Cognitive Neuroimaging Unit, U992; Université Paris-Sud, Gif-sur-Yvette, France
- Lukasz Grzeczkowski, CEA, DSV/I2BM, NeuroSpin; INSERM, Cognitive Neuroimaging Unit, U992; Université Paris-Sud, Gif-sur-Yvette, France; Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

50
Kesner L. The predictive mind and the experience of visual art work. Front Psychol 2014; 5:1417. [PMID: 25566111 PMCID: PMC4267174 DOI: 10.3389/fpsyg.2014.01417] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2014] [Accepted: 11/19/2014] [Indexed: 11/14/2022] Open
Abstract
Among the main challenges of the predictive brain/mind concept is how to link prediction at the neural level to prediction at the cognitive-psychological level and finding conceptually robust and empirically verifiable ways to harness this theoretical framework toward explaining higher-order mental and cognitive phenomena, including the subjective experience of aesthetic and symbolic forms. Building on the tentative prediction error account of visual art, this article extends the application of the predictive coding framework to the visual arts. It does so by linking this theoretical discussion to a subjective, phenomenological account of how a work of art is experienced. In order to engage more deeply with a work of art, viewers must be able to tune or adapt their prediction mechanism to recognize art as a specific class of objects whose ontological nature defies predictability, and they must be able to sustain a productive flow of predictions from low-level sensory, recognitional to abstract semantic, conceptual, and affective inferences. The affective component of the process of predictive error optimization that occurs when a viewer enters into dialog with a painting is constituted both by activating the affective affordances within the image and by the affective consequences of prediction error minimization itself. The predictive coding framework also has implications for the problem of the culturality of vision. A person's mindset, which determines what top-down expectations and predictions are generated, is co-constituted by culture-relative skills and knowledge, which form hyperpriors that operate in the perception of art.
Affiliation(s)
- Ladislav Kesner, Department of Art History, Masaryk University, Brno, Czech Republic