1
Zhang Y, Huang J, Huang L, Peng L, Wang X, Zhang Q, Zeng Y, Yang J, Li Z, Sun X, Liang S. Atypical characteristic changes of surface morphology and structural covariance network in developmental dyslexia. Neurol Sci 2024; 45:2261-2270. [PMID: 37996775 DOI: 10.1007/s10072-023-07193-x]
Abstract
BACKGROUND Developmental dyslexia (DD) is a neurodevelopmental disorder characterized by difficulties with all aspects of written-word information acquisition, including slow and inaccurate word recognition. The neural basis of DD has not been fully elucidated. METHOD The study included 22 typically developing (TD) children, 16 children with isolated spelling disorder (SpD), and 20 children with DD. The cortical thickness, folding index, and mean curvature of Broca's area, including the triangular part of the left inferior frontal gyrus (IFGtriang) and the opercular part of the left inferior frontal gyrus, were assessed to explore the differences in surface morphology among the TD, SpD, and DD groups. Furthermore, the structural covariance network (SCN) of the left IFGtriang was analyzed to explore the changes in structural connectivity in the SpD and DD groups. RESULTS The DD group showed higher curvature and cortical folding of the left IFGtriang than the TD and SpD groups. In addition, compared with the TD and SpD groups, structural connectivity between the left IFGtriang and the left middle frontal gyrus and the right mid-orbital frontal gyrus was increased in the DD group, whereas structural connectivity between the left IFGtriang and the right precuneus and anterior cingulate was decreased. CONCLUSION Children with DD showed atypical structural connectivity in brain regions related to visual attention and memory, which might impact the information input and integration needed for reading and spelling.
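For orientation, structural covariance networks of the kind analyzed above are typically built by correlating a regional morphometric measure (for example, cortical thickness) across subjects and treating the resulting correlation matrix as a connectivity graph. The sketch below illustrates that general idea on simulated data; the subject and region counts, the seed label, and the thickness values are illustrative assumptions, not the study's data or pipeline.

```python
# Minimal structural covariance network (SCN) sketch on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_regions = 20, 6                  # hypothetical: 20 children, 6 cortical ROIs
thickness = rng.normal(2.5, 0.2, size=(n_subjects, n_regions))  # simulated thickness (mm)

# Pearson correlation of regional thickness across subjects
scn = np.corrcoef(thickness, rowvar=False)     # shape: (n_regions, n_regions)
np.fill_diagonal(scn, 0.0)                     # drop self-connections

# Seed-based view, e.g., covariance of region 0 (a stand-in for "left IFGtriang") with the rest
seed_profile = scn[0]
print(np.round(seed_profile, 2))
```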
Affiliation(s)
- Yusi Zhang
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - Rehabilitation Industry Institute, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - Fujian Key Laboratory of Cognitive Rehabilitation, Affiliated Rehabilitation Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, 350001, Fujian, China
- Jiayang Huang
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Li Huang
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Lixin Peng
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Xiuxiu Wang
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Qingqing Zhang
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Yi Zeng
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Junchao Yang
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - College of Rehabilitation Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Zuanfang Li
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
- Xi Sun
  - College of Information Engineering, Nanyang Institute of Technology, Nanyang, 473004, China
- Shengxiang Liang
  - National-Local Joint Engineering Research Center of Rehabilitation Medicine Technology, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - Rehabilitation Industry Institute, Fujian University of Traditional Chinese Medicine, Fuzhou, 350122, China
  - Fujian Key Laboratory of Cognitive Rehabilitation, Affiliated Rehabilitation Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, 350001, Fujian, China
2
Jertberg RM, Begeer S, Geurts HM, Chakrabarti B, Van der Burg E. Perception of temporal synchrony not a prerequisite for multisensory integration. Sci Rep 2024; 14:4982. [PMID: 38424118 PMCID: PMC10904801 DOI: 10.1038/s41598-024-55572-x]
Abstract
Temporal alignment is often viewed as the most essential cue the brain can use to integrate information from across sensory modalities. However, the importance of conscious perception of synchrony to multisensory integration is a controversial topic. Conversely, the influence of cross-modal incongruence of higher level stimulus features such as phonetics on temporal processing is poorly understood. To explore the nuances of this relationship between temporal processing and multisensory integration, we presented 101 participants (ranging from 19 to 73 years of age) with stimuli designed to elicit the McGurk/MacDonald illusion (either matched or mismatched pairs of phonemes and visemes) with varying degrees of stimulus onset asynchrony between the visual and auditory streams. We asked them to indicate which syllable they perceived and whether the video and audio were synchronized on each trial. We found that participants often experienced the illusion despite not perceiving the stimuli as synchronous, and the same phonetic incongruence that produced the illusion also led to significant interference in simultaneity judgments. These findings challenge the longstanding assumption that perception of synchrony is a prerequisite to multisensory integration, support a more flexible view of multisensory integration, and suggest a complex, reciprocal relationship between temporal and multisensory processing.
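As background for the two temporal-perception measures used in work like this, the point of subjective simultaneity (PSS) and the temporal binding window (TBW) are commonly estimated by fitting a Gaussian to the proportion of "synchronous" responses across stimulus onset asynchronies. The sketch below shows one such fit on made-up data; the SOAs, response rates, and the two-sigma definition of the window width are illustrative assumptions, not the authors' exact procedure.

```python
# Estimate PSS and TBW by fitting a Gaussian to simultaneity-judgment rates.
import numpy as np
from scipy.optimize import curve_fit

soa_ms = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400])   # negative = audio leads
p_sync = np.array([0.10, 0.25, 0.55, 0.80, 0.90, 0.85, 0.60, 0.30, 0.12])  # placeholder data

def gaussian(soa, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

(amp, mu, sigma), _ = curve_fit(gaussian, soa_ms, p_sync, p0=[1.0, 0.0, 150.0])

pss = mu            # SOA judged most often as synchronous
tbw = 2 * sigma     # one common operationalization of window width
print(f"PSS = {pss:.1f} ms, TBW ~ {tbw:.1f} ms")
```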
Affiliation(s)
- Robert M Jertberg
  - Department of Clinical and Developmental Psychology, The Netherlands and Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sander Begeer
  - Department of Clinical and Developmental Psychology, The Netherlands and Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Hilde M Geurts
  - Brain and Cognition, Department of Psychology, Dutch Autism and ADHD Research Center (d'Arc), Universiteit van Amsterdam, Amsterdam, The Netherlands
- Bhismadev Chakrabarti
  - Centre for Autism, School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
  - India Autism Center, Kolkata, India
  - Department of Psychology, Ashoka University, Sonipat, India
- Erik Van der Burg
  - Brain and Cognition, Department of Psychology, Dutch Autism and ADHD Research Center (d'Arc), Universiteit van Amsterdam, Amsterdam, The Netherlands
3
Haider CL, Park H, Hauswald A, Weisz N. Neural Speech Tracking Highlights the Importance of Visual Speech in Multi-speaker Situations. J Cogn Neurosci 2024; 36:128-142. [PMID: 37977156 DOI: 10.1162/jocn_a_02059]
Abstract
Visual speech plays a powerful role in facilitating auditory speech processing and became a topic of public interest with the widespread use of face masks during the COVID-19 pandemic. In a previous magnetoencephalography study, we showed that occluding the mouth area significantly impairs neural speech tracking. To rule out the possibility that this deterioration is because of degraded sound quality, in the present follow-up study, we presented participants with audiovisual (AV) and audio-only (A) speech. We further independently manipulated the trials by adding a face mask and a distractor speaker. Our results clearly show that face masks only affect speech tracking in AV conditions, not in A conditions. This shows that face masks indeed primarily impact speech processing by blocking visual speech and not by acoustic degradation. We further characterize, at the sensor level, how the spectrogram, lip movements, and lexical units are tracked, and show visual benefits for tracking the spectrogram especially in the multi-speaker condition. While lip movements only show additional improvement and a visual benefit over tracking of the spectrogram in clear speech conditions, lexical units (phonemes and word onsets) do not show visual enhancement at all. We hypothesize that in young normal-hearing individuals, information from visual input is used less for specific feature extraction and acts more as a general resource for guiding attention.
Affiliation(s)
- Nathan Weisz
  - Paris Lodron Universität Salzburg
  - Paracelsus Medical University Salzburg
4
Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. [PMID: 37757989 DOI: 10.1016/j.neuroimage.2023.120391]
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
Affiliation(s)
- Aaron R Nidiffer
  - Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Cody Zhewei Cao
  - Department of Psychology, University of Michigan, Ann Arbor, MI, USA
- Aisling O'Sullivan
  - School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
- Edmund C Lalor
  - Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
5
Schmidt F, Chen Y, Keitel A, Rösch S, Hannemann R, Serman M, Hauswald A, Weisz N. Neural speech tracking shifts from the syllabic to the modulation rate of speech as intelligibility decreases. Psychophysiology 2023; 60:e14362. [PMID: 37350379 PMCID: PMC10909526 DOI: 10.1111/psyp.14362]
Abstract
The most prominent acoustic features in speech are intensity modulations, represented by the amplitude envelope of speech. Synchronization of neural activity with these modulations supports speech comprehension. As the acoustic modulation of speech is related to the production of syllables, investigations of neural speech tracking commonly do not distinguish between lower-level acoustic (envelope modulation) and higher-level linguistic (syllable rate) information. Here we manipulated speech intelligibility using noise-vocoded speech and investigated the spectral dynamics of neural speech processing, across two studies at cortical and subcortical levels of the auditory hierarchy, using magnetoencephalography. Overall, cortical regions mostly track the syllable rate, whereas subcortical regions track the acoustic envelope. Furthermore, with less intelligible speech, tracking of the modulation rate becomes more dominant. Our study highlights the importance of distinguishing between envelope modulation and syllable rate and provides novel possibilities to better understand differences between auditory processing and speech/language processing disorders.
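Neural speech tracking of the kind reported here is often quantified as spectral coherence between a neural signal and the speech amplitude envelope. The following sketch computes such coherence on simulated signals with scipy; the sampling rate, the 4 Hz "syllable-rate" modulation, and the 1-7 Hz summary band are illustrative choices, not the study's MEG pipeline.

```python
# Toy "speech tracking" metric: coherence between a neural channel and an envelope.
import numpy as np
from scipy.signal import coherence

fs = 200.0                                    # Hz, hypothetical sampling rate
t = np.arange(0, 60, 1 / fs)                  # 60 s of simulated data
rng = np.random.default_rng(1)

envelope = 1 + np.sin(2 * np.pi * 4.0 * t)            # ~4 Hz amplitude modulation
neural = 0.5 * envelope + rng.normal(0, 1, t.size)    # channel that partly tracks it

f, coh = coherence(neural, envelope, fs=fs, nperseg=int(4 * fs))
band = (f >= 1) & (f <= 7)
print(f"Mean 1-7 Hz coherence: {coh[band].mean():.2f}")
```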
Affiliation(s)
- Fabian Schmidt
  - Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
- Ya-Ping Chen
  - Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
- Anne Keitel
  - Psychology, School of Social Sciences, University of Dundee, Dundee, UK
- Sebastian Rösch
  - Department of Otorhinolaryngology, Paracelsus Medical University, Salzburg, Austria
- Maja Serman
  - Audiological Research Unit, Sivantos GmbH, Erlangen, Germany
- Anne Hauswald
  - Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
- Nathan Weisz
  - Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
  - Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
6
Jiang Z, An X, Liu S, Yin E, Yan Y, Ming D. Neural oscillations reflect the individual differences in the temporal perception of audiovisual speech. Cereb Cortex 2023; 33:10575-10583. [PMID: 37727958 DOI: 10.1093/cercor/bhad304]
Abstract
Multisensory integration occurs within a limited time interval between multimodal stimuli. Multisensory temporal perception varies widely among individuals and involves perceptual synchrony and temporal sensitivity processes. Previous studies explored the neural mechanisms of these individual differences for beep-flash stimuli, but not for speech. In this study, 28 subjects (16 male) performed an audiovisual speech (/ba/) simultaneity judgment task while their electroencephalography was recorded. We examined the relationship between prestimulus neural oscillations (i.e., the pre-pronunciation movement-related oscillations) and temporal perception. Perceptual synchrony was quantified using the Point of Subjective Simultaneity and temporal sensitivity using the Temporal Binding Window. Our results revealed dissociated neural mechanisms for individual differences in the Temporal Binding Window and the Point of Subjective Simultaneity. Frontocentral delta power, reflecting top-down attention control, was positively related to the magnitude of individual auditory-leading Temporal Binding Windows (LTBWs), whereas parieto-occipital theta power, indexing bottom-up visual temporal attention specific to speech, was negatively associated with the magnitude of individual visual-leading Temporal Binding Windows (RTBWs). In addition, increased left frontal and bilateral temporoparietal-occipital alpha power, reflecting general attentional states, was associated with increased Points of Subjective Simultaneity. Strengthening attention abilities might improve the audiovisual temporal perception of speech and further impact speech integration.
Affiliation(s)
- Zeliang Jiang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Xingwei An
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Shuang Liu
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
- Erwei Yin
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
  - Defense Innovation Institute, Academy of Military Sciences (AMS), 100071 Beijing, China
  - Tianjin Artificial Intelligence Innovation Center (TAIIC), 300457 Tianjin, China
- Ye Yan
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
  - Defense Innovation Institute, Academy of Military Sciences (AMS), 100071 Beijing, China
  - Tianjin Artificial Intelligence Innovation Center (TAIIC), 300457 Tianjin, China
- Dong Ming
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, 300072 Tianjin, China
7
Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023; 13:e2869. [PMID: 36579557 PMCID: PMC9927859 DOI: 10.1002/brb3.2869]
Abstract
INTRODUCTION Few of us are skilled lipreaders while most struggle with the task. The neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses. RESULTS Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading did, probably paralleling the limited understanding obtained via lipreading. Importantly, a higher lipreading test score and subjective estimate of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results might suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
Affiliation(s)
- Satu Saalasti
  - Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
  - Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
  - Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho
  - Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski
  - Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
  - Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany
  - Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany
  - Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams
  - Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean
  - Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen
  - Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams
  - Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
  - Aalto Studios - MAGICS, Aalto University, Espoo, Finland
8
Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. [PMID: 36328272 DOI: 10.1016/j.neuroimage.2022.119724]
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in the neural entrainment to the quasi-rhythmic properties of speech acoustics, while the latter is thought to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs articulatory patterns associated with speech production. We measured electroencephalographic (EEG) data while participants listened to sentences during the production of which the articulatory kinematics of the lips, jaw, and tongue had also been recorded (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic, and/or redundant information regarding the EEG signals. Interestingly, tongue movements contain both unique and synergistic information with the envelope that is encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, thus leveraging audio-motor maps that arise most likely from the acquisition of speech production during development.
Affiliation(s)
- A Pastore
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A Tomassini
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- I Delis
  - School of Biomedical Sciences, University of Leeds, Leeds, UK
- E Dolfini
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- L Fadiga
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A D'Ausilio
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
9
Suess N, Hauswald A, Reisinger P, Rösch S, Keitel A, Weisz N. Cortical tracking of formant modulations derived from silently presented lip movements and its decline with age. Cereb Cortex 2022; 32:4818-4833. [PMID: 35062025 PMCID: PMC9627034 DOI: 10.1093/cercor/bhab518]
Abstract
The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers' lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuophonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation processes of these more fine-grained acoustic details and assessed how they change as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while the participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging affects especially the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.
Affiliation(s)
- Nina Suess
  - Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Anne Hauswald
  - Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Patrick Reisinger
  - Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Sebastian Rösch
  - Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University Salzburg, University Hospital Salzburg, Salzburg 5020, Austria
- Anne Keitel
  - School of Social Sciences, University of Dundee, Dundee DD1 4HN, UK
- Nathan Weisz
  - Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
  - Department of Psychology, Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg 5020, Austria
10
Influence of linguistic properties and hearing impairment on visual speech perception skills in the German language. PLoS One 2022; 17:e0275585. [PMID: 36178907 PMCID: PMC9524625 DOI: 10.1371/journal.pone.0275585]
Abstract
Visual input is crucial for understanding speech under noisy conditions, but there are hardly any tools to assess the individual ability to lipread. With this study, we wanted to (1) investigate how linguistic characteristics of the language on the one hand, and hearing impairment on the other, affect lipreading abilities, and (2) provide a tool to assess lipreading abilities for German speakers. 170 participants (22 prelingually deaf) completed the online assessment, which consisted of a subjective hearing impairment scale and silent videos in which different item categories (numbers, words, and sentences) were spoken. The task for our participants was to recognize the spoken stimuli just by visual inspection. We used different versions of one test and investigated the impact of item categories, word frequency in the spoken language, articulation, sentence frequency in the spoken language, sentence length, and differences between speakers on the recognition score. We found an effect of item categories, articulation, sentence frequency, and sentence length on the recognition score. With respect to hearing impairment, we found that higher subjective hearing impairment is associated with a higher test score. We did not find any evidence that prelingually deaf individuals show enhanced lipreading skills over people with postlingually acquired hearing impairment. However, we see an interaction with education only in the prelingually deaf, but not in the population with postlingually acquired hearing loss. This points to the fact that there are different factors contributing to enhanced lipreading abilities depending on the onset of hearing impairment (prelingual vs. postlingual). Overall, lipreading skills vary strongly in the general population independent of hearing impairment. Based on our findings we constructed a new and efficient lipreading assessment tool (SaLT) that can be used to test behavioral lipreading abilities in the German-speaking population.
11
Aller M, Økland HS, MacGregor LJ, Blank H, Davis MH. Differential Auditory and Visual Phase-Locking Are Observed during Audio-Visual Benefit and Silent Lip-Reading for Speech Perception. J Neurosci 2022; 42:6108-6120. [PMID: 35760528 PMCID: PMC9351641 DOI: 10.1523/jneurosci.2476-21.2022]
Abstract
Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explore MEG phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) that reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information is present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception.SIGNIFICANCE STATEMENT Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.
Affiliation(s)
- Máté Aller
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Heidi Solberg Økland
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Lucy J MacGregor
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Helen Blank
  - University Medical Center Hamburg-Eppendorf, Hamburg, 20246, Germany
- Matthew H Davis
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
12
Hauswald A, Keitel A, Chen Y, Rösch S, Weisz N. Degradation levels of continuous speech affect neural speech tracking and alpha power differently. Eur J Neurosci 2022; 55:3288-3302. [PMID: 32687616 PMCID: PMC9540197 DOI: 10.1111/ejn.14912]
Abstract
Making sense of a poor auditory signal can pose a challenge. Previous attempts to quantify speech intelligibility in neural terms have usually focused on one of two measures, namely low-frequency speech-brain synchronization or alpha power modulations. However, reports have been mixed concerning the modulation of these measures, an issue aggravated by the fact that they have normally been studied separately. We present two MEG studies analyzing both measures. In study 1, participants listened to unimodal auditory speech with three different levels of degradation (original, 7-channel and 3-channel vocoding). Intelligibility declined with declining clarity, but speech was still intelligible to some extent even for the lowest clarity level (3-channel vocoding). Low-frequency (1-7 Hz) speech tracking suggested a U-shaped relationship with strongest effects for the medium-degraded speech (7-channel) in bilateral auditory and left frontal regions. To follow up on this finding, we implemented three additional vocoding levels (5-channel, 2-channel and 1-channel) in a second MEG study. Using this wider range of degradation, the speech-brain synchronization showed a similar pattern as in study 1, but further showed that when speech becomes unintelligible, synchronization declines again. The relationship differed for alpha power, which continued to decrease across vocoding levels reaching a floor effect for 5-channel vocoding. Predicting subjective intelligibility based on models either combining both measures or each measure alone showed superiority of the combined model. Our findings underline that speech tracking and alpha power are modified differently by the degree of degradation of continuous speech but together contribute to the subjective speech understanding.
Affiliation(s)
- Anne Hauswald
  - Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
- Anne Keitel
  - Psychology, School of Social Sciences, University of Dundee, Dundee, UK
  - Centre for Cognitive Neuroimaging, University of Glasgow, Glasgow, UK
- Ya-Ping Chen
  - Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
- Sebastian Rösch
  - Department of Otorhinolaryngology, Paracelsus Medical University, Salzburg, Austria
- Nathan Weisz
  - Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
  - Department of Psychology, University of Salzburg, Salzburg, Austria
13
Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. [PMID: 35589000 DOI: 10.1016/j.neuroimage.2022.119311]
Abstract
Viewing a speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed this question by quantifying regional multivariate representation and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and the supramarginal gyrus (SMG). Moreover, neural representations of place-of-articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area while place of articulation was better encoded in the left ventral premotor cortex and SMG. Next, dynamic causal modeling (DCM) analysis showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the structural backbone of the auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide novel insight for speech science: lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway, and this functional enhancement is mediated by the microstructural architecture of the circuit.
Affiliation(s)
- Lei Zhang
  - CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049
- Yi Du
  - CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China 100101; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China 100049; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China 200031; Chinese Institute for Brain Research, Beijing, China 102206
14
Haider CL, Suess N, Hauswald A, Park H, Weisz N. Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker. Neuroimage 2022; 252:119044. [PMID: 35240298 DOI: 10.1016/j.neuroimage.2022.119044]
Abstract
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. However, it remains unclear which levels of speech processing are affected, and under which circumstances, by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal-hearing participants via magnetoencephalography (MEG). We additionally added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model was trained on the clear AV speech and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e., pitch and formant frequencies), while reconstruction of higher-level features of speech segmentation (phoneme and word onsets) was especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which have only mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
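A backward "decoding" model of the kind mentioned above reconstructs a stimulus feature (for example, the speech envelope) from time-lagged neural data and scores the reconstruction by correlation with the true feature. The sketch below is a minimal ridge-regression version on simulated data; the channel count, lag range, and train/test split are assumptions for illustration, not the authors' implementation.

```python
# Minimal stimulus-reconstruction (backward model) sketch with ridge regression.
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

fs = 100                                       # Hz, hypothetical sampling rate
n_samples, n_channels = 6000, 32               # 60 s of simulated data, 32 sensors
rng = np.random.default_rng(2)

envelope = rng.normal(size=n_samples)                                   # "speech envelope"
mixing = rng.normal(size=n_channels)
meg = np.outer(envelope, mixing) + rng.normal(size=(n_samples, n_channels))  # noisy sensors

# Time-lagged design matrix covering roughly 0-240 ms of lags
lags = range(0, int(0.25 * fs))
X = np.hstack([np.roll(meg, -lag, axis=0) for lag in lags])
valid = n_samples - max(lags)                  # drop wrapped-around samples at the end
X, y = X[:valid], envelope[:valid]

half = valid // 2
model = Ridge(alpha=1.0).fit(X[:half], y[:half])       # train on the first half
r, _ = pearsonr(model.predict(X[half:]), y[half:])     # evaluate on the second half
print(f"Reconstruction accuracy: r = {r:.2f}")
```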
Affiliation(s)
- Chandra Leon Haider
  - Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Nina Suess
  - Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Anne Hauswald
  - Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Hyojin Park
  - School of Psychology & Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, UK
- Nathan Weisz
  - Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
15
Holroyd CB. Interbrain synchrony: on wavy ground. Trends Neurosci 2022; 45:346-357. [PMID: 35236639 DOI: 10.1016/j.tins.2022.02.002]
Abstract
In recent years the study of dynamic, between-brain coupling mechanisms has taken social neuroscience by storm. In particular, interbrain synchrony (IBS) is a putative neural mechanism said to promote social interactions by enabling the functional integration of multiple brains. In this article, I argue that this research is beset with three pervasive and interrelated problems. First, the field lacks a widely accepted definition of IBS. Second, IBS wants for theories that can guide the design and interpretation of experiments. Third, a potpourri of tasks and empirical methods permits undue flexibility when testing the hypothesis. These factors synergistically undermine IBS as a theoretical construct. I finish by recommending measures that can address these issues.
Affiliation(s)
- Clay B Holroyd
  - Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Gent, Belgium
16
Expertise Modulates Neural Stimulus-Tracking. eNeuro 2021; 8:ENEURO.0065-21.2021. [PMID: 34341067 PMCID: PMC8371925 DOI: 10.1523/eneuro.0065-21.2021]
Abstract
How does the brain anticipate information in language? When people perceive speech, low-frequency (<10 Hz) activity in the brain synchronizes with bursts of sound and visual motion. This phenomenon, called cortical stimulus-tracking, is thought to be one way that the brain predicts the timing of upcoming words, phrases, and syllables. In this study, we test whether stimulus-tracking depends on domain-general expertise or on language-specific prediction mechanisms. We go on to examine how the effects of expertise differ between frontal and sensory cortex. We recorded electroencephalography (EEG) from human participants who were experts in either sign language or ballet, and we compared stimulus-tracking between groups while participants watched videos of sign language or ballet. We measured stimulus-tracking by computing coherence between EEG recordings and visual motion in the videos. Results showed that stimulus-tracking depends on domain-general expertise, and not on language-specific prediction mechanisms. At frontal channels, fluent signers showed stronger coherence to sign language than to dance, whereas expert dancers showed stronger coherence to dance than to sign language. At occipital channels, however, the two groups of participants did not show different patterns of coherence. These results are difficult to explain by entrainment of endogenous oscillations, because neither sign language nor dance show any periodicity at the frequencies of significant expertise-dependent stimulus-tracking. These results suggest that the brain may rely on domain-general predictive mechanisms to optimize perception of temporally-predictable stimuli such as speech, sign language, and dance.
17
O'Sullivan AE, Crosse MJ, Liberto GMD, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. [PMID: 33824190 PMCID: PMC8197638 DOI: 10.1523/jneurosci.0906-20.2021]
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.SIGNIFICANCE STATEMENT During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
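Canonical correlation analysis (CCA), as used in this study to index the encoding of stimulus features in the EEG, finds linear projections of the stimulus and neural data that are maximally correlated. A minimal sketch with scikit-learn follows on simulated data; the feature dimensions, noise model, and the absence of time lags are simplifying assumptions, not the published analysis.

```python
# Toy CCA-based index of stimulus encoding strength in simulated EEG.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
n_samples = 2000
stimulus = rng.normal(size=(n_samples, 16))                  # e.g., spectrogram bands
weights = rng.normal(size=(16, 64))
eeg = 0.3 * (stimulus @ weights) + rng.normal(size=(n_samples, 64))  # 64 "channels"

cca = CCA(n_components=3)
stim_scores, eeg_scores = cca.fit_transform(stimulus, eeg)

# Canonical correlations serve as an index of how strongly features are encoded
canon_r = [np.corrcoef(stim_scores[:, k], eeg_scores[:, k])[0, 1] for k in range(3)]
print(np.round(canon_r, 2))
```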
Affiliation(s)
- Aisling E O'Sullivan
  - School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
  - X, The Moonshot Factory, Mountain View, CA and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
  - Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
  - Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
  - University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
  - School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
  - Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627
18
Auditory detection is modulated by theta phase of silent lip movements. Current Research in Neurobiology 2021; 2:100014. [PMID: 36246505 PMCID: PMC9559921 DOI: 10.1016/j.crneur.2021.100014]
Abstract
Audiovisual speech perception relies, among other things, on our expertise to map a speaker's lip movements with speech sounds. This multimodal matching is facilitated by salient syllable features that align lip movements and acoustic envelope signals in the 4–8 Hz theta band. Although non-exclusive, the predominance of theta rhythms in speech processing has been firmly established by studies showing that neural oscillations track the acoustic envelope in the primary auditory cortex. Equivalently, theta oscillations in the visual cortex entrain to lip movements, and the auditory cortex is recruited during silent speech perception. These findings suggest that neuronal theta oscillations may play a functional role in organising information flow across visual and auditory sensory areas. We presented silent speech movies while participants performed a pure tone detection task to test whether entrainment to lip movements directs the auditory system and drives behavioural outcomes. We showed that auditory detection varied depending on the ongoing theta phase conveyed by lip movements in the movies. In a complementary experiment presenting the same movies while recording participants' electro-encephalogram (EEG), we found that silent lip movements entrained neural oscillations in the visual and auditory cortices with the visual phase leading the auditory phase. These results support the idea that the visual cortex entrained by lip movements filtered the sensitivity of the auditory cortex via theta phase synchronization.
Highlights:
- Subjects entrain to visual activity conveyed by speakers' lip movements.
- Visual entrainment modulates auditory perception and performances.
- Silent lips perception recruits both visual and auditory cortices.
- Visual and auditory cortices synchronize via theta phase coupling.
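The phase-dependence of auditory detection described above is commonly assessed by binning trials according to the oscillatory phase at stimulus onset and comparing hit rates across bins. The toy sketch below simulates such an analysis; the phase values, modulation depth, and bin count are placeholders rather than the paper's data or statistical procedure.

```python
# Toy phase-binned detection analysis: hit rate as a function of theta phase.
import numpy as np

rng = np.random.default_rng(4)
n_trials = 600
phase = rng.uniform(-np.pi, np.pi, n_trials)      # theta phase at tone onset (simulated)
p_hit = 0.5 + 0.15 * np.cos(phase)                # built-in phase-dependent detection
hits = rng.random(n_trials) < p_hit

bins = np.linspace(-np.pi, np.pi, 9)              # 8 equally spaced phase bins
bin_idx = np.digitize(phase, bins) - 1
hit_rate = [hits[bin_idx == b].mean() for b in range(8)]
print(np.round(hit_rate, 2))                      # detection should peak near phase 0
```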
19
Visual speech cues recruit neural oscillations to optimise auditory perception: Ways forward for research on human communication. Current Research in Neurobiology 2021; 2:100015. [PMID: 36246513 PMCID: PMC9559900 DOI: 10.1016/j.crneur.2021.100015]
20
Michon M, Boncompte G, López V. Electrophysiological Dynamics of Visual Speech Processing and the Role of Orofacial Effectors for Cross-Modal Predictions. Front Hum Neurosci 2020; 14:538619. [PMID: 33192386 PMCID: PMC7653187 DOI: 10.3389/fnhum.2020.538619]
Abstract
The human brain generates predictions about future events. During face-to-face conversations, visemic information is used to predict upcoming auditory input. Recent studies suggest that the speech motor system plays a role in these cross-modal predictions, however, usually only audio-visual paradigms are employed. Here we tested whether speech sounds can be predicted on the basis of visemic information only, and to what extent interfering with orofacial articulatory effectors can affect these predictions. We registered EEG and employed N400 as an index of such predictions. Our results show that N400's amplitude was strongly modulated by visemic salience, coherent with cross-modal speech predictions. Additionally, N400 ceased to be evoked when syllables' visemes were presented backwards, suggesting that predictions occur only when the observed viseme matched an existing articuleme in the observer's speech motor system (i.e., the articulatory neural sequence required to produce a particular phoneme/viseme). Importantly, we found that interfering with the motor articulatory system strongly disrupted cross-modal predictions. We also observed a late P1000 that was evoked only for syllable-related visual stimuli, but whose amplitude was not modulated by interfering with the motor system. The present study provides further evidence of the importance of the speech production system for speech sounds predictions based on visemic information at the pre-lexical level. The implications of these results are discussed in the context of a hypothesized trimodal repertoire for speech, in which speech perception is conceived as a highly interactive process that involves not only your ears but also your eyes, lips and tongue.
Affiliation(s)
- Maëva Michon
- Laboratorio de Neurociencia Cognitiva y Evolutiva, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Laboratorio de Neurociencia Cognitiva y Social, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
| | - Gonzalo Boncompte
- Laboratorio de Neurodinámicas de la Cognición, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Vladimir López
- Laboratorio de Psicología Experimental, Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago, Chile
21
Destoky F, Bertels J, Niesen M, Wens V, Vander Ghinst M, Leybaert J, Lallier M, Ince RAA, Gross J, De Tiège X, Bourguignon M. Cortical tracking of speech in noise accounts for reading strategies in children. PLoS Biol 2020; 18:e3000840. [PMID: 32845876 PMCID: PMC7478533 DOI: 10.1371/journal.pbio.3000840] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2020] [Revised: 09/08/2020] [Accepted: 08/12/2020] [Indexed: 11/29/2022] Open
Abstract
Humans' propensity to acquire literacy relates to several factors, including the ability to understand speech in noise (SiN). Still, the nature of the relation between reading and SiN perception abilities remains poorly understood. Here, we dissect the interplay between (1) reading abilities, (2) classical behavioral predictors of reading (phonological awareness, phonological memory, and rapid automatized naming), and (3) electrophysiological markers of SiN perception in 99 elementary school children (26 with dyslexia). We demonstrate that, in typical readers, cortical representation of the phrasal content of SiN relates to the degree of development of the lexical (but not sublexical) reading strategy. In contrast, classical behavioral predictors of reading abilities and the ability to benefit from visual speech to represent the syllabic content of SiN account for global reading performance (i.e., speed and accuracy of lexical and sublexical reading). In individuals with dyslexia, we found preserved integration of visual speech information to optimize processing of syntactic information but not to sustain acoustic/phonemic processing. Finally, within children with dyslexia, measures of cortical representation of the phrasal content of SiN were negatively related to reading speed and positively related to the compromise between reading precision and reading speed, potentially owing to compensatory attentional mechanisms. These results clarify the nature of the relation between SiN perception and reading abilities in typical child readers and children with dyslexia and identify novel electrophysiological markers of emergent literacy.
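Cortical tracking of speech in noise is typically quantified with a spectral coupling measure between the speech envelope and neural activity. The sketch below uses magnitude-squared coherence with one neural time course as a generic stand-in; the study itself relied on different measures and source-level data, and the band limits shown are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, coherence

fs = 1000.0                                  # assumed common sampling rate (Hz)
audio = np.random.randn(int(60 * fs))        # stand-in for the speech-in-noise waveform
neural = np.random.randn(int(60 * fs))       # stand-in for one sensor / source time course

# Speech temporal envelope: magnitude of the analytic signal, low-passed below 40 Hz.
b, a = butter(4, 40 / (fs / 2), btype="low")
envelope = filtfilt(b, a, np.abs(hilbert(audio)))

# Magnitude-squared coherence between the envelope and the neural signal.
freqs, coh = coherence(envelope, neural, fs=fs, nperseg=int(4 * fs))
phrasal = coh[(freqs >= 0.2) & (freqs <= 1.5)].mean()   # phrasal-rate band, assumed limits
syllabic = coh[(freqs >= 2.0) & (freqs <= 8.0)].mean()  # syllabic-rate band, assumed limits
print(f"phrasal-rate coherence ~ {phrasal:.3f}, syllabic-rate coherence ~ {syllabic:.3f}")
```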
Affiliation(s)
- Florian Destoky
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Julie Bertels
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Consciousness, Cognition and Computation group, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Maxime Niesen
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Service d'ORL et de chirurgie cervico-faciale, ULB-Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Vincent Wens
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Department of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Marc Vander Ghinst
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Jacqueline Leybaert
- Laboratoire Cognition Langage et Développement, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Marie Lallier
- BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain
| | - Robin A. A. Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignal analysis, University of Muenster, Muenster, Germany
| | - Xavier De Tiège
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Department of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels, Belgium
| | - Mathieu Bourguignon
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- Laboratoire Cognition Langage et Développement, UNI–ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels, Belgium
- BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Spain
22
Maffei V, Indovina I, Mazzarella E, Giusti MA, Macaluso E, Lacquaniti F, Viviani P. Sensitivity of occipito-temporal cortex, premotor and Broca's areas to visible speech gestures in a familiar language. PLoS One 2020; 15:e0234695. [PMID: 32559213 PMCID: PMC7304574 DOI: 10.1371/journal.pone.0234695] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2019] [Accepted: 06/01/2020] [Indexed: 11/18/2022] Open
Abstract
When looking at a speaking person, the analysis of facial kinematics contributes to language discrimination and to the decoding of the time flow of visual speech. To disentangle these two factors, we investigated behavioural and fMRI responses to familiar and unfamiliar languages when observing speech gestures with natural or reversed kinematics. Twenty Italian volunteers viewed silent video-clips of speech shown as recorded (Forward, biological motion) or reversed in time (Backward, non-biological motion), in Italian (familiar language) or Arabic (unfamiliar language). fMRI revealed that language (Italian/Arabic) and time-rendering (Forward/Backward) modulated distinct areas in the ventral occipito-temporal cortex, suggesting that visual speech analysis begins in this region, earlier than previously thought. Left premotor ventral (superior subdivision) and dorsal areas were preferentially activated by the familiar language independently of time-rendering, challenging the view that the role of these regions in speech processing is purely articulatory. The left premotor ventral region in the frontal operculum, thought to include part of Broca's area, responded to the natural familiar language, consistent with the hypothesis of motor simulation of speech gestures.
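A 2 x 2 within-subject design like this (language x time-rendering) is commonly summarised per region of interest with a repeated-measures ANOVA on ROI responses. The sketch below shows one way to run such a test with statsmodels; the factor labels, subject count, and data are placeholders for illustration, not the study's actual statistics.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Stand-in ROI responses for a 2 (language) x 2 (time-rendering) within-subject design.
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(20), 4)                       # 20 subjects x 4 cells
language = np.tile(["Italian", "Arabic"], 40)
rendering = np.tile(np.repeat(["Forward", "Backward"], 2), 20)
beta = rng.normal(size=80)                                   # placeholder ROI betas

df = pd.DataFrame({"subject": subjects, "language": language,
                   "rendering": rendering, "beta": beta})

# Repeated-measures ANOVA with language and time-rendering as within-subject factors.
res = AnovaRM(df, depvar="beta", subject="subject",
              within=["language", "rendering"]).fit()
print(res)
```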
Affiliation(s)
- Vincenzo Maffei
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
- Data Lake & BI, DOT - Technology, Poste Italiane, Rome, Italy
| | - Iole Indovina
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Departmental Faculty of Medicine and Surgery, Saint Camillus International University of Health and Medical Sciences, Rome, Italy
| | | | - Maria Assunta Giusti
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Emiliano Macaluso
- ImpAct Team, Lyon Neuroscience Research Center, Lyon, France
- Laboratory of Neuroimaging, IRCCS Santa Lucia Foundation, Rome, Italy
| | - Francesco Lacquaniti
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
| | - Paolo Viviani
- Laboratory of Neuromotor Physiology, IRCCS Santa Lucia Foundation, Rome, Italy
- Centre of Space BioMedicine and Department of Systems Medicine, University of Rome Tor Vergata, Rome, Italy
23
Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech. J Neurosci 2019; 40:1053-1065. [PMID: 31889007 DOI: 10.1523/jneurosci.1101-19.2019] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2019] [Revised: 11/28/2019] [Accepted: 12/04/2019] [Indexed: 11/21/2022] Open
Abstract
Lip-reading is crucial for understanding speech in challenging conditions. But how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ∼70 ms in the left hemisphere, compared with ∼20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing. SIGNIFICANCE STATEMENT: Lip-reading consists of decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
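The reported speech-to-brain delays can be illustrated with a simple cross-correlation between a low-pass-filtered speech envelope and an auditory-cortex time course, taking the lag of the correlation peak as the delay. This is a didactic sketch with placeholder data and an assumed sampling rate, not the authors' coherence-based pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, correlate, correlation_lags

fs = 200.0                                     # assumed analysis sampling rate (Hz)
speech_env = np.random.randn(int(120 * fs))    # stand-in for the (absent) speech envelope
auditory = np.random.randn(int(120 * fs))      # stand-in for an auditory-cortex time course

# Restrict both signals to the slow (<1 Hz) range where entrainment was reported.
b, a = butter(3, 1.0 / (fs / 2), btype="low")
env_lp = filtfilt(b, a, speech_env)
brain_lp = filtfilt(b, a, auditory)

# Cross-correlate and take the lag of the peak as the speech-to-brain delay.
xcorr = correlate(brain_lp - brain_lp.mean(), env_lp - env_lp.mean(), mode="full")
lags = correlation_lags(len(brain_lp), len(env_lp), mode="full") / fs
peak_delay = lags[np.argmax(np.abs(xcorr))]
print(f"estimated speech-to-brain delay: {peak_delay * 1e3:.0f} ms")
```

With real data, a positive peak lag of a few tens of milliseconds would correspond to the brain following the (reconstructed) speech signal, as described above.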
24
Spatial attention enhances cortical tracking of quasi-rhythmic visual stimuli. Neuroimage 2019; 208:116444. [PMID: 31816422 DOI: 10.1016/j.neuroimage.2019.116444] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2019] [Revised: 11/06/2019] [Accepted: 12/04/2019] [Indexed: 11/22/2022] Open
Abstract
Successfully interpreting and navigating our natural visual environment requires us to track its dynamics constantly. Additionally, we focus our attention on behaviorally relevant stimuli to enhance their neural processing. Little is known, however, about how sustained attention affects the ongoing tracking of stimuli with rich natural temporal dynamics. Here, we used MRI-informed source reconstructions of magnetoencephalography (MEG) data to map the extent to which various cortical areas track concurrent continuous quasi-rhythmic visual stimulation. Further, we tested how top-down visuo-spatial attention influences this tracking process. Our bilaterally presented quasi-rhythmic stimuli covered a dynamic range of 4-20 Hz, subdivided into three distinct bands. As an experimental control, we also included strictly rhythmic stimulation (10 vs 12 Hz). Using a spectral measure of brain-stimulus coupling, we were able to track the neural processing of left vs. right stimuli independently, even when both fluctuated within the same frequency range. The fidelity of neural tracking depended on the stimulation frequencies, decreasing for higher frequency bands. Both attended and non-attended stimuli were tracked beyond early visual cortices, in ventral and dorsal streams depending on the stimulus frequency. In general, tracking improved with the deployment of visuo-spatial attention to the stimulus location. Our results provide new insights into how human visual cortices process concurrent dynamic stimuli and provide a potential mechanism, namely an increase in the temporal precision of tracking, for boosting the neural representation of attended input.
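One simple way to quantify brain-stimulus coupling per frequency band is a phase-locking value (PLV) between the stimulus time course and the neural signal, as sketched below. The study used its own spectral coupling measure, so treat this as an assumed, simplified stand-in with placeholder bands and data.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250.0                                   # assumed sampling rate (Hz)
bands = {"theta (4-7 Hz)": (4, 7), "alpha (8-13 Hz)": (8, 13), "beta (14-20 Hz)": (14, 20)}
stim = np.random.randn(int(300 * fs))        # stand-in for one quasi-rhythmic stimulus signal
meg = np.random.randn(int(300 * fs))         # stand-in for one source-level MEG time course

def band_phase(x, lo, hi):
    """Instantaneous phase of x after band-pass filtering between lo and hi Hz."""
    b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

# Phase-locking value between stimulus and brain signal, per stimulation band.
for name, (lo, hi) in bands.items():
    dphi = band_phase(stim, lo, hi) - band_phase(meg, lo, hi)
    plv = np.abs(np.mean(np.exp(1j * dphi)))
    print(f"{name}: PLV = {plv:.3f}")
```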
25
Obleser J, Kayser C. Neural Entrainment and Attentional Selection in the Listening Brain. Trends Cogn Sci 2019; 23:913-926. [PMID: 31606386 DOI: 10.1016/j.tics.2019.08.004] [Citation(s) in RCA: 181] [Impact Index Per Article: 36.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Revised: 08/16/2019] [Accepted: 08/20/2019] [Indexed: 01/07/2023]
Abstract
The streams of sounds we typically attend to abound in acoustic regularities. Neural entrainment is seen as an important mechanism that the listening brain exploits to attune to these regularities and to enhance the representation of attended sounds. We delineate the neurophysiology underlying this mechanism and review entrainment alongside its more pragmatic signature, often called 'speech tracking'. The latter has become a popular analytical approach to trace the reflection of acoustic and linguistic information at different levels of granularity, from neurophysiology to neuroimaging. As we discuss, the concept of entrainment offers both a putative neurophysiological mechanism for selective listening and a versatile window onto the neural basis of hearing and speech comprehension.
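"Speech tracking" analyses of the kind reviewed here often fit a temporal response function (TRF): a regularised linear mapping from the speech envelope (at multiple lags) to the neural signal, whose predictive correlation indexes tracking. Below is a minimal ridge-regression sketch with assumed lags, regularisation strength, and placeholder data; dedicated toolboxes exist for this, but the core computation is just a regularised least-squares fit.

```python
import numpy as np

fs = 100.0                                 # assumed analysis rate (Hz)
lags = np.arange(0, int(0.4 * fs))         # 0-400 ms lags, assumed
env = np.random.randn(int(300 * fs))       # stand-in speech envelope
eeg = np.random.randn(int(300 * fs))       # stand-in EEG channel

# Build a lagged design matrix: each column is the envelope shifted by one lag.
X = np.zeros((env.size, lags.size))
for j, lag in enumerate(lags):
    X[lag:, j] = env[:env.size - lag]

# Ridge-regularised least squares gives the temporal response function (TRF).
lam = 1e2                                  # regularisation strength, assumed
trf = np.linalg.solve(X.T @ X + lam * np.eye(lags.size), X.T @ eeg)

# Predicted EEG and its correlation with the measured channel ("tracking" score).
pred = X @ trf
r = np.corrcoef(pred, eeg)[0, 1]
print(f"envelope-tracking correlation r = {r:.3f}")
```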
Affiliation(s)
- Jonas Obleser
- Department of Psychology, University of Lübeck, 23562 Lübeck, Germany.
| | - Christoph Kayser
- Department for Cognitive Neuroscience and Cognitive Interaction Technology, Center of Excellence, Bielefeld University, 33615 Bielefeld, Germany.
26
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. [PMID: 31013361 DOI: 10.1111/ejn.14425] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2018] [Revised: 03/25/2019] [Accepted: 04/17/2019] [Indexed: 11/30/2022]
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) congruent AV: facing the attended speaker while ignoring another speaker presented as an audio-only stream; and (b) incongruent AV (eavesdropping): attending to the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that attention could be decoded successfully for congruent audiovisual speech and also when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition, we found alpha power to be a reliable measure of attention to the visual speech. Using parieto-occipital alpha power, we found that we can distinguish whether subjects are attending to or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that with only six near-ear electrodes we can successfully determine the attended speech stream. This work extends the current framework for decoding attention to speech to more naturalistic scenarios, and in doing so provides additional neural measures which may be incorporated to improve decoding accuracy.
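Stimulus-reconstruction decoding of the attended talker can be sketched as a backward model: EEG at several lags is linearly mapped onto the attended envelope, and at test time the reconstruction is compared against the attended and unattended envelopes. The example below is a compact illustration with placeholder data and assumed parameters; for brevity it trains and tests on the same data, which a real analysis would avoid via cross-validation.

```python
import numpy as np

fs = 64.0                                    # assumed decoding rate (Hz)
lags = np.arange(0, int(0.25 * fs))          # 0-250 ms of EEG history per sample, assumed
n_ch, n_samp = 6, int(600 * fs)              # six near-ear channels, 10 min of data
eeg = np.random.randn(n_ch, n_samp)          # stand-in EEG
env_att = np.random.randn(n_samp)            # attended-speech envelope (training target)
env_unatt = np.random.randn(n_samp)          # unattended-speech envelope

# Design matrix: EEG channels at multiple lags for every time sample.
X = np.zeros((n_samp, n_ch * lags.size))
for c in range(n_ch):
    for j, lag in enumerate(lags):
        X[lag:, c * lags.size + j] = eeg[c, :n_samp - lag]

# Train a ridge decoder to reconstruct the attended envelope.
lam = 1e3                                    # regularisation strength, assumed
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env_att)

# The reconstruction should correlate more with the attended than the unattended stream.
recon = X @ w
r_att = np.corrcoef(recon, env_att)[0, 1]
r_unatt = np.corrcoef(recon, env_unatt)[0, 1]
print("decoded as attended" if r_att > r_unatt else "decoded as unattended")
```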
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
27
Keitel C, Keitel A, Benwell CSY, Daube C, Thut G, Gross J. Stimulus-Driven Brain Rhythms within the Alpha Band: The Attentional-Modulation Conundrum. J Neurosci 2019; 39:3119-3129. [PMID: 30770401 PMCID: PMC6468105 DOI: 10.1523/jneurosci.1633-18.2019] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2018] [Revised: 01/16/2019] [Accepted: 02/03/2019] [Indexed: 01/23/2023] Open
Abstract
Two largely independent research lines use rhythmic sensory stimulation to study visual processing. Despite the use of strikingly similar experimental paradigms, they differ crucially in their notion of the stimulus-driven periodic brain responses: one regards them mostly as synchronized (entrained) intrinsic brain rhythms; the other assumes they are predominantly evoked responses [classically termed steady-state responses (SSRs)] that add to the ongoing brain activity. This conceptual difference can produce contradictory predictions about, and interpretations of, experimental outcomes. The effect of spatial attention on brain rhythms in the alpha band (8-13 Hz) is one such instance: alpha-range SSRs have typically been found to increase in power when participants focus their spatial attention on laterally presented stimuli, in line with a gain control of the visual evoked response. In nearly identical experiments, retinotopic decreases in entrained alpha-band power have been reported, in line with the inhibitory function of intrinsic alpha. Here we reconcile these contradictory findings by showing that they result from a small but far-reaching difference between two common approaches to EEG spectral decomposition. In a new analysis of previously published human EEG data, recorded during bilateral rhythmic visual stimulation, we find the typical SSR gain effect when emphasizing stimulus-locked neural activity and the typical retinotopic alpha suppression when focusing on ongoing rhythms. These opposite but parallel effects suggest that spatial attention may bias the neural processing of dynamic visual stimulation via two complementary neural mechanisms. SIGNIFICANCE STATEMENT: Attending to a visual stimulus strengthens its representation in visual cortex and leads to a retinotopic suppression of spontaneous alpha rhythms. To further investigate this process, researchers often attempt to phase lock, or entrain, alpha through rhythmic visual stimulation under the assumption that this entrained alpha retains the characteristics of spontaneous alpha. Instead, we show that the part of the brain response that is phase locked to the visual stimulation increased with attention (as do steady-state evoked potentials), while the typical suppression was only present in non-stimulus-locked alpha activity. The opposite signs of these effects suggest that attentional modulation of dynamic visual stimulation relies on two parallel cortical mechanisms: retinotopic alpha suppression and increased temporal tracking.
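The methodological point at the heart of this paper, the difference between stimulus-locked and ongoing activity, can be made concrete with a toy decomposition: "evoked" power is computed from the spectrum of the trial average, whereas total power averages single-trial spectra, and their difference approximates the non-stimulus-locked part. A minimal sketch with placeholder epochs and an assumed sampling rate:

```python
import numpy as np

fs = 250.0                                   # assumed sampling rate (Hz)
n_trials, n_samp = 200, int(2 * fs)          # 2-s epochs of rhythmic stimulation
trials = np.random.randn(n_trials, n_samp)   # stand-in single-trial EEG epochs
freqs = np.fft.rfftfreq(n_samp, d=1 / fs)
alpha = (freqs >= 8) & (freqs <= 13)

# Stimulus-locked ("evoked") power: spectrum of the trial average.
evoked_power = np.abs(np.fft.rfft(trials.mean(axis=0))) ** 2

# Total power: average of single-trial spectra; the non-stimulus-locked ("ongoing")
# part is the portion of total power not captured by the evoked response.
total_power = (np.abs(np.fft.rfft(trials, axis=1)) ** 2).mean(axis=0)
ongoing_power = total_power - evoked_power

print(f"alpha evoked power : {evoked_power[alpha].mean():.2f}")
print(f"alpha ongoing power: {ongoing_power[alpha].mean():.2f}")
```

Applying an attention contrast to the first quantity versus the second is, in essence, the difference between the two analysis approaches that the paper reconciles.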
Affiliation(s)
- Christian Keitel
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
| | - Anne Keitel
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
- Psychology, School of Social Sciences, University of Dundee, Dundee DD1 4HN, UK
| | - Christopher S Y Benwell
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
- Psychology, School of Social Sciences, University of Dundee, Dundee DD1 4HN, UK
| | - Christoph Daube
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
| | - Gregor Thut
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, UK
- Institut für Biomagnetismus und Biosignalanalyse, Westfälische Wilhelms-Universität, 48149 Münster, Germany
28
Origin and evolution of human speech: Emergence from a trimodal auditory, visual and vocal network. Prog Brain Res 2019; 250:345-371. [PMID: 31703907 DOI: 10.1016/bs.pbr.2019.01.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
In recent years, there have been important additions to the classical Broca-Wernicke model of speech processing, which consists of an anterior, productive region and a posterior, perceptive region connected via the arcuate fasciculus. The modern view implies a separation into a dorsal and a ventral pathway conveying different kinds of linguistic information, which parallels the organization of the visual system. Furthermore, this organization is highly conserved in evolution and can be seen as the neural scaffolding from which the speech networks originated. In this chapter we emphasize that the speech networks are embedded in a multimodal system encompassing audio-vocal and visuo-vocal connections, which can be traced back to an ancestral audio-visuo-motor pathway present in nonhuman primates. Accordingly, we propose a trimodal repertoire for speech processing and acquisition involving auditory, visual and motor representations of the basic elements of speech: phonemes, observed mouth movements, and articulatory processes. Finally, we discuss this proposal in the context of a scenario for early speech acquisition in infants and in human evolution.