1. Doll L, Dykstra AR, Gutschalk A. Perceptual awareness of near-threshold tones scales gradually with auditory cortex activity and pupil dilation. iScience 2024; 27:110530. PMID: 39175766; PMCID: PMC11338958; DOI: 10.1016/j.isci.2024.110530.
Abstract
Negative-going responses in sensory cortex co-vary with perceptual awareness of sensory stimuli. Given that this awareness negativity has also been observed for undetected stimuli, some have challenged its role in perception. To address this question, we combined magnetoencephalography, electroencephalography, and pupillometry to study how sustained attention and response criterion affect the auditory awareness negativity. Participants first detected distractor sounds and denied hearing task-irrelevant near-threshold tones, which evoked neither an awareness negativity nor pupil dilation. These same tones evoked both responses when task-relevant, more strongly for hit trials but also for miss trials. Participants then rated their perception on a six-point scale to test whether response criterion explains the presence of these responses on miss trials. Decreasing perception ratings were associated with gradually reduced evoked responses, consistent with signal detection theory. These results support the concept of an awareness negativity that is modulated by attention but does not require a non-linear threshold mechanism.
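To make the signal-detection logic concrete, here is a minimal sketch of a rating-scale SDT analysis in Python. The six-point response counts are invented for illustration and are not data from the study; an equal-variance Gaussian model is assumed.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical counts per rating (1 = "surely not heard" ... 6 = "surely heard").
present = np.array([10, 15, 20, 25, 20, 10])   # tone-present trials
absent = np.array([40, 25, 15, 10, 7, 3])      # tone-absent trials

# Cumulative proportion of ratings >= each criterion traces the empirical ROC.
hits = np.cumsum(present[::-1])[::-1] / present.sum()
fas = np.cumsum(absent[::-1])[::-1] / absent.sum()

def z(p):
    # z-transform, clipped to avoid infinities at proportions of 0 or 1
    return norm.ppf(np.clip(p, 1e-3, 1 - 1e-3))

# In equal-variance Gaussian SDT, the zROC is a straight line whose intercept
# estimates d'; a graded mechanism predicts a linear zROC with no threshold kink.
slope, dprime = np.polyfit(z(fas[1:]), z(hits[1:]), 1)  # drop the trivial (1,1) point
print(f"zROC slope = {slope:.2f}, d' (intercept) = {dprime:.2f}")
```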
Affiliation(s)
- Laura Doll
- Department of Neurology, Ruprecht-Karls-Universität Heidelberg, 69120 Heidelberg, Germany
- Andrew R. Dykstra
- School of Communication Sciences and Disorders, University of Central Florida, Orlando, FL, USA
- Alexander Gutschalk
- Department of Neurology, Ruprecht-Karls-Universität Heidelberg, 69120 Heidelberg, Germany
2. Chalas N, Meyer L, Lo CW, Park H, Kluger DS, Abbasi O, Kayser C, Nitsch R, Gross J. Dissociating prosodic from syntactic delta activity during natural speech comprehension. Curr Biol 2024; 34:3537-3549.e5. PMID: 39047734; DOI: 10.1016/j.cub.2024.06.072.
Abstract
Decoding human speech requires the brain to segment the incoming acoustic signal into meaningful linguistic units, ranging from syllables and words to phrases. Integrating these linguistic constituents into a coherent percept forms the basis of compositional meaning and hence understanding. Important cues for segmentation in natural speech are prosodic cues, such as pauses, but their interplay with higher-level linguistic processing is still unknown. Here, we dissociate the neural tracking of prosodic pauses from the segmentation of multi-word chunks using magnetoencephalography (MEG). We find that manipulating the regularity of pauses disrupts slow speech-brain tracking bilaterally in auditory areas (below 2 Hz) and in turn increases left-lateralized coherence of higher-frequency auditory activity at speech onsets (around 25-45 Hz). Critically, we also find that multi-word chunks (defined as short, coherent bundles of inter-word dependencies) are processed through the rhythmic fluctuations of low-frequency activity (below 2 Hz) bilaterally and independently of prosodic cues. Importantly, low-frequency alignment at chunk onsets increases the accuracy of an encoding model in bilateral auditory and frontal areas while controlling for the effect of acoustics. Our findings provide novel insights into the neural basis of speech perception, demonstrating that both acoustic features (prosodic cues) and abstract linguistic processing at the multi-word timescale are underpinned independently by low-frequency electrophysiological brain activity in the delta frequency range.
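As a rough illustration of the coherence measure referred to above, the following sketch computes speech-brain coherence below 2 Hz for a synthetic envelope and a synthetic MEG channel; the signal lengths, sampling rate, and 10 s analysis windows are illustrative assumptions, not the study's settings.

```python
import numpy as np
from scipy.signal import coherence

fs = 200                                           # sampling rate (Hz)
rng = np.random.default_rng(0)
n = fs * 300                                       # 5 minutes of signal

envelope = rng.standard_normal(n)                  # stand-in speech envelope
meg = 0.3 * envelope + rng.standard_normal(n)      # "MEG" = weak tracking + noise

# Long (10 s) windows give the frequency resolution needed below 2 Hz.
f, coh = coherence(envelope, meg, fs=fs, nperseg=fs * 10)
delta = (f > 0) & (f < 2)
print(f"mean speech-brain coherence below 2 Hz: {coh[delta].mean():.3f}")
```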
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany; Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Lars Meyer
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Chia-Wen Lo
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Hyojin Park
- Centre for Human Brain Health (CHBH), School of Psychology, University of Birmingham, Birmingham, UK
- Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
3. Liu H, Bai Y, Zheng Q, Liu J, Zhu J, Ni G. Electrophysiological correlation of auditory selective spatial attention in the "cocktail party" situation. Hum Brain Mapp 2024; 45:e26793. PMID: 39037186; PMCID: PMC11261592; DOI: 10.1002/hbm.26793.
Abstract
The auditory system can selectively attend to the target source in complex environments, a phenomenon known as the "cocktail party" effect. However, the spatiotemporal dynamics of electrophysiological activity associated with auditory selective spatial attention (ASSA) remain largely unexplored. In this study, single-source and multiple-source paradigms were designed to simulate different auditory environments, and microstate analysis was introduced to reveal the electrophysiological correlates of ASSA. Furthermore, cortical source analysis was employed to reveal the neural activity regions of these microstates. The results showed that five microstates, MS1 to MS5, could explain the spatiotemporal dynamics of ASSA. Notably, MS2 and MS3 showed significantly lower partial properties in multiple-source situations than in single-source situations, whereas MS4 had shorter and MS5 longer durations in multiple-source situations than in single-source situations. MS1 showed no significant differences between the two situations. Cortical source analysis showed that the activation regions of these microstates initially transferred from the right temporal cortex to the temporal-parietal cortex, and subsequently to the dorsofrontal cortex. Moreover, the neural activity of the single-source situations was greater than that of the multiple-source situations in MS2 and MS3, correlating with the N1 and P2 components, with the greatest differences observed in the superior temporal gyrus and inferior parietal lobule. These findings suggest that these specific microstates and their associated activation regions may serve as promising substrates for decoding ASSA in complex environments.
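For readers unfamiliar with microstate analysis, below is a compact, hedged sketch of the standard polarity-invariant (modified k-means) clustering of EEG topographies at global-field-power peaks; the data are synthetic, and only the number of states (five) is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
eeg = rng.standard_normal((64, 5000))     # channels x samples (synthetic)
eeg -= eeg.mean(axis=0)                   # average reference

# Candidate topographies: maps at local maxima of global field power (GFP).
gfp = eeg.std(axis=0)
peaks = np.where((gfp[1:-1] > gfp[:-2]) & (gfp[1:-1] > gfp[2:]))[0] + 1
maps = eeg[:, peaks].T                    # n_peaks x n_channels

k = 5                                     # five states, as in the abstract
templates = maps[rng.choice(len(maps), k, replace=False)].astype(float)
templates /= np.linalg.norm(templates, axis=1, keepdims=True)

for _ in range(25):
    # Polarity-invariant assignment: highest squared projection wins.
    labels = np.argmax((templates @ maps.T) ** 2, axis=0)
    for j in range(k):
        cluster = maps[labels == j]
        if len(cluster):
            # New template = first principal axis of the cluster (unit norm).
            templates[j] = np.linalg.svd(cluster, full_matrices=False)[2][0]

print("template maps:", templates.shape)  # (5, 64)
```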
Affiliation(s)
- Hongxing Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin, China
- Qi Zheng
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Jihan Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Jianing Zhu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Haihe Laboratory of Brain-Computer Interaction and Human-Machine Integration, Tianjin, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin, China
4. Pérez-Navarro J, Klimovich-Gray A, Lizarazu M, Piazza G, Molinaro N, Lallier M. Early language experience modulates the tradeoff between acoustic-temporal and lexico-semantic cortical tracking of speech. iScience 2024; 27:110247. PMID: 39006483; PMCID: PMC11246002; DOI: 10.1016/j.isci.2024.110247.
Abstract
Cortical tracking of speech is relevant for the development of speech perception skills. However, no study to date has explored whether and how cortical tracking of speech is shaped by accumulated language experience, the central question of this study. In 35 six-year-old bilingual children with considerably greater experience in one of their two languages, we collected electroencephalography data while they listened to continuous speech in both languages. Cortical tracking of speech was assessed at acoustic-temporal and lexico-semantic levels. Children showed more robust acoustic-temporal tracking in the less experienced language, and more sensitive cortical tracking of semantic information in the more experienced language. Additionally, and only for the more experienced language, acoustic-temporal tracking was specifically linked to phonological abilities, and lexico-semantic tracking to vocabulary knowledge. Our results indicate that accumulated linguistic experience is a relevant maturational factor for the cortical tracking of speech at different levels during early language acquisition.
Affiliation(s)
- Jose Pérez-Navarro
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Mikel Lizarazu
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Giorgio Piazza
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Nicola Molinaro
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain
- Marie Lallier
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
5. Vissani M, Bush A, Lipski WJ, Fischer P, Neudorfer C, Holt LL, Fiez JA, Turner RS, Richardson RM. Spike-phase coupling of subthalamic neurons to posterior opercular cortex predicts speech sound accuracy. bioRxiv [Preprint] 2024:2023.10.18.562969. PMID: 37905141; PMCID: PMC10614892; DOI: 10.1101/2023.10.18.562969.
Abstract
Speech provides a rich context for understanding how cortical interactions with the basal ganglia contribute to unique human behaviors, but opportunities for direct intracranial recordings across cortical-basal ganglia networks are rare. We recorded electrocorticographic signals in the cortex synchronously with single units in the basal ganglia during awake neurosurgeries in which subjects spoke syllable repetitions. We discovered that individual subthalamic nucleus (STN) neurons have transient (200 ms) spike-phase coupling (SPC) events with multiple cortical regions. The spike timing of STN neurons was coordinated with the phase of theta-alpha oscillations in the posterior supramarginal and superior temporal gyri during speech planning and production. Speech sound errors occurred when this STN-cortical interaction was delayed. Our results suggest that the STN supports mechanisms of speech planning and auditory-sensorimotor integration during speech production that are required to achieve high fidelity of the phonological and articulatory representation of the target phoneme. These findings establish a framework for understanding cortical-basal ganglia interaction in other human behaviors, and additionally indicate that firing-rate-based models are insufficient for explaining basal ganglia circuit behavior.
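A minimal sketch of the spike-phase coupling computation described above: band-pass the cortical signal into the theta-alpha range, take the Hilbert phase at each spike time, and measure phase concentration with a phase-locking value. All signals, spike times, and parameters here are synthetic stand-ins, not the study's data.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000
rng = np.random.default_rng(2)
ecog = rng.standard_normal(fs * 60)             # 60 s synthetic cortical signal

# Instantaneous theta-alpha (4-12 Hz) phase via the Hilbert transform.
b, a = butter(4, [4, 12], btype="bandpass", fs=fs)
phase = np.angle(hilbert(filtfilt(b, a, ecog)))

spike_times = np.sort(rng.uniform(0, 60, 300))  # hypothetical STN spike times (s)
spike_phases = phase[(spike_times * fs).astype(int)]

# Phase-locking value: length of the mean resultant vector of spike phases.
plv = np.abs(np.mean(np.exp(1j * spike_phases)))
print(f"spike-phase coupling (PLV) = {plv:.3f}")
# Transient SPC events could be detected by repeating this in short sliding
# windows (e.g., 200 ms) and comparing against spike-time-shuffled surrogates.
```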
Affiliation(s)
- Matteo Vissani
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Alan Bush
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Witold J. Lipski
- Department of Neurobiology, Systems Neuroscience Center and Center for Neuroscience, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- Petra Fischer
- School of Physiology, Pharmacology & Neuroscience, University of Bristol, University Walk, BS8 1TD Bristol, United Kingdom
- Clemens Neudorfer
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, Austin, TX 78712, USA
- Julie A. Fiez
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Robert S. Turner
- Department of Neurobiology, Systems Neuroscience Center and Center for Neuroscience, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
- R. Mark Richardson
- Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, 02114, USA
- Harvard Medical School, Boston, MA, 02115, USA
6. Kries J, De Clercq P, Gillis M, Vanthornhout J, Lemmens R, Francart T, Vandermosten M. Exploring neural tracking of acoustic and linguistic speech representations in individuals with post-stroke aphasia. Hum Brain Mapp 2024; 45:e26676. PMID: 38798131; PMCID: PMC11128780; DOI: 10.1002/hbm.26676.
Abstract
Aphasia is a communication disorder that affects processing of language at different levels (e.g., acoustic, phonological, semantic). Recording brain activity via electroencephalography (EEG) while people listen to a continuous story allows analysis of brain responses to acoustic and linguistic properties of speech. When the neural activity aligns with these speech properties, it is referred to as neural tracking. Even though measuring neural tracking of speech may present an interesting approach to studying aphasia in an ecologically valid way, it has not yet been investigated in individuals with stroke-induced aphasia. Here, we explored processing of acoustic and linguistic speech representations in individuals with aphasia in the chronic phase after stroke and in age-matched healthy controls. We found decreased neural tracking of acoustic speech representations (envelope and envelope onsets) in individuals with aphasia. In addition, responses to word surprisal displayed decreased amplitudes in individuals with aphasia around 195 ms over frontal electrodes, although this effect was not corrected for multiple comparisons. These results show that there is potential to capture language-processing impairments in individuals with aphasia by measuring neural tracking of continuous speech. However, more research is needed to validate these results. Nonetheless, this exploratory study shows that neural tracking of naturalistic, continuous speech presents a powerful approach to studying aphasia.
Affiliation(s)
- Jill Kries
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Department of Psychology, Stanford University, Stanford, California, USA
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Robin Lemmens
- Experimental Neurology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Laboratory of Neurobiology, VIB-KU Leuven Center for Brain and Disease Research, Leuven, Belgium
- Department of Neurology, University Hospitals Leuven, Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
7. Levy O, Hackmon SL, Zvilichovsky Y, Korisky A, Bidet-Caulet A, Schweitzer JB, Golumbic EZ. Neurophysiological patterns of attention and distraction during realistic virtual-reality classroom learning in adults with and without ADHD. bioRxiv [Preprint] 2024:2024.04.17.590012. PMID: 38659916; PMCID: PMC11042341; DOI: 10.1101/2024.04.17.590012.
Abstract
Many people, and particularly those diagnosed with ADHD, report difficulties maintaining attention and proneness to distraction during classroom learning. However, the behavioral, neural and physiological basis of attention in realistic learning contexts is not well understood, since the clinical and scientific tools currently used for evaluating and quantifying the constructs of "distractibility" and "inattention" are removed from the real-life experience of organic classrooms. Here we introduce a novel Virtual Reality (VR) platform for studying students' brain activity and physiological responses as they are immersed in realistic frontal classroom learning. Using this approach, we studied whether adults with and without ADHD (N=49) exhibit differences in neurophysiological metrics associated with sustained attention, such as speech-tracking of the teacher's voice, power of alpha-oscillations and levels of arousal, as well as in responses to potential disturbances by background sound-events in the classroom. Under these ecological conditions, we found that adults with ADHD exhibited higher auditory neural responses to background sounds relative to their control peers, which, together with higher power of alpha-oscillations and more frequent gaze-shifts around the classroom, also contributed to explaining variance in the severity of ADHD symptoms. These results are in line with higher sensitivity to irrelevant stimuli in the environment and increased mind-wandering/boredom. At the same time, both groups exhibited similar learning outcomes and similar neural tracking of the teacher's speech. This suggests that in this context attention may not operate as a zero-sum game, and that allocating some resources to irrelevant stimuli does not always detract from performing the task at hand. Given the dire need for more objective, dimensional and ecologically valid measures of attention and its real-life deficits, this work provides new insights into the neurophysiological manifestations of attention and distraction experienced in real-life contexts, while challenging some prevalent notions regarding the nature of the attentional challenges experienced by those with ADHD.
Affiliation(s)
- Orel Levy
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Yair Zvilichovsky
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Adi Korisky
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Julie B. Schweitzer
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Sacramento, CA, USA
8. EskandariNasab M, Raeisi Z, Lashaki RA, Najafi H. A GRU-CNN model for auditory attention detection using microstate and recurrence quantification analysis. Sci Rep 2024; 14:8861. PMID: 38632246; PMCID: PMC11024110; DOI: 10.1038/s41598-024-58886-y.
Abstract
Attention, as a cognitive ability, plays a crucial role in perception, helping humans concentrate on specific objects in the environment while discarding others. In this paper, auditory attention detection (AAD) is investigated using different dynamic features extracted from multichannel electroencephalography (EEG) signals when listeners attend to a target speaker in the presence of a competing talker. To this end, microstate and recurrence quantification analysis are utilized to extract different types of features that reflect changes in the brain state during cognitive tasks. An optimized feature set is then determined by selecting significant features based on classification performance. The classifier model is developed by hybrid sequential learning that combines Gated Recurrent Units (GRUs) and a Convolutional Neural Network (CNN) in a unified framework for accurate attention detection. The proposed AAD method shows that the selected feature set achieves the most discriminative features for the classification process. It also yields the best performance compared with state-of-the-art AAD approaches from the literature in terms of various measures. The current study is the first to validate the use of microstate and recurrence quantification parameters to differentiate auditory attention using reinforcement learning without access to stimuli.
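The abstract does not specify the architecture in detail, so the following PyTorch sketch shows one plausible way to chain a GRU and a CNN for two-class AAD; the feature dimension, layer sizes, and pooling are assumptions, not the published model.

```python
import torch
import torch.nn as nn

class GRUCNN(nn.Module):
    """Sequence classifier: GRU for temporal dynamics, CNN head for pooling."""
    def __init__(self, n_features=16, hidden=32, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv1d(hidden, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):                  # x: (batch, time, features)
        h, _ = self.gru(x)                 # (batch, time, hidden)
        h = h.transpose(1, 2)              # (batch, hidden, time) for Conv1d
        return self.fc(self.cnn(h).squeeze(-1))

model = GRUCNN()
logits = model(torch.randn(8, 100, 16))    # 8 trials, 100 time steps, 16 features
print(logits.shape)                        # torch.Size([8, 2])
```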
Affiliation(s)
- Zahra Raeisi
- Department of Computer Science, Fairleigh Dickinson University, Vancouver Campus, Vancouver, Canada
- Reza Ahmadi Lashaki
- Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Hamidreza Najafi
- Biomedical Engineering Department, School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
9. Sergeeva A, Christensen CB, Kidmose P. Towards ASSR-based hearing assessment using natural sounds. J Neural Eng 2024; 21:026045. PMID: 38579741; DOI: 10.1088/1741-2552/ad3b6b.
Abstract
Objective. The auditory steady-state response (ASSR) allows estimation of hearing thresholds. The ASSR can be estimated from electroencephalography (EEG) recordings from electrodes positioned both on the scalp and within the ear (ear-EEG). Ear-EEG can potentially be integrated into hearing aids, which would enable automatic fitting of the hearing device in daily life. The conventional stimuli for ASSR-based hearing assessment, such as pure tones and chirps, are monotonous and tiresome, making them inconvenient for repeated use in everyday situations. In this study we investigate the use of natural speech sounds for ASSR estimation. Approach. EEG was recorded from 22 normal-hearing subjects from both scalp and ear electrodes. Subjects were stimulated monaurally with 180 min of speech stimulus modified by applying a 40 Hz amplitude modulation (AM) to an octave frequency sub-band centered at 1 kHz. Each 50 ms sub-interval in the AM sub-band was scaled to match one of 10 pre-defined levels (0-45 dB sensation level, 5 dB steps). The apparent latency for the ASSR was estimated as the maximum average cross-correlation between the envelope of the AM sub-band and the recorded EEG and was used to align the EEG signal with the audio signal. The EEG was then split into sub-epochs of 50 ms length and sorted according to the stimulation level. The ASSR was estimated for each level for both scalp- and ear-EEG. Main results. Significant ASSRs with increasing amplitude as a function of presentation level were recorded from both scalp and ear electrode configurations. Significance. Utilizing natural sounds in ASSR estimation offers the potential for electrophysiological hearing assessments that are more comfortable and less fatiguing than existing ASSR methods. Combined with ear-EEG, this approach may allow convenient hearing threshold estimation in everyday life, utilizing ambient sounds. Additionally, it may facilitate both initial fitting and subsequent adjustments of hearing aids outside of clinical settings.
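The two key steps, imposing the 40 Hz AM on a 1 kHz-centered octave band and estimating apparent latency by cross-correlation, can be sketched as follows. The synthetic "speech" and "EEG", the 30 ms delay, and the filter settings are illustrative assumptions, not the study's materials.

```python
import numpy as np
from scipy.signal import butter, filtfilt, correlate

fs = 8000
rng = np.random.default_rng(3)
speech = rng.standard_normal(fs * 10)            # stand-in for a speech recording

# (1) Impose a 40 Hz AM on the octave band centered at 1 kHz (~707-1414 Hz).
b, a = butter(4, [707, 1414], btype="bandpass", fs=fs)
band = filtfilt(b, a, speech)
t = np.arange(band.size) / fs
am_env = 0.5 * (1 + np.sin(2 * np.pi * 40 * t))
stimulus = (speech - band) + band * am_env       # AM applied only in the sub-band

# (2) Apparent latency: lag of maximum cross-correlation between the AM
# envelope and the EEG (here faked as the envelope delayed by 30 ms + noise).
eeg = np.roll(am_env, int(0.030 * fs)) + 5 * rng.standard_normal(am_env.size)
xc = correlate(eeg - eeg.mean(), am_env - am_env.mean(), mode="full")
lags = np.arange(-am_env.size + 1, am_env.size)
window = (lags >= 0) & (lags <= int(0.1 * fs))   # search 0-100 ms
latency = lags[window][np.argmax(xc[window])] / fs
print(f"estimated apparent latency: {latency * 1000:.1f} ms")
```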
Affiliation(s)
- Anna Sergeeva
- Department of Electrical and Computer Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
- Christian Bech Christensen
- Department of Electrical and Computer Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
- Preben Kidmose
- Department of Electrical and Computer Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
10. Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. PMID: 38303522; DOI: 10.1111/ejn.16265.
Abstract
Linear models are becoming increasingly popular to investigate brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to predict brain activity, or 'decoding', when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated when listening to speech stimuli but rarely when listening to music. Recent attempts at neural tracking of music show that reconstruction performance is reduced compared with speech decoding. The present study investigates the performance of stimulus reconstruction and electroencephalogram prediction (decoding and encoding models) based on the cortical entrainment of temporal variations of the audio stimuli for both music and speech listening. Three hypotheses that may explain differences between speech and music stimulus reconstruction were tested to assess the importance of speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing between speech and music listening, no differences were found in terms of reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite the differences in the underlying cortical mechanisms.
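As a hedged sketch of the two model classes, the snippet below fits a time-lagged ridge regression in both directions (envelope-to-EEG encoding, EEG-to-envelope decoding) on synthetic single-channel data; the lag range and regularization strength are arbitrary choices, not the study's settings.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged(x, lags):
    """Stack time-shifted copies of x as columns (circular shifts, edge-sloppy)."""
    return np.column_stack([np.roll(x, l) for l in lags])

fs = 128
rng = np.random.default_rng(4)
env = rng.standard_normal(fs * 120)                    # stimulus envelope
trf = rng.standard_normal(32)                          # unknown 250 ms response
eeg = np.convolve(env, trf)[: env.size] + rng.standard_normal(env.size)

lags = range(int(0.4 * fs))                            # 0-400 ms
half = env.size // 2

# Encoding (forward): past envelope -> EEG.  Decoding (backward): future EEG -> envelope.
enc = Ridge(alpha=1e3).fit(lagged(env, lags)[:half], eeg[:half])
dec = Ridge(alpha=1e3).fit(lagged(eeg, [-l for l in lags])[:half], env[:half])

r_enc = np.corrcoef(enc.predict(lagged(env, lags)[half:]), eeg[half:])[0, 1]
r_dec = np.corrcoef(dec.predict(lagged(eeg, [-l for l in lags])[half:]), env[half:])[0, 1]
print(f"encoding r = {r_enc:.2f}, decoding r = {r_dec:.2f}")
```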
Affiliation(s)
- Adèle Simon
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
- Søren Bech
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
- Gérard Loquet
- Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Jan Østergaard
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
11. Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural dynamics of the processing of speech features: Evidence for a progression of features from acoustic to sentential processing. bioRxiv [Preprint] 2024:2024.02.02.578603. PMID: 38352332; PMCID: PMC10862830; DOI: 10.1101/2024.02.02.578603.
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right-lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA
12. Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. PMID: 38358954; PMCID: PMC10868781; DOI: 10.1371/journal.pbio.3002498.
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech but do not affect speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted but disappeared when segregation cues, i.e., speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
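The comb-filtering effect of a long-delay echo on the speech envelope can be illustrated directly: adding a delayed copy nulls modulations near 1/(2 x delay), i.e., near 2 Hz for a 250 ms echo. The envelope and parameters below are synthetic stand-ins, not the study's stimuli.

```python
import numpy as np

fs = 100                                        # envelope sampling rate (Hz)
rng = np.random.default_rng(5)
env = np.abs(rng.standard_normal(fs * 60))      # stand-in speech envelope

delay = int(0.25 * fs)                          # 250 ms echo
echoic = env.copy()
echoic[delay:] += env[:-delay]                  # add the delayed copy

# Adding an echo comb-filters the envelope: the gain |2*cos(pi*f*0.25)| is
# zero near 2 Hz (and odd multiples), removing those slow modulations.
f = np.fft.rfftfreq(env.size, 1 / fs)
spec = lambda x: np.abs(np.fft.rfft(x - x.mean()))
notch = (f > 1.9) & (f < 2.1)
ratio = spec(echoic)[notch].mean() / spec(env)[notch].mean()
print(f"2 Hz modulation energy, echoic/original: {ratio:.2f}")  # << 1
```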
Affiliation(s)
- Jiaxin Gao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-Computer Interface Institute, Hangzhou, China
- State Key Lab of Brain-Machine Intelligence and MOE Frontier Science Center for Brain Science & Brain-Machine Integration, Zhejiang University, Hangzhou, China
13. McClaskey CM. Neural hyperactivity and altered envelope encoding in the central auditory system: Changes with advanced age and hearing loss. Hear Res 2024; 442:108945. PMID: 38154191; PMCID: PMC10942735; DOI: 10.1016/j.heares.2023.108945.
Abstract
Temporal modulations are ubiquitous features of sound signals that are important for auditory perception. The perception of temporal modulations, or temporal processing, is known to decline with aging and hearing loss and negatively impact auditory perception in general and speech recognition specifically. However, neurophysiological literature also provides evidence of exaggerated or enhanced encoding of specifically temporal envelopes in aging and hearing loss, which may arise from changes in inhibitory neurotransmission and neuronal hyperactivity. This review paper describes the physiological changes to the neural encoding of temporal envelopes that have been shown to occur with age and hearing loss and discusses the role of disinhibition and neural hyperactivity in contributing to these changes. Studies in both humans and animal models suggest that aging and hearing loss are associated with stronger neural representations of both periodic amplitude modulation envelopes and of naturalistic speech envelopes, but primarily for low-frequency modulations (<80 Hz). Although the frequency dependence of these results is generally taken as evidence of amplified envelope encoding at the cortex and impoverished encoding at the midbrain and brainstem, there is additional evidence to suggest that exaggerated envelope encoding may also occur subcortically, though only for envelopes with low modulation rates. A better understanding of how temporal envelope encoding is altered in aging and hearing loss, and the contexts in which neural responses are exaggerated/diminished, may aid in the development of interventions, assistive devices, and treatment strategies that work to ameliorate age- and hearing-loss-related auditory perceptual deficits.
Affiliation(s)
- Carolyn M McClaskey
- Department of Otolaryngology - Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Ave, MSC 550, Charleston, SC 29425, United States.
14. Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. PMID: 38142293; DOI: 10.1093/cercor/bhad475.
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist in selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "cocktail party" paradigm. We recorded magnetoencephalography (MEG) from n = 33 participants, presented with concurrent narratives in two different voices, and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech-tracking was also affected by voice familiarity, showing an enhanced response for target speech and a reduced response for non-target speech in the contra-lateral hemisphere when these were in a familiar vs. an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory-semantics, interact with goal-driven attention, and facilitate perceptual organization and speech processing in noisy environments.
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
15. Shan T, Cappelloni MS, Maddox RK. Subcortical responses to music and speech are alike while cortical responses diverge. Sci Rep 2024; 14:789. PMID: 38191488; PMCID: PMC10774448; DOI: 10.1038/s41598-023-50438-0.
Abstract
Music and speech are encountered daily and are unique to human beings. Both are transformed by the auditory pathway from an initial acoustical encoding to higher level cognition. Studies of cortex have revealed distinct brain responses to music and speech, but differences may emerge in the cortex or may be inherited from different subcortical encoding. In the first part of this study, we derived the human auditory brainstem response (ABR), a measure of subcortical encoding, to recorded music and speech using two analysis methods. The first method, described previously and acoustically based, yielded very different ABRs between the two sound classes. The second method, however, developed here and based on a physiological model of the auditory periphery, gave highly correlated responses to music and speech. We determined the superiority of the second method through several metrics, suggesting there is no appreciable impact of stimulus class (i.e., music vs speech) on the way stimulus acoustics are encoded subcortically. In this study's second part, we considered the cortex. Our new analysis method resulted in cortical music and speech responses becoming more similar but with remaining differences. The subcortical and cortical results taken together suggest that there is evidence for stimulus-class dependent processing of music and speech at the cortical but not subcortical level.
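A hedged sketch of the general approach of deriving an ABR-like response from continuous sound: cross-correlate the EEG with a nonlinearly transformed version of the stimulus. Half-wave rectification here is only a crude stand-in for the peripheral-model regressor the study found superior, and all signals are synthetic.

```python
import numpy as np

fs = 10000
rng = np.random.default_rng(6)
audio = rng.standard_normal(fs * 60)            # one minute of "music or speech"

# Crude peripheral regressor: half-wave rectified stimulus, mean-removed.
reg = np.maximum(audio, 0)
reg -= reg.mean()

# Fake brainstem kernel peaking at 5 ms, plus EEG noise.
t = np.arange(0, 0.015, 1 / fs)
kernel = np.exp(-((t - 0.005) ** 2) / (2 * 0.001 ** 2))
eeg = np.convolve(reg, kernel)[: reg.size] + 20 * rng.standard_normal(reg.size)

# FFT-based cross-correlation, normalized by regressor power: with a roughly
# white regressor this approximates deconvolving the response waveform.
xc = np.fft.irfft(np.fft.rfft(eeg) * np.conj(np.fft.rfft(reg)), reg.size)
abr = xc[: int(0.015 * fs)] / np.sum(reg ** 2)  # keep 0-15 ms lags
print(f"recovered peak latency: {1000 * np.argmax(abr) / fs:.1f} ms")
```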
Affiliation(s)
- Tong Shan
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Madeline S Cappelloni
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Ross K Maddox
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Department of Neuroscience, University of Rochester, Rochester, NY, USA
16. Ha J, Baek SC, Lim Y, Chung JH. Validation of cost-efficient EEG experimental setup for neural tracking in an auditory attention task. Sci Rep 2023; 13:22682. PMID: 38114579; PMCID: PMC10730561; DOI: 10.1038/s41598-023-49990-6.
Abstract
When individuals listen to speech, their neural activity phase-locks to the slow temporal rhythm of the speech, which is commonly referred to as "neural tracking". The neural tracking mechanism allows for the detection of an attended sound source in a multi-talker situation by decoding neural signals obtained by electroencephalography (EEG), a process known as auditory attention decoding (AAD). Neural tracking with AAD can be utilized as an objective measurement tool in diverse clinical contexts, and it has potential to be applied to neuro-steered hearing devices. To effectively utilize this technology, it is essential to enhance the accessibility of the EEG experimental setup and analysis. The aim of this study was to develop a cost-efficient neural tracking system and validate the feasibility of neural tracking measurement by conducting an AAD task, using offline and real-time decoder models, outside a soundproof environment. We devised a neural tracking system capable of conducting AAD experiments using an OpenBCI and Arduino board. Nine participants were recruited to assess the performance of AAD using the developed system, which involved presenting competing speech signals in an experimental setting without soundproofing. As a result, the offline decoder model demonstrated an average performance of 90%, and the real-time decoder model exhibited a performance of 78%. The present study demonstrates the feasibility of implementing neural tracking and AAD using cost-effective devices in a practical environment.
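The decision rule at the heart of such an AAD pipeline can be sketched in a few lines: reconstruct the envelope from EEG with a pre-trained backward model, then pick the speaker whose envelope correlates more with the reconstruction. The reconstruction below is simulated rather than computed from real EEG, and the window length is an illustrative choice.

```python
import numpy as np

def decode_attention(recon_env, env_a, env_b):
    """Label the attended speaker for one decision window by correlation."""
    r_a = np.corrcoef(recon_env, env_a)[0, 1]
    r_b = np.corrcoef(recon_env, env_b)[0, 1]
    return "A" if r_a > r_b else "B"

rng = np.random.default_rng(7)
fs_env = 64                                      # envelope sampling rate (Hz)
env_a = rng.standard_normal(fs_env * 10)         # 10 s decision window
env_b = rng.standard_normal(fs_env * 10)

# Stand-in for the output of a pre-trained backward (EEG -> envelope) model
# while the listener attends speaker A.
recon = 0.2 * env_a + rng.standard_normal(env_a.size)
print(decode_attention(recon, env_a, env_b))     # usually "A"
```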
Affiliation(s)
- Jiyeon Ha
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
- Yoonseob Lim
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Jae Ho Chung
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul, 04763, Korea
- Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Hanyang University, 222-Wangshimni-ro, Seongdong-gu, Seoul, 133-792, Korea
17. Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. Front Hum Neurosci 2023; 17:1283206. PMID: 38162285; PMCID: PMC10754997; DOI: 10.3389/fnhum.2023.1283206.
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers, an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory integration (AV) and another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the participants' peripheral vision. Overall, our findings suggest a strong influence of attention on multisensory integration when high-fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and adaptable based on the specific task and environment.
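A toy version of the AV versus A+V comparison: fit an encoding model on audiovisual responses and compare its test-set prediction accuracy with the sum of models fit to simulated unisensory conditions. The response weights and data are invented so that integration, modeled here simply as a re-weighting of the unisensory contributions, is detectable; this is not the authors' analysis code.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(8)
n = 6000
audio, visual = rng.standard_normal((2, n))      # stand-in stimulus features

# Toy responses: integration re-weights the unisensory contributions in AV.
eeg_a = audio + rng.standard_normal(n)                        # audio-only
eeg_v = visual + rng.standard_normal(n)                       # visual-only
eeg_av = 1.5 * audio + 0.5 * visual + rng.standard_normal(n)  # audiovisual

half = n // 2
av = Ridge(1.0).fit(np.c_[audio, visual][:half], eeg_av[:half])  # AV model
a = Ridge(1.0).fit(audio[:half, None], eeg_a[:half])             # unisensory models
v = Ridge(1.0).fit(visual[:half, None], eeg_v[:half])

pred_av = av.predict(np.c_[audio, visual][half:])
pred_sum = a.predict(audio[half:, None]) + v.predict(visual[half:, None])  # "A+V"
r = lambda p: np.corrcoef(p, eeg_av[half:])[0, 1]
print(f"AV model r = {r(pred_av):.2f} vs additive A+V r = {r(pred_sum):.2f}")
```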
Affiliation(s)
- Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY, United States
18. Vanbilsen N, Kotz SA, Rosso M, Leman M, Triccas LT, Feys P, Moumdjian L. Auditory attention measured by EEG in neurological populations: systematic review of literature and meta-analysis. Sci Rep 2023; 13:21064. PMID: 38030693; PMCID: PMC10687139; DOI: 10.1038/s41598-023-47597-5.
Abstract
Sensorimotor synchronization strategies have frequently been used for gait rehabilitation in different neurological populations. Despite these positive effects on gait, the attentional processes required to dynamically attend to the auditory stimuli need elaboration. Here, we investigate auditory attention, quantified by EEG recordings, in neurological populations compared to healthy controls. Literature was systematically searched in the PubMed and Web of Science databases. Inclusion criteria were investigation of auditory attention quantified by EEG recordings in neurological populations in cross-sectional studies. In total, 35 studies were included, covering participants with Parkinson's disease (PD), stroke, traumatic brain injury (TBI), multiple sclerosis (MS), and amyotrophic lateral sclerosis (ALS). A meta-analysis was performed separately on P3 amplitude and latency to quantify the differences between neurological populations and healthy controls. Overall, neurological populations showed impairments in auditory processing in terms of magnitude and delay compared to healthy controls. Considering individual auditory processes, and thereafter selecting and/or designing the auditory structure used during sensorimotor synchronization paradigms in neurological physical rehabilitation, is recommended.
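For concreteness, here is a brief sketch of the inverse-variance random-effects pooling (DerSimonian-Laird) that such meta-analyses typically use; the effect sizes and variances are made up for illustration, not extracted from the 35 included studies.

```python
import numpy as np

# Made-up per-study standardized effects (e.g., P3 amplitude differences)
# and their sampling variances.
yi = np.array([0.45, 0.80, 0.30, 0.60, 0.55])
vi = np.array([0.04, 0.09, 0.02, 0.05, 0.03])

w = 1 / vi                                       # fixed-effect weights
mu_fe = np.sum(w * yi) / w.sum()
q = np.sum(w * (yi - mu_fe) ** 2)                # heterogeneity statistic Q
c = w.sum() - np.sum(w ** 2) / w.sum()
tau2 = max(0.0, (q - (len(yi) - 1)) / c)         # DerSimonian-Laird tau^2

w_re = 1 / (vi + tau2)                           # random-effects weights
pooled = np.sum(w_re * yi) / w_re.sum()
se = np.sqrt(1 / w_re.sum())
print(f"pooled effect = {pooled:.2f}, 95% CI = "
      f"[{pooled - 1.96 * se:.2f}, {pooled + 1.96 * se:.2f}], tau^2 = {tau2:.3f}")
```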
Affiliation(s)
- Nele Vanbilsen
- Universitair Multiple Sclerosis Centrum (UMSC), Hasselt-Pelt, Hasselt, Belgium
- Faculty of Rehabilitation Sciences, REVAL Rehabilitation Research Center, University of Hasselt, Agoralaan Gebouw A, 3590 Diepenbeek, Belgium
- Sonja A Kotz
- Department of Neuropsychology and Psychopharmacology, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Mattia Rosso
- Faculty of Arts and Philosophy, IPEM Institute of Psychoacoustics and Electronic Music, University of Ghent, Miriam Makebaplein 1, 9000 Gent, Belgium
- Université de Lille, ULR 4072 - PSITEC - Psychologie: Interactions, Temps, Emotions, Cognition, Lille, France
- Marc Leman
- Faculty of Arts and Philosophy, IPEM Institute of Psychoacoustics and Electronic Music, University of Ghent, Miriam Makebaplein 1, 9000 Gent, Belgium
- Lisa Tedesco Triccas
- Faculty of Rehabilitation Sciences, REVAL Rehabilitation Research Center, University of Hasselt, Agoralaan Gebouw A, 3590 Diepenbeek, Belgium
- Department of Movement and Clinical Neurosciences, Institute of Neurology, University College London, 33 Queen Square, London, UK
- Peter Feys
- Universitair Multiple Sclerosis Centrum (UMSC), Hasselt-Pelt, Hasselt, Belgium
- Faculty of Rehabilitation Sciences, REVAL Rehabilitation Research Center, University of Hasselt, Agoralaan Gebouw A, 3590 Diepenbeek, Belgium
- Lousin Moumdjian
- Universitair Multiple Sclerosis Centrum (UMSC), Hasselt-Pelt, Hasselt, Belgium
- Faculty of Rehabilitation Sciences, REVAL Rehabilitation Research Center, University of Hasselt, Agoralaan Gebouw A, 3590 Diepenbeek, Belgium
- Faculty of Arts and Philosophy, IPEM Institute of Psychoacoustics and Electronic Music, University of Ghent, Miriam Makebaplein 1, 9000 Gent, Belgium
19. Li J, Hong B, Nolte G, Engel AK, Zhang D. EEG-based speaker-listener neural coupling reflects speech-selective attentional mechanisms beyond the speech stimulus. Cereb Cortex 2023; 33:11080-11091. PMID: 37814353; DOI: 10.1093/cercor/bhad347.
Abstract
When we pay attention to someone, do we focus only on the sound they make and the words they use, or do we form a mental space shared with the speaker we want to pay attention to? Some would argue that human language is nothing more than a simple signal, but others claim that human beings understand each other because speaker and listener form a shared mental ground. Our study aimed to explore the neural mechanisms of speech-selective attention by investigating the electroencephalogram-based neural coupling between the speaker and the listener in a cocktail party paradigm. The temporal response function method was employed to reveal how the listener was coupled to the speaker at the neural level. The results showed that the neural coupling between the listener and the attended speaker peaked 5 s before speech onset in the delta band over the left frontal region, and was correlated with speech comprehension performance. In contrast, the attentional processing of speech acoustics and semantics occurred primarily at a later stage after speech onset and was not significantly correlated with comprehension performance. These findings suggest a predictive mechanism to achieve speaker-listener neural coupling for successful speech comprehension.
Affiliation(s)
- Jiawei Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee, Berlin 14195, Germany
- Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
20. Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a talking face matters: Gaze behavior and the auditory-visual speech benefit in adults' cortical tracking of infant-directed speech. J Cogn Neurosci 2023; 35:1741-1759. PMID: 37677057; DOI: 10.1162/jocn_a_02044.
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
| | - Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
| | - Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
| | - Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
| | - Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
| |
|
21
|
Van Hirtum T, Somers B, Dieudonné B, Verschueren E, Wouters J, Francart T. Neural envelope tracking predicts speech intelligibility and hearing aid benefit in children with hearing loss. Hear Res 2023; 439:108893. [PMID: 37806102 DOI: 10.1016/j.heares.2023.108893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 09/01/2023] [Accepted: 09/27/2023] [Indexed: 10/10/2023]
Abstract
Early assessment of hearing aid benefit is crucial, as the extent to which hearing aids provide audible speech information predicts speech and language outcomes. A growing body of research has proposed neural envelope tracking as an objective measure of speech intelligibility, particularly for individuals unable to provide reliable behavioral feedback. However, its potential for evaluating speech intelligibility and hearing aid benefit in children with hearing loss remains unexplored. In this study, we investigated neural envelope tracking in children with permanent hearing loss through two separate experiments. EEG data were recorded while children listened to age-appropriate stories (Experiment 1) or an animated movie (Experiment 2) under aided and unaided conditions (using personal hearing aids) at multiple stimulus intensities. Neural envelope tracking was evaluated using a linear decoder reconstructing the speech envelope from the EEG in the delta band (0.5-4 Hz). Additionally, we calculated temporal response functions (TRFs) to investigate the spatio-temporal dynamics of the response. In both experiments, neural tracking increased with increasing stimulus intensity, but only in the unaided condition. In the aided condition, neural tracking remained stable across a wide range of intensities, as long as speech intelligibility was maintained. Similarly, TRF amplitudes increased with increasing stimulus intensity in the unaided condition, while in the aided condition significant differences were found in TRF latency rather than TRF amplitude. This suggests that decreasing stimulus intensity does not necessarily impact neural tracking. Furthermore, the use of personal hearing aids significantly enhanced neural envelope tracking, particularly in challenging speech conditions that would be inaudible when unaided. Finally, we found a strong correlation between neural envelope tracking and behaviorally measured speech intelligibility for both narrated stories (Experiment 1) and movie stimuli (Experiment 2). Altogether, these findings indicate that neural envelope tracking could be a valuable tool for predicting speech intelligibility benefits derived from personal hearing aids in hearing-impaired children. Incorporating narrated stories or engaging movies expands the accessibility of these methods even in clinical settings, offering new avenues for using objective speech measures to guide pediatric audiology decision-making.
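A compact sketch of the backward-model idea used here: band-pass both signals to the delta range, train a linear decoder to reconstruct the envelope from multichannel EEG, and score reconstruction by correlation. Data are simulated, and real pipelines also include multiple decoder time lags, which are omitted here for brevity.

```python
# Delta-band envelope reconstruction with a ridge decoder (synthetic data).
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.linear_model import Ridge

fs, n_ch = 64, 32
n = 120 * fs                                         # 120 s at 64 Hz (assumed)
rng = np.random.default_rng(2)
envelope = rng.standard_normal(n)
eeg = np.outer(envelope, rng.standard_normal(n_ch)) + 5 * rng.standard_normal((n, n_ch))

sos = butter(2, [0.5, 4], btype="bandpass", fs=fs, output="sos")  # delta band
env_d = sosfiltfilt(sos, envelope)
eeg_d = sosfiltfilt(sos, eeg, axis=0)

half = n // 2                                        # train/test split
decoder = Ridge(alpha=10.0).fit(eeg_d[:half], env_d[:half])
recon = decoder.predict(eeg_d[half:])
print("reconstruction accuracy r =", round(np.corrcoef(recon, env_d[half:])[0, 1], 2))
```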
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Jan Wouters
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium.
| |
|
22
|
Zhang B, Hu S, Zhang T, Hai M, Wang Y, Li Y, Wang Y. Different patterns of foreground and background processing contribute to texture segregation in humans: an electrophysiological study. PeerJ 2023; 11:e16139. [PMID: 37810782 PMCID: PMC10552746 DOI: 10.7717/peerj.16139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Accepted: 08/29/2023] [Indexed: 10/10/2023] Open
Abstract
Background Figure-ground segregation is a necessary process for accurate visual recognition. Previous neurophysiological and human brain imaging studies have suggested that foreground-background segregation relies on both enhanced foreground representation and suppressed background representation. However, in humans, it is not known when and how foreground and background processing play a role in texture segregation. Methods To answer this question, it is crucial to extract and dissociate, with high temporal resolution, the neural signals elicited by the foreground and background of a figure texture. Here, we combined electroencephalogram (EEG) recordings with a temporal response function (TRF) approach, using luminance-tracking TRFs to separately extract the neural responses to the foreground and background of a figure texture from the overall EEG recordings. A uniform texture was included as a neutral condition. The texture segregation visual evoked potential (tsVEP) was calculated by subtracting the uniform TRF from the foreground and background TRFs, respectively, to index segregation-specific activity. Results We found that the foreground and background of a figure texture were processed differently during texture segregation. In the posterior region of the brain, we found a negative component for the foreground tsVEP in the early stage of foreground-background segregation, and two negative components for the background tsVEP in the early and late stages. In the anterior region, we found a positive component for the foreground tsVEP in the late stage, and two positive components for the background tsVEP in the early and late stages of texture processing. Discussion In this study we investigated the temporal profile of foreground and background processing during texture segregation in human participants at high temporal resolution. The results demonstrated that the foreground and background jointly contribute to figure-ground segregation in both the early and late phases of texture processing. Our findings provide novel evidence for the neural correlates of foreground-background modulation during figure-ground segregation in humans.
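The tsVEP computation itself reduces to a subtraction of condition-wise TRFs; the sketch below only illustrates that bookkeeping, with toy arrays standing in for TRFs estimated from the luminance-tracking model.

```python
# tsVEP sketch: subtract the uniform-texture TRF from the foreground and
# background TRFs to isolate segregation-specific activity. Toy TRFs only.
import numpy as np

lags_ms = np.arange(0, 400, 10)                # TRF lag axis (assumed)
rng = np.random.default_rng(3)
trf_foreground = rng.standard_normal(lags_ms.size)
trf_background = rng.standard_normal(lags_ms.size)
trf_uniform = rng.standard_normal(lags_ms.size)

tsvep_fg = trf_foreground - trf_uniform        # foreground segregation response
tsvep_bg = trf_background - trf_uniform        # background segregation response
print("tsVEP computed over", lags_ms.size, "lags from 0 to 390 ms")
```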
Affiliation(s)
- Baoqiang Zhang
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| | - Saisai Hu
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| | - Tingkang Zhang
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| | - Min Hai
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| | - Yongchun Wang
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| | - Ya Li
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| | - Yonghui Wang
- School of Psychology, Shaanxi Normal University, Xi’an, China
- Shaanxi Provincial Key Laboratory of Behavior & Cognitive Neuroscience, Xi’an, China
- Shaanxi Provincial Key Research Center of Child Mental and Behavioral Health, Xi’an, China
| |
|
23
|
Ling Y, Xu C, Wen X, Li J, Gao J, Luo B. Cortical responses to auditory stimulation predict the prognosis of patients with disorders of consciousness. Clin Neurophysiol 2023; 153:11-20. [PMID: 37385110 DOI: 10.1016/j.clinph.2023.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 05/15/2023] [Accepted: 06/03/2023] [Indexed: 07/01/2023]
Abstract
OBJECTIVE This study aimed to assess the prognosis of patients with disorders of consciousness (DoC) using auditory stimulation with electroencephalogram (EEG) recordings. METHODS We enrolled 72 patients with DoC, who were subjected to auditory stimulation while EEG responses were recorded. Coma Recovery Scale-Revised (CRS-R) and Glasgow Outcome Scale (GOS) scores were determined for each patient over a three-month follow-up. A frequency spectrum analysis was performed on the EEG recordings. Finally, the power spectral density (PSD) index was used to predict the prognosis of patients with DoC based on a support vector machine (SVM) model. RESULTS Power spectral analyses revealed that the cortical response to auditory stimulation showed a decreasing trend with decreasing consciousness levels. Auditory stimulation-induced changes in absolute PSD at the delta and theta bands were positively correlated with the CRS-R and GOS scores. Furthermore, these cortical responses to auditory stimulation had a good ability to discriminate between good and poor prognoses of patients with DoC. CONCLUSIONS Auditory stimulation-induced changes in the PSD were highly predictive of DoC outcomes. SIGNIFICANCE Our findings showed that cortical responses to auditory stimulation may be an important electrophysiological indicator of prognosis in patients with DoC.
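The pipeline described here, band-limited PSD features feeding a support vector machine, can be sketched as follows. EEG, labels, and band choices are simulated or assumed for illustration; no clinical data are involved.

```python
# PSD-plus-SVM sketch: Welch band power as features, SVM as the classifier.
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

fs, n_pat = 250, 40
rng = np.random.default_rng(4)
eeg = rng.standard_normal((n_pat, 30 * fs))    # 30 s per patient (simulated)
outcome = rng.integers(0, 2, n_pat)            # 1 = good prognosis (toy labels)

f, psd = welch(eeg, fs=fs, nperseg=2 * fs)     # one PSD per patient

def band_power(lo, hi):
    sel = (f >= lo) & (f < hi)
    return psd[:, sel].mean(axis=1)

X = np.column_stack([band_power(1, 4), band_power(4, 8)])  # delta, theta
acc = cross_val_score(SVC(kernel="rbf"), X, outcome, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")  # ~chance on random data
```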
Affiliation(s)
- Yi Ling
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310000, China
| | - Chuan Xu
- Department of Neurology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou 310016, China
| | - Xinrui Wen
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310000, China
| | - Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310000, China.
| |
|
24
|
Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.23.554451. [PMID: 37662393 PMCID: PMC10473711 DOI: 10.1101/2023.08.23.554451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2023]
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers - an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory integration (AV) and another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
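The model-class comparison at the core of this preprint, a genuinely multisensory (AV) model versus the sum of independent unisensory processes (A+V), can be illustrated with a simulation in which the "true" response contains an interaction term that only the AV model can account for. The TRFs and signals below are toys, not the study's fits.

```python
# AV vs. A+V sketch: an interaction term in the simulated response favours
# the multisensory model over the additive one.
import numpy as np

rng = np.random.default_rng(5)
n = 5000
trf_a = np.exp(-np.arange(30) / 10.0)          # toy auditory TRF
trf_v = 0.5 * np.exp(-np.arange(30) / 5.0)     # toy visual TRF
audio = rng.standard_normal(n)
visual = rng.standard_normal(n)

# Simulated EEG with a true audiovisual interaction (integration) component.
eeg = (np.convolve(audio, trf_a, "same") + np.convolve(visual, trf_v, "same")
       + 0.3 * np.convolve(audio * visual, trf_a, "same")
       + rng.standard_normal(n))

pred_additive = np.convolve(audio, trf_a, "same") + np.convolve(visual, trf_v, "same")
pred_av = pred_additive + 0.3 * np.convolve(audio * visual, trf_a, "same")

for name, pred in [("A+V", pred_additive), ("AV", pred_av)]:
    print(name, "prediction r =", round(np.corrcoef(pred, eeg)[0, 1], 3))
```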
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
| | - Aaron R. Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
| | - Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
| |
|
25
|
Morrel J, Singapuri K, Landa RJ, Reetzke R. Neural correlates and predictors of speech and language development in infants at elevated likelihood for autism: a systematic review. Front Hum Neurosci 2023; 17:1211676. [PMID: 37662636 PMCID: PMC10469683 DOI: 10.3389/fnhum.2023.1211676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Accepted: 07/25/2023] [Indexed: 09/05/2023] Open
Abstract
Autism spectrum disorder (ASD) is an increasingly prevalent and heterogeneous neurodevelopmental condition, characterized by social communicative differences and a combination of repetitive behaviors, focused interests, and sensory sensitivities. Early speech and language delays are characteristic of young autistic children and are one of the first concerns reported by parents, often before their child's second birthday. Elucidating the neural mechanisms underlying these delays has the potential to improve early detection and intervention efforts. To fill this gap, this systematic review aimed to synthesize evidence on early neurobiological correlates and predictors of speech and language development across different neuroimaging modalities in infants with and without a family history of autism [at an elevated (EL infants) and low likelihood (LL infants) for developing autism, respectively]. A comprehensive, systematic review identified 24 peer-reviewed articles published between 2012 and 2023, utilizing structural magnetic resonance imaging (MRI; n = 2), functional MRI (fMRI; n = 4), functional near-infrared spectroscopy (fNIRS; n = 4), and electroencephalography (EEG; n = 14). Three main themes in results emerged: compared to LL infants, EL infants exhibited (1) atypical language-related neural lateralization; (2) alterations in structural and functional connectivity; and (3) mixed profiles of neural sensitivity to speech and non-speech stimuli, with some differences detected as early as 6 weeks of age. These findings suggest that neuroimaging techniques may be sensitive to early indicators of speech and language delays well before overt behavioral delays emerge. Future research should aim to harmonize experimental paradigms both within and across neuroimaging modalities and additionally address the feasibility, acceptability, and scalability of implementing such methodologies in non-academic, community-based settings.
Affiliation(s)
- Jessica Morrel
- Center for Autism and Related Disorders, Kennedy Krieger Institute, Baltimore, MD, United States
| | - Kripi Singapuri
- Center for Neurodevelopmental and Imaging Research, Kennedy Krieger Institute, Baltimore, MD, United States
| | - Rebecca J. Landa
- Center for Autism and Related Disorders, Kennedy Krieger Institute, Baltimore, MD, United States
- Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Rachel Reetzke
- Center for Autism and Related Disorders, Kennedy Krieger Institute, Baltimore, MD, United States
- Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, United States
| |
|
26
|
Liang B, Li Y, Zhao W, Du Y. Bilateral human laryngeal motor cortex in perceptual decision of lexical tone and voicing of consonant. Nat Commun 2023; 14:4710. [PMID: 37543659 PMCID: PMC10404239 DOI: 10.1038/s41467-023-40445-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Accepted: 07/27/2023] [Indexed: 08/07/2023] Open
Abstract
Speech perception is believed to recruit the left motor cortex. However, the exact role of the laryngeal subregion and its right counterpart in speech perception, as well as their temporal patterns of involvement, remain unclear. To address these questions, we conducted a hypothesis-driven study, applying transcranial magnetic stimulation over the left or right dorsal laryngeal motor cortex (dLMC) while participants made perceptual decisions on Mandarin lexical tone or consonant (voicing contrast) presented with or without noise. We used psychometric functions and a hierarchical drift-diffusion model to disentangle perceptual sensitivity and dynamic decision-making parameters. Results showed that bilateral dLMCs were engaged with effector specificity, and this engagement was left-lateralized with right upregulation in noise. Furthermore, the dLMC contributed to various decision stages depending on the hemisphere and task difficulty. These findings substantially advance our understanding of the hemispheric lateralization and temporal dynamics of bilateral dLMC in sensorimotor integration during speech perceptual decision-making.
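Of the two analysis tools named in this abstract, the psychometric function is easy to sketch: a logistic curve whose midpoint and slope summarize the perceptual boundary and sensitivity. The data points below are fabricated, and the hierarchical drift-diffusion model is beyond a short sketch.

```python
# Psychometric-function sketch: logistic fit of response proportion vs. level.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    # x0 = perceptual boundary, k = slope (sensitivity)
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

levels = np.linspace(-3, 3, 7)                 # stimulus continuum (arbitrary units)
p_resp = np.array([0.02, 0.08, 0.25, 0.55, 0.80, 0.94, 0.99])  # toy proportions
(x0, k), _ = curve_fit(logistic, levels, p_resp, p0=[0.0, 1.0])
print(f"boundary = {x0:.2f}, slope = {k:.2f}")
```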
Affiliation(s)
- Baishen Liang
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yanchang Li
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China
| | - Wanying Zhao
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China
| | - Yi Du
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, 200031, China.
- Chinese Institute for Brain Research, Beijing, 102206, China.
| |
|
27
|
Jia Z, Xu C, Li J, Gao J, Ding N, Luo B, Zou J. Phase Property of Envelope-Tracking EEG Response Is Preserved in Patients with Disorders of Consciousness. eNeuro 2023; 10:ENEURO.0130-23.2023. [PMID: 37500493 PMCID: PMC10420405 DOI: 10.1523/eneuro.0130-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 07/16/2023] [Accepted: 07/20/2023] [Indexed: 07/29/2023] Open
Abstract
When listening to speech, the low-frequency cortical response below 10 Hz can track the speech envelope. Previous studies have demonstrated that the phase lag between the speech envelope and the cortical response can reflect the mechanism by which the envelope-tracking response is generated. Here, we analyze whether the mechanism generating the envelope-tracking response is modulated by the level of consciousness, by studying how the stimulus-response phase lag varies in disorders of consciousness (DoC). DoC patients in general show less reliable neural tracking of speech. Nevertheless, for DoC patients who do show reliable cortical tracking of speech, the stimulus-response phase lag changes linearly with frequency between 3.5 and 8 Hz, regardless of consciousness state. The mean phase lag is also consistent across these patients. These results suggest that the envelope-tracking response to speech can be generated by an automatic process that is barely modulated by the consciousness state.
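The key diagnostic in this abstract, a phase lag that grows linearly with frequency, corresponds to a constant latency: latency = -(dφ/df)/2π. The sketch below recovers a simulated 80 ms delay from the cross-spectrum phase in the 3.5-8 Hz band; all signals are synthetic.

```python
# Group-delay sketch: fit a line to cross-spectral phase vs. frequency.
import numpy as np
from scipy.signal import csd

fs, delay_s = 100, 0.08
n = 200 * fs                                    # 200 s of synthetic data
rng = np.random.default_rng(6)
envelope = rng.standard_normal(n)
response = np.roll(envelope, int(delay_s * fs)) + 0.5 * rng.standard_normal(n)

f, pxy = csd(envelope, response, fs=fs, nperseg=1024)
band = (f >= 3.5) & (f <= 8)                    # band analyzed in the study
phase = np.unwrap(np.angle(pxy[band]))
slope = np.polyfit(f[band], phase, 1)[0]        # radians per Hz (negative here)
print(f"estimated latency: {-slope / (2 * np.pi) * 1000:.0f} ms")
```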
Affiliation(s)
- Ziting Jia
- The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250033, China
| | - Chuan Xu
- Department of Neurology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310019, China
| | - Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
| | - Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| | - Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310003, China
| | - Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| |
|
28
|
Abbasi O, Steingräber N, Chalas N, Kluger DS, Gross J. Spatiotemporal dynamics characterise spectral connectivity profiles of continuous speaking and listening. PLoS Biol 2023; 21:e3002178. [PMID: 37478152 DOI: 10.1371/journal.pbio.3002178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 05/31/2023] [Indexed: 07/23/2023] Open
Abstract
Speech production and perception are fundamental processes of human cognition that both rely on intricate processing mechanisms that are still poorly understood. Here, we study these processes by using magnetoencephalography (MEG) to comprehensively map connectivity of regional brain activity within the brain and to the speech envelope during continuous speaking and listening. Our results reveal not only a partly shared neural substrate for both processes but also a dissociation in space, delay, and frequency. Neural activity in motor and frontal areas is coupled to succeeding speech in the delta band (1 to 3 Hz), whereas coupling in the theta range follows speech in temporal areas during speaking. Neural connectivity results showed a separation of bottom-up and top-down signalling in distinct frequency bands during speaking. Here, we show that frequency-specific connectivity channels for bottom-up and top-down signalling support continuous speaking and listening. These findings further shed light on the complex interplay between different brain regions involved in speech production and perception.
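Speech-brain coupling of the kind mapped here can be quantified with magnitude-squared coherence between a cortical signal and the speech envelope, summarized per frequency band. Whether the authors used exactly this estimator is not stated in the abstract, so treat the following as a generic, fully simulated sketch.

```python
# Coherence sketch: couple a simulated cortical signal to a speech envelope,
# then average coherence in the delta (1-3 Hz) and theta (4-8 Hz) bands.
import numpy as np
from scipy.signal import coherence

fs = 200
n = 180 * fs                                    # 180 s, simulated
rng = np.random.default_rng(7)
envelope = rng.standard_normal(n)
brain = 0.4 * np.roll(envelope, int(0.1 * fs)) + rng.standard_normal(n)

f, coh = coherence(envelope, brain, fs=fs, nperseg=4 * fs)
delta = coh[(f >= 1) & (f <= 3)].mean()
theta = coh[(f >= 4) & (f <= 8)].mean()
print(f"delta coherence = {delta:.2f}, theta coherence = {theta:.2f}")
```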
Affiliation(s)
- Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Nadine Steingräber
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| |
|
29
|
Kurmanavičiūtė D, Kataja H, Jas M, Välilä A, Parkkonen L. Target of selective auditory attention can be robustly followed with MEG. Sci Rep 2023; 13:10959. [PMID: 37414861 PMCID: PMC10325959 DOI: 10.1038/s41598-023-37959-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Accepted: 06/30/2023] [Indexed: 07/08/2023] Open
Abstract
Selective auditory attention enables filtering of relevant acoustic information from irrelevant information. Specific auditory responses, measurable by magneto- and electroencephalography (MEG/EEG), are known to be modulated by attention to the evoking stimuli. However, such attention effects have typically been studied in unnatural conditions (e.g. during dichotic listening to pure tones) and have been demonstrated mostly in averaged auditory evoked responses. To test how reliably we can detect the attention target from unaveraged brain responses, we recorded MEG data from 15 healthy subjects who were presented with two human speakers continuously uttering the words "Yes" and "No" in an interleaved manner. The subjects were asked to attend to one speaker. To investigate which temporal and spatial aspects of the responses carry the most information about the target of auditory attention, we performed spatially and temporally resolved classification of the unaveraged MEG responses using a support vector machine. Sensor-level decoding of the responses to attended vs. unattended words resulted in a mean accuracy of [Formula: see text] (N = 14) for both stimulus words. The discriminating information was mostly available 200-400 ms after the stimulus onset. Spatially resolved source-level decoding indicated that the most informative sources were in the auditory cortices, in both the left and right hemispheres. Our results corroborate attention modulation of auditory evoked responses and show that such modulations are detectable in unaveraged MEG responses at high accuracy, which could be exploited, e.g., in an intuitive brain-computer interface.
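Temporally resolved single-trial decoding of the sort reported here amounts to cross-validating a classifier at every time point after stimulus onset. In the simulation below the class difference is confined to 200-400 ms, so accuracy should peak there; trial counts and effect size are invented.

```python
# Time-resolved decoding sketch: linear SVM cross-validated per time point.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

fs, n_trials, n_ch = 100, 100, 20
times = np.arange(0, 0.6, 1 / fs)               # 0-600 ms after word onset
rng = np.random.default_rng(8)
X = rng.standard_normal((n_trials, n_ch, times.size))
y = rng.integers(0, 2, n_trials)                # attended (1) vs unattended (0)

effect = (times > 0.2) & (times < 0.4)          # injected class difference
X[y == 1] = X[y == 1] + 0.5 * effect            # broadcast over channels

acc = np.array([cross_val_score(SVC(kernel="linear"), X[:, :, t], y, cv=5).mean()
                for t in range(times.size)])
print(f"peak accuracy {acc.max():.2f} at {1000 * times[acc.argmax()]:.0f} ms")
```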
Affiliation(s)
- Dovilė Kurmanavičiūtė
- Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland.
| | - Hanna Kataja
- Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
| | - Mainak Jas
- Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
- Athinoula A. Martinos Center for Biomedical Imaging, 149 Thirteenth Street, Charlestown, MA, 02129, USA
| | - Anne Välilä
- Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
| | - Lauri Parkkonen
- Department of Neuroscience and Biomedical Engineering, Aalto University, P.O. Box 12200, 00076, Aalto, Finland
- Aalto NeuroImaging, Aalto University, 00076, Aalto, Finland
| |
|
30
|
Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hear Res 2023; 433:108767. [PMID: 37060895 PMCID: PMC10559335 DOI: 10.1016/j.heares.2023.108767] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Revised: 03/29/2023] [Accepted: 04/09/2023] [Indexed: 04/17/2023]
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC), when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.
Affiliation(s)
- Elsa Lindboom
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
| | - Aaron Nidiffer
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| | - Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA.
| | - Edmund C Lalor
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| |
|
31
|
Karunathilake IMD, Dunlap JL, Perera J, Presacco A, Decruy L, Anderson S, Kuchinsky SE, Simon JZ. Effects of aging on cortical representations of continuous speech. J Neurophysiol 2023; 129:1359-1377. [PMID: 37096924 PMCID: PMC10202479 DOI: 10.1152/jn.00356.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 04/04/2023] [Accepted: 04/20/2023] [Indexed: 04/26/2023] Open
Abstract
Understanding speech in a noisy environment is crucial in day-to-day interactions and yet becomes more challenging with age, even for healthy aging. Age-related changes in the neural mechanisms that enable speech-in-noise listening have been investigated previously; however, the extent to which age affects the timing and fidelity of encoding of target and interfering speech streams is not well understood. Using magnetoencephalography (MEG), we investigated how continuous speech is represented in auditory cortex in the presence of interfering speech in younger and older adults. Cortical representations were obtained from neural responses that time-locked to the speech envelopes with speech envelope reconstruction and temporal response functions (TRFs). TRFs showed three prominent peaks corresponding to auditory cortical processing stages: early (∼50 ms), middle (∼100 ms), and late (∼200 ms). Older adults showed exaggerated speech envelope representations compared with younger adults. Temporal analysis revealed both that the age-related exaggeration starts as early as ∼50 ms and that older adults needed a substantially longer integration time window to achieve their better reconstruction of the speech envelope. As expected, with increased speech masking, envelope reconstruction for the attended talker decreased and all three TRF peaks were delayed, with aging contributing additionally to the reduction. Interestingly, for older adults the late peak was delayed, suggesting that this late peak may receive contributions from multiple sources. Together, these results suggest that several mechanisms are at play, compensating for age-related temporal processing deficits at several stages, but that they are not able to fully reestablish unimpaired speech perception. NEW & NOTEWORTHY We observed age-related changes in cortical temporal processing of continuous speech that may be related to older adults' difficulty in understanding speech in noise. These changes occur in both timing and strength of the speech representations at different cortical processing stages and depend on both noise condition and selective attention. Critically, their dependence on noise condition changes dramatically among the early, middle, and late cortical processing stages, underscoring how aging differentially affects these stages.
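The three TRF peaks discussed here (~50, ~100, ~200 ms) can be read off a TRF waveform with ordinary peak picking, as in the sketch below. The TRF is a toy built from three bumps, not a measured response.

```python
# TRF peak-latency sketch: pick early/middle/late peaks from a toy waveform.
import numpy as np
from scipy.signal import find_peaks

fs = 1000
t = np.arange(0, 0.4, 1 / fs)                   # 0-400 ms of TRF lags

def bump(mu, sigma, amp):
    return amp * np.exp(-0.5 * ((t - mu) / sigma) ** 2)

trf = bump(0.05, 0.010, 1.0) - bump(0.10, 0.015, 0.8) + bump(0.20, 0.030, 0.6)
peaks, _ = find_peaks(np.abs(trf), height=0.3, distance=int(0.03 * fs))
print("peak latencies (ms):", (1000 * t[peaks]).round())   # ~[50, 100, 200]
```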
Affiliation(s)
- I M Dushyanthi Karunathilake
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
| | - Jason L Dunlap
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
| | - Janani Perera
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
| | - Alessandro Presacco
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
| | - Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
| | - Stefanie E Kuchinsky
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland, United States
| | - Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Department of Biology, University of Maryland, College Park, Maryland, United States
| |
|
32
|
Zioga I, Weissbart H, Lewis AG, Haegens S, Martin AE. Naturalistic Spoken Language Comprehension Is Supported by Alpha and Beta Oscillations. J Neurosci 2023; 43:3718-3732. [PMID: 37059462 PMCID: PMC10198453 DOI: 10.1523/jneurosci.1500-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Revised: 03/17/2023] [Accepted: 03/23/2023] [Indexed: 04/16/2023] Open
Abstract
Brain oscillations are prevalent in all species and are involved in numerous perceptual operations. α oscillations are thought to facilitate processing through the inhibition of task-irrelevant networks, while β oscillations are linked to the putative reactivation of content representations. Can the proposed functional role of α and β oscillations be generalized from low-level operations to higher-level cognitive processes? Here we address this question focusing on naturalistic spoken language comprehension. Twenty-two (18 female) Dutch native speakers listened to stories in Dutch and French while MEG was recorded. We used dependency parsing to identify three dependency states at each word: the number of (1) newly opened dependencies, (2) dependencies that remained open, and (3) resolved dependencies. We then constructed forward models to predict α and β power from the dependency features. Results showed that dependency features predict α and β power in language-related regions beyond low-level linguistic features. Left temporal, fundamental language regions are involved in language comprehension in α, while frontal and parietal, higher-order language regions, and motor regions are involved in β. Critically, α- and β-band dynamics seem to subserve language comprehension tapping into syntactic structure building and semantic composition by providing low-level mechanistic operations for inhibition and reactivation processes. Because of the temporal similarity of the α-β responses, their potential functional dissociation remains to be elucidated. Overall, this study sheds light on the role of α and β oscillations during naturalistic spoken language comprehension, providing evidence for the generalizability of these dynamics from perceptual to complex linguistic processes.SIGNIFICANCE STATEMENT It remains unclear whether the proposed functional role of α and β oscillations in perceptual and motor function is generalizable to higher-level cognitive processes, such as spoken language comprehension. We found that syntactic features predict α and β power in language-related regions beyond low-level linguistic features when listening to naturalistic speech in a known language. We offer experimental findings that integrate a neuroscientific framework on the role of brain oscillations as "building blocks" with spoken language comprehension. This supports the view of a domain-general role of oscillations across the hierarchy of cognitive functions, from low-level sensory operations to abstract linguistic processes.
Affiliation(s)
- Ioanna Zioga
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
| | - Hugo Weissbart
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
| | - Ashley G Lewis
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
| | - Saskia Haegens
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Department of Psychiatry, Columbia University, New York, New York 10032
- Division of Systems Neuroscience, New York State Psychiatric Institute, New York, New York 10032
| | - Andrea E Martin
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
| |
|
33
|
Van Hirtum T, Somers B, Verschueren E, Dieudonné B, Francart T. Delta-band neural envelope tracking predicts speech intelligibility in noise in preschoolers. Hear Res 2023; 434:108785. [PMID: 37172414 DOI: 10.1016/j.heares.2023.108785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 04/24/2023] [Accepted: 05/05/2023] [Indexed: 05/15/2023]
Abstract
Behavioral tests are currently the gold standard in measuring speech intelligibility. However, these tests can be difficult to administer in young children due to factors such as motivation, linguistic knowledge and cognitive skills. It has been shown that measures of neural envelope tracking can be used to predict speech intelligibility and overcome these issues. However, its potential as an objective measure for speech intelligibility in noise remains to be investigated in preschool children. Here, we evaluated neural envelope tracking as a function of signal-to-noise ratio (SNR) in 14 5-year-old children. We examined EEG responses to natural, continuous speech presented at different SNRs ranging from -8 (very difficult) to 8 dB SNR (very easy). As expected delta band (0.5-4 Hz) tracking increased with increasing stimulus SNR. However, this increase was not strictly monotonic as neural tracking reached a plateau between 0 and 4 dB SNR, similarly to the behavioral speech intelligibility outcomes. These findings indicate that neural tracking in the delta band remains stable, as long as the acoustical degradation of the speech signal does not reflect significant changes in speech intelligibility. Theta band tracking (4-8 Hz), on the other hand, was found to be drastically reduced and more easily affected by noise in children, making it less reliable as a measure of speech intelligibility. By contrast, neural envelope tracking in the delta band was directly associated with behavioral measures of speech intelligibility. This suggests that neural envelope tracking in the delta band is a valuable tool for evaluating speech-in-noise intelligibility in preschoolers, highlighting its potential as an objective measure of speech in difficult-to-test populations.
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium.
| | - Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| |
|
34
|
Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990 DOI: 10.1016/j.heares.2023.108770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 04/06/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically more challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Apart from individual features forming a static listening condition, dynamic and irregular changes of auditory scenes (volatile listening environments) have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the speech-tracking neural index, at least when the bottom-up information is sufficient. However, when the listening environment was volatile and the listeners had to re-engage in new speech whenever the auditory scene changed, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the responses to left and right attention were significantly attenuated around 100-200 ms after speech onset. These findings suggest that volatile listening environments could adversely affect the modulation effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
| | - Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
| | - Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
| | - Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea.
| |
|
35
|
Kaufman M, Zion Golumbic E. Listening to two speakers: Capacity and tradeoffs in neural speech tracking during Selective and Distributed Attention. Neuroimage 2023; 270:119984. [PMID: 36854352 DOI: 10.1016/j.neuroimage.2023.119984] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Revised: 02/06/2023] [Accepted: 02/24/2023] [Indexed: 02/27/2023] Open
Abstract
Speech comprehension is severely compromised when several people talk at once, due to limited perceptual and cognitive resources. In such circumstances, top-down attention mechanisms can actively prioritize processing of task-relevant speech. However, behavioral and neural evidence suggest that this selection is not exclusive, and the system may have sufficient capacity to process additional speech input as well. Here we used a data-driven approach to contrast two opposing hypotheses regarding the system's capacity to co-represent competing speech: Can the brain represent two speakers equally or is the system fundamentally limited, resulting in tradeoffs between them? Neural activity was measured using magnetoencephalography (MEG) as human participants heard concurrent speech narratives and engaged in two tasks: Selective Attention, where only one speaker was task-relevant and Distributed Attention, where both speakers were equally relevant. Analysis of neural speech-tracking revealed that both tasks engaged a similar network of brain regions involved in auditory processing, attentional control and speech processing. Interestingly, during both Selective and Distributed Attention the neural representation of competing speech showed a bias towards one speaker. This is in line with proposed 'bottlenecks' for co-representation of concurrent speech and suggests that good performance on distributed attention tasks may be achieved by toggling attention between speakers over time.
Affiliation(s)
- Maya Kaufman
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
| | - Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel.
| |
|
36
|
Xie Z, Brodbeck C, Chandrasekaran B. Cortical Tracking of Continuous Speech Under Bimodal Divided Attention. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:318-343. [PMID: 37229509 PMCID: PMC10205152 DOI: 10.1162/nol_a_00100] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 01/11/2023] [Indexed: 05/27/2023]
Abstract
Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.
Affiliation(s)
- Zilong Xie
- School of Communication Science and Disorders, Florida State University, Tallahassee, FL, USA
| | - Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
| |
|
37
|
Richter B, Putze F, Ivucic G, Brandt M, Schütze C, Reisenhofer R, Wrede B, Schultz T. EEG Correlates of Distractions and Hesitations in Human–Robot Interaction: A LabLinking Pilot Study. MULTIMODAL TECHNOLOGIES AND INTERACTION 2023. [DOI: 10.3390/mti7040037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/31/2023] Open
Abstract
In this paper, we investigate the effect of distractions and hesitations as a scaffolding strategy. Recent research points to the potential beneficial effects of a speaker’s hesitations on the listeners’ comprehension of utterances, although results from studies on this issue indicate that humans do not make strategic use of them. The role of hesitations and their communicative function in human-human interaction is a much-discussed topic in current research. To better understand the underlying cognitive processes, we developed a human–robot interaction (HRI) setup that allows the measurement of the electroencephalogram (EEG) signals of a human participant while interacting with a robot. We thereby address the research question of whether we find effects on single-trial EEG based on the distraction and the corresponding robot’s hesitation scaffolding strategy. To carry out the experiments, we leverage our LabLinking method, which enables interdisciplinary joint research between remote labs. This study could not have been conducted without LabLinking, as the two involved labs needed to combine their individual expertise and equipment to achieve the goal together. The results of our study indicate that the EEG correlates in the distracted condition are different from the baseline condition without distractions. Furthermore, we could differentiate the EEG correlates of distraction with and without a hesitation scaffolding strategy. This proof-of-concept study shows that LabLinking makes it possible to conduct collaborative HRI studies in remote laboratories and lays the first foundation for more in-depth research into robotic scaffolding strategies.
|
38
|
Manting CL, Gulyas B, Ullén F, Lundqvist D. Steady-state responses to concurrent melodies: source distribution, top-down, and bottom-up attention. Cereb Cortex 2023; 33:3053-3066. [PMID: 35858223 PMCID: PMC10016039 DOI: 10.1093/cercor/bhac260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 06/03/2022] [Accepted: 06/03/2022] [Indexed: 11/13/2022] Open
Abstract
Humans can direct attentional resources to a single sound occurring simultaneously among others to extract the most behaviourally relevant information present. To investigate this cognitive phenomenon in a precise manner, we used frequency-tagging to separate neural auditory steady-state responses (ASSRs), which can be traced back to each auditory stimulus, from the neural mix elicited by multiple simultaneous sounds. Using a mixture of 2 frequency-tagged melody streams, we instructed participants to selectively attend to one stream or the other while following the development of the pitch contour. Bottom-up attention towards either stream was also manipulated with salient changes in pitch. Distributed source analyses of magnetoencephalography measurements showed that the effect of ASSR enhancement from top-down driven attention was strongest at the left frontal cortex, while that of bottom-up driven attention was dominant at the right temporal cortex. Furthermore, the degree of ASSR suppression from simultaneous stimuli varied across cortical lobes and hemispheres. The ASSR source distribution changes from temporal dominance during single-stream perception to proportionally more activity in the frontal and centro-parietal cortical regions when listening to simultaneous streams. These findings are a step toward studying cognition in more complex and naturalistic soundscapes using frequency-tagging.
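Frequency-tagging rests on a simple readout: each stream is modulated at its own tag rate, and its ASSR is the spectral power at that rate relative to neighbouring bins. The tag frequencies and attention effect in the sketch below are assumptions for illustration, not the study's parameters.

```python
# Frequency-tagging sketch: read out ASSR power at two tag frequencies.
import numpy as np

fs, dur = 1000, 30
t = np.arange(0, dur, 1 / fs)
f1, f2 = 39.0, 43.0                              # tag frequencies (assumed)
rng = np.random.default_rng(9)
meg = (0.8 * np.sin(2 * np.pi * f1 * t)          # attended stream: larger ASSR
       + 0.4 * np.sin(2 * np.pi * f2 * t)        # ignored stream
       + rng.standard_normal(t.size))

spec = np.abs(np.fft.rfft(meg)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)

def snr_at(f_tag, half_bins=5):
    i = np.argmin(np.abs(freqs - f_tag))
    neighbours = np.r_[spec[i - half_bins:i], spec[i + 1:i + 1 + half_bins]]
    return spec[i] / neighbours.mean()

print(f"ASSR SNR: attended {snr_at(f1):.0f}, ignored {snr_at(f2):.0f}")
```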
Collapse
Affiliation(s)
| | - Balazs Gulyas
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
- Cognitive Neuroimaging Centre (CoNiC), Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 636921, Singapore
| | - Fredrik Ullén
- Department of Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt 60322, Germany
| | - Daniel Lundqvist
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
| |
Collapse
|
39
|
Xu N, Zhao B, Luo L, Zhang K, Shao X, Luan G, Wang Q, Hu W, Wang Q. Two stages of speech envelope tracking in human auditory cortex modulated by speech intelligibility. Cereb Cortex 2023; 33:2215-2228. [PMID: 35695785 DOI: 10.1093/cercor/bhac203] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2022] [Revised: 05/01/2022] [Accepted: 05/02/2022] [Indexed: 11/13/2022] Open
Abstract
The envelope is essential for speech perception. Recent studies have shown that cortical activity can track the acoustic envelope. However, whether tracking strength reflects the extent of speech intelligibility processing remains controversial. Here, using stereo-electroencephalography, we directly recorded activity in human auditory cortex while subjects listened to either natural or noise-vocoded speech. These 2 stimuli have approximately identical envelopes, but the noise-vocoded speech is not intelligible. Based on the tracking lags, we revealed 2 stages of envelope tracking: an early high-γ (60-140 Hz) power stage that preferred the noise-vocoded speech and a late θ (4-8 Hz) phase stage that preferred the natural speech. Furthermore, the decoding performance of high-γ power was better in primary than in nonprimary auditory cortex, consistent with its short tracking delay, while θ phase showed better decoding performance in right auditory cortex. In addition, high-γ responses with sustained temporal profiles in nonprimary auditory cortex were dominant in both envelope tracking and decoding. In sum, we suggest a functional dissociation between high-γ power and θ phase: the former reflects fast, automatic processing of brief acoustic features, while the latter correlates with slow build-up processing facilitated by speech intelligibility.
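A minimal sketch of lag-based envelope tracking: simulate high-frequency activity amplitude-modulated by a delayed copy of the stimulus envelope, then recover the tracking lag from the cross-correlation between band-limited power and the envelope. The filter bands and the 50 ms delay are illustrative assumptions, not values from the study:

```python
# Toy lag-resolved envelope tracking via cross-correlation.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, correlate

fs = 500
rng = np.random.default_rng(2)
n = fs * 20

# Slow stimulus "envelope": low-pass-filtered noise
b_lo, a_lo = butter(2, 8 / (fs / 2))
env = filtfilt(b_lo, a_lo, rng.normal(size=n))

# Simulated neural signal: 100 Hz activity amplitude-modulated by the
# envelope, delayed by 50 ms, plus additive noise
lag_true = int(0.05 * fs)
carrier = np.sin(2 * np.pi * 100 * np.arange(n) / fs)
neural = (1 + np.roll(env, lag_true)) * carrier + rng.normal(0, 0.3, n)

# High-gamma-band amplitude as the tracking feature
b_hg, a_hg = butter(2, [60 / (fs / 2), 140 / (fs / 2)], btype="band")
hg = np.abs(hilbert(filtfilt(b_hg, a_hg, neural)))

# Tracking lag = peak of the cross-correlation with the envelope
xc = correlate(hg - hg.mean(), env - env.mean(), mode="full")
lags = np.arange(-n + 1, n) / fs
print(f"estimated lag: {lags[np.argmax(xc)] * 1000:.0f} ms")
```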
Collapse
Affiliation(s)
- Na Xu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China; National Clinical Research Center for Neurological Diseases, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Baotian Zhao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Lu Luo
- School of Psychology, Beijing Sport University, No. 48 Xinxi Road, Haidian District, Beijing 100084, China
| | - Kai Zhang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Xiaoqiu Shao
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Guoming Luan
- Beijing Key Laboratory of Epilepsy, Epilepsy Center, Sanbo Brain Hospital, Capital Medical University, No. 50 Yikesong Xiangshan Road, Haidian District, Beijing 100093, China; Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, No. 10 Xitoutiao, You An Men, Beijing 100069, China
| | - Qian Wang
- Beijing Key Laboratory of Epilepsy, Epilepsy Center, Sanbo Brain Hospital, Capital Medical University, No. 50 Yikesong Xiangshan Road, Haidian District, Beijing 100093, China; School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing 100871, China; IDG/McGovern Institute for Brain Research, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing 100871, China
| | - Wenhan Hu
- Beijing Neurosurgical Institute, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Qun Wang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China; National Clinical Research Center for Neurological Diseases, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China; Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, No. 10 Xitoutiao, You An Men, Beijing 100069, China
| |
Collapse
|
40
|
Accou B, Vanthornhout J, Hamme HV, Francart T. Decoding of the speech envelope from EEG using the VLAAI deep neural network. Sci Rep 2023; 13:812. [PMID: 36646740 PMCID: PMC9842721 DOI: 10.1038/s41598-022-27332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Accepted: 12/30/2022] [Indexed: 01/18/2023] Open
Abstract
To investigate the processing of speech in the brain, simple linear models are commonly used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped to model a highly dynamic, complex, non-linear system like the brain, and they often require a substantial amount of subject-specific training data. This work introduces a novel speech-decoder architecture: the Very Large Augmented Auditory Inference (VLAAI) network. The VLAAI network outperformed state-of-the-art subject-independent models (median Pearson correlation of 0.19, p < 0.001), a 52% increase over the well-established linear model. Using ablation techniques, we identified the relative importance of each part of the VLAAI network and found that the non-linear components and output context module influenced model performance the most (10% relative performance increase). Subsequently, the VLAAI network was evaluated on a holdout dataset of 26 subjects and on a publicly available unseen dataset to test generalization to unseen subjects and stimuli. No significant difference was found between the default test set and the holdout subjects, nor between the default test set and the public dataset. The VLAAI network also significantly outperformed all baseline models on the public dataset. We evaluated the effect of training-set size by training the VLAAI network on data from 1 up to 80 subjects and evaluating on 26 holdout subjects, revealing a relationship between the number of training subjects and performance on unseen subjects that follows a hyperbolic tangent function. Finally, the subject-independent VLAAI network was fine-tuned on the 26 holdout subjects to obtain subject-specific VLAAI models. With 5 minutes of data or more, a significant performance improvement was found, up to 34% (from 0.18 to 0.25 median Pearson correlation) relative to the subject-independent VLAAI network.
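For context, the kind of linear backward model that serves as the baseline here can be sketched in a few lines: reconstruct the envelope from time-lagged EEG via ridge regression and score with Pearson's r. Data sizes, lags, and the regularization constant are arbitrary choices for illustration, and for brevity the model is scored on its own training data, which real analyses avoid:

```python
# Toy linear backward model (envelope reconstruction from lagged EEG).
import numpy as np

rng = np.random.default_rng(3)
fs, n, n_ch, n_lags = 64, 64 * 60, 8, 16    # 1 min of 64 Hz data, 250 ms of lags

env = rng.normal(size=n)                     # target envelope
# Noisy "channels": each carries the envelope delayed by 1..n_ch samples
eeg = np.stack([np.roll(env, c + 1) + rng.normal(0, 1.0, n)
                for c in range(n_ch)])

# Design matrix of EEG at lags 0..n_lags-1 *after* each envelope sample,
# since the neural response follows the stimulus
X = np.zeros((n, n_ch * n_lags))
for l in range(n_lags):
    X[:n - l, l * n_ch:(l + 1) * n_ch] = eeg[:, l:].T

# Ridge solution w = (X'X + aI)^-1 X'y
alpha = 1e2
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ env)
rec = X @ w
print(f"reconstruction accuracy: r = {np.corrcoef(rec, env)[0, 1]:.2f}")
```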
Collapse
Affiliation(s)
- Bernd Accou
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium; PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
| | | | - Hugo Van Hamme
- PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
| | - Tom Francart
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium.
| |
Collapse
|
41
|
Mesik J, Wojtczak M. The effects of data quantity on performance of temporal response function analyses of natural speech processing. Front Neurosci 2023; 16:963629. [PMID: 36711133 PMCID: PMC9878558 DOI: 10.3389/fnins.2022.963629] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Accepted: 12/26/2022] [Indexed: 01/15/2023] Open
Abstract
In recent years, temporal response function (TRF) analyses of neural activity recorded during continuous naturalistic stimulation have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous-speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit either to individual data (subject-specific analyses) or to group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially depending on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can perform well on small amounts of test data (2-8 min) if they are trained on a sufficiently large data set; as such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, the demonstrations in this work should aid new users of TRF analyses and, in combination with other tools such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.
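The data-quantity question can be made concrete with a toy forward-TRF learning curve: fit a ridge-regularized TRF on increasing amounts of simulated training data and evaluate prediction accuracy on a fixed test set. All parameters below are assumptions for illustration:

```python
# Toy TRF learning curve: prediction accuracy vs. training data quantity.
import numpy as np

rng = np.random.default_rng(4)
fs, n_lags = 64, 16
true_trf = np.hanning(n_lags)                 # assumed ground-truth kernel

def make_data(n):
    stim = rng.normal(size=n)
    eeg = np.convolve(stim, true_trf)[:n] + rng.normal(0, 2.0, n)
    return stim, eeg

def lagged(x, n_lags):
    X = np.zeros((x.size, n_lags))
    for l in range(n_lags):
        X[l:, l] = x[:x.size - l]
    return X

stim_te, eeg_te = make_data(fs * 60)          # fixed 1-minute test set
X_te = lagged(stim_te, n_lags)

for minutes in (1, 4, 16):
    stim_tr, eeg_tr = make_data(fs * 60 * minutes)
    X_tr = lagged(stim_tr, n_lags)
    w = np.linalg.solve(X_tr.T @ X_tr + 1e3 * np.eye(n_lags), X_tr.T @ eeg_tr)
    r = np.corrcoef(X_te @ w, eeg_te)[0, 1]
    print(f"{minutes:>2} min training: test r = {r:.3f}")
```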
Collapse
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
| | | |
Collapse
|
42
|
Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. bioRxiv 2023:2023.01.02.522438. [PMID: 36711934 PMCID: PMC9881851 DOI: 10.1101/2023.01.02.522438] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech, using acoustic features such as the amplitude envelope and spectrogram. Such models implicitly assume a direct mapping from stimulus representation to cortical activity. In reality, however, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. It could therefore be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC), when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than are acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of the speech stimulus. Further, model-IC-derived TRFs predicted different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provided a more accurate prediction of the EEG response.
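The model-comparison logic can be sketched as follows: fit TRFs to two candidate input representations, the raw envelope versus a stand-in for a subcortical model output (here a simple power-law compression, not an actual auditory-periphery model), and compare their prediction accuracies. Everything below is illustrative, and the model is scored on its training data for brevity:

```python
# Toy comparison of TRFs fit to two candidate stimulus representations.
import numpy as np

rng = np.random.default_rng(5)
fs, n, n_lags = 64, 64 * 120, 16
env = np.abs(rng.normal(size=n))          # raw "envelope"
ic_like = env ** 0.3                      # stand-in for a subcortical-model output

# EEG simulated as driven by the compressed representation
kernel = np.hanning(n_lags)
eeg = np.convolve(ic_like, kernel)[:n] + rng.normal(0, 1.0, n)

def fit_and_score(feature):
    X = np.zeros((n, n_lags))
    for l in range(n_lags):
        X[l:, l] = feature[:n - l]
    w = np.linalg.solve(X.T @ X + 1e2 * np.eye(n_lags), X.T @ eeg)
    return np.corrcoef(X @ w, eeg)[0, 1]

print(f"envelope TRF: r = {fit_and_score(env):.3f}")
print(f"model-IC TRF: r = {fit_and_score(ic_like):.3f}")
```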
Collapse
|
43
|
Garibyan A, Schilling A, Boehm C, Zankl A, Krauss P. Neural correlates of linguistic collocations during continuous speech perception. Front Psychol 2022; 13:1076339. [PMID: 36619132 PMCID: PMC9822706 DOI: 10.3389/fpsyg.2022.1076339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2022] [Accepted: 12/02/2022] [Indexed: 12/25/2022] Open
Abstract
Language is fundamentally predictable, both at a higher schematic level and at the level of low-level lexical items. Regarding predictability at the lexical level, collocations are frequent co-occurrences of words that are often characterized by a high strength of association. So far, psycho- and neurolinguistic studies have mostly employed highly artificial experimental paradigms in the investigation of collocations, focusing on the processing of single words or isolated sentences. In contrast, here we analyze EEG brain responses recorded during stimulation with continuous speech, i.e., audio books. We find that the N400 response to collocations differs significantly from that to non-collocations, although the effect varies with respect to cortical region (anterior/posterior) and laterality (left/right). Our results are in line with studies using continuous speech, and they mostly contradict those using artificial paradigms and stimuli. To the best of our knowledge, this is the first neurolinguistic study of collocations using continuous speech stimulation.
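A hedged sketch of the underlying ERP contrast: average word-locked epochs over an assumed 300-500 ms N400 window and test collocations against non-collocations. The window, channel, effect sizes, and data are synthetic, chosen only to show the shape of the analysis:

```python
# Toy N400 contrast between collocations and non-collocations.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)
fs = 250
times = np.arange(-0.2, 0.8, 1 / fs)
n400 = (times > 0.3) & (times < 0.5)         # assumed N400 window

def epochs(n, n400_amp):
    e = rng.normal(0, 2.0, (n, times.size))
    e[:, n400] += n400_amp                   # add a condition-specific N400
    return e

colloc = epochs(120, -0.5)                   # reduced (less negative) N400
noncol = epochs(120, -1.5)                   # larger N400 for non-collocations

t, p = ttest_ind(colloc[:, n400].mean(axis=1), noncol[:, n400].mean(axis=1))
print(f"N400 window contrast: t = {t:.2f}, p = {p:.4f}")
```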
Collapse
Affiliation(s)
- Armine Garibyan
- Chair of English Philology and Linguistics, University Erlangen-Nuremberg, Erlangen, Germany; Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany
| | - Achim Schilling
- Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany
| | - Claudia Boehm
- Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany; Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany
| | - Alexandra Zankl
- Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany; Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany
| | - Patrick Krauss
- Linguistics Lab, University Erlangen-Nuremberg, Erlangen, Germany; Neuroscience Lab, University Hospital Erlangen, Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nuremberg, Erlangen, Germany; Pattern Recognition Lab, University Erlangen-Nuremberg, Erlangen, Germany
| |
Collapse
|
44
|
Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. [PMID: 36328272 DOI: 10.1016/j.neuroimage.2022.119724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2022] [Revised: 09/28/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022] Open
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in neural entrainment to the quasi-rhythmic properties of speech acoustics, while the latter is supposed to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs the articulatory patterns associated with speech production. We measured electroencephalographic (EEG) data while participants listened to sentences for which the articulatory kinematics of the lips, jaw, and tongue had been recorded during production (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic, and/or redundant information regarding the EEG signals. Interestingly, tongue movements carry both unique and synergistic information with the envelope that is encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, leveraging audio-motor maps that most likely arise from the acquisition of speech production during development.
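The PCA step over articulatory trajectories can be illustrated with synthetic EMA-like data in which a few latent movement patterns are mixed across sensors; the sensor count and dynamics are assumptions, and the PID analysis itself is beyond this sketch:

```python
# Toy PCA over multi-sensor articulatory trajectories (EMA-like data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
fs, dur, n_sensors = 200, 10, 6     # e.g., x/y positions of 3 articulators

# Simulated EMA: two shared slow movement patterns mixed across sensors
t = np.arange(0, dur, 1 / fs)
latent = np.stack([np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 5 * t)])
mixing = rng.normal(size=(n_sensors, 2))
ema = (mixing @ latent).T + rng.normal(0, 0.1, (t.size, n_sensors))

pca = PCA(n_components=3).fit(ema)
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
components = pca.transform(ema)     # time courses of coordination patterns
```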
Collapse
Affiliation(s)
- A Pastore
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
| | - A Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
| | - I Delis
- School of Biomedical Sciences, University of Leeds, Leeds, UK
| | - E Dolfini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - L Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
| | - A D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy.
| |
Collapse
|
45
|
Pinto D, Kaufman M, Brown A, Zion Golumbic E. An ecological investigation of the capacity to follow simultaneous speech and preferential detection of one's own name. Cereb Cortex 2022; 33:5361-5374. [PMID: 36331339 DOI: 10.1093/cercor/bhac424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 09/11/2022] [Accepted: 09/12/2022] [Indexed: 11/06/2022] Open
Abstract
Many situations require focusing attention on one speaker while monitoring the environment for potentially important information. Some have proposed that dividing attention among 2 speakers involves behavioral trade-offs due to limited cognitive resources. However, the severity of these trade-offs, particularly under ecologically valid circumstances, is not well understood. We investigated the capacity to process simultaneous speech using a dual-task paradigm simulating task demands and stimuli encountered in real life. Participants listened to conversational narratives (Narrative Stream) and monitored a stream of announcements (Barista Stream) to detect when their order was called. We measured participants' performance, neural activity, and skin conductance as they engaged in this dual-task. Participants achieved extremely high dual-task accuracy, with no apparent behavioral trade-offs. Moreover, robust neural and physiological responses were observed for target stimuli in the Barista Stream, alongside significant neural speech-tracking of the Narrative Stream. These results suggest that humans have substantial capacity to process simultaneous speech and do not suffer from insufficient processing resources, at least for this highly ecological task combination and level of perceptual load. The results also confirmed the ecological validity of the advantage for detecting one's own name at the behavioral, neural, and physiological levels, highlighting the contribution of personal relevance when processing simultaneous speech.
Collapse
Affiliation(s)
- Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Maya Kaufman
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Adi Brown
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| |
Collapse
|
46
|
Cantiani C, Dondena C, Molteni M, Riva V, Piazza C. Synchronizing with the rhythm: Infant neural entrainment to complex musical and speech stimuli. Front Psychol 2022; 13:944670. [PMID: 36337544 PMCID: PMC9635850 DOI: 10.3389/fpsyg.2022.944670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2022] [Accepted: 09/22/2022] [Indexed: 11/14/2022] Open
Abstract
Neural entrainment is defined as the process whereby brain activity, and more specifically neuronal oscillations measured by EEG, synchronizes with exogenous stimulus rhythms. Despite the importance that neural oscillations have assumed in recent years in the fields of auditory neuroscience and speech perception, the oscillatory brain rhythms of human infants and their synchronization with complex exogenous auditory rhythms are still relatively unexplored. In the present study, we investigate infant neural entrainment to complex non-speech (musical) and speech rhythmic stimuli; we provide a developmental analysis to explore potential similarities and differences between infants' and adults' ability to entrain to the stimuli; and we analyze the associations between infants' neural entrainment measures and their concurrent level of development. Twenty-five 8-month-old infants were included in the study. Their EEG signals were recorded while they passively listened to non-speech and speech rhythmic stimuli modulated at different rates. In addition, the Bayley Scales were administered to all infants to assess their cognitive, language, and social-emotional development. Neural entrainment to the incoming rhythms was measured in the form of peaks emerging from the EEG spectrum at frequencies corresponding to the rhythm envelope. Analyses of the EEG spectrum revealed clear responses above the noise floor at these frequencies, suggesting that, similarly to adults, infants at 8 months of age are capable of entraining to complex incoming auditory rhythms. Infants' measures of neural entrainment were associated with concurrent measures of cognitive and social-emotional development.
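The peak-above-noise-floor measure is straightforward to sketch: compare the FFT amplitude at the stimulation frequency with the mean amplitude of flanking bins. The 2 Hz rhythm and the bin choices below are illustrative assumptions:

```python
# Toy "peak above the noise floor" measure at a stimulation frequency.
import numpy as np

fs, dur, f_stim = 500, 60, 2.0
t = np.arange(0, dur, 1 / fs)
eeg = 0.4 * np.sin(2 * np.pi * f_stim * t)          # entrained component
eeg += np.random.default_rng(8).normal(0, 1.0, t.size)

amp = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
i = np.argmin(np.abs(freqs - f_stim))

# Noise floor: mean amplitude of bins flanking the target (skip adjacent bins)
noise = np.mean(np.r_[amp[i - 6:i - 1], amp[i + 2:i + 7]])
print(f"SNR at {f_stim} Hz: {amp[i] / noise:.1f}")
```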
Collapse
Affiliation(s)
- Chiara Cantiani
- Child Psychopathology Unit, Scientific Institute, IRCCS Eugenio Medea, Lecco, Italy
| | - Chiara Dondena
- Child Psychopathology Unit, Scientific Institute, IRCCS Eugenio Medea, Lecco, Italy
| | - Massimo Molteni
- Child Psychopathology Unit, Scientific Institute, IRCCS Eugenio Medea, Lecco, Italy
| | - Valentina Riva
- Child Psychopathology Unit, Scientific Institute, IRCCS Eugenio Medea, Lecco, Italy
| | - Caterina Piazza
- Bioengineering Lab, Scientific Institute, IRCCS Eugenio Medea, Lecco, Italy
| |
Collapse
|
47
|
Pérez-Navarro J, Lallier M, Clark C, Flanagan S, Goswami U. Local Temporal Regularities in Child-Directed Speech in Spanish. J Speech Lang Hear Res 2022; 65:3776-3788. [PMID: 36194778 DOI: 10.1044/2022_jslhr-22-00111] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
PURPOSE The purpose of this study is to characterize the local (utterance-level) temporal regularities of child-directed speech (CDS) that might facilitate phonological development in Spanish, classically termed a syllable-timed language. METHOD Eighteen female adults addressed their 4-year-old children versus other adults spontaneously and also read aloud (CDS vs. adult-directed speech [ADS]). We compared CDS and ADS productions using a spectrotemporal model (Leong & Goswami, 2015), obtaining three temporal metrics: (a) distribution of modulation energy, (b) temporal regularity of stressed syllables, and (c) syllable rate. RESULTS CDS was characterized by (a) significantly greater modulation energy in the lower frequencies (0.5-4 Hz), (b) more regular rhythmic occurrence of stressed syllables, and (c) a slower syllable rate than ADS, across both spontaneous and read conditions. DISCUSSION CDS exhibits a robust local temporal organization (i.e., within utterances), with amplitude modulation bands aligning with the delta and theta electrophysiological frequency bands and showing greater phase synchronization than in ADS, which facilitates the parsing of stress units and syllables. These temporal regularities, together with the slower rate of production of CDS, might support the automatic extraction of phonological units in speech and hence the phonological development of children. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21210893.
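Metric (a) can be approximated without the full spectrotemporal model: extract a broadband amplitude envelope and compare its modulation energy in the delta (0.5-4 Hz) and theta (4-8 Hz) ranges. The sketch below uses noise as a stand-in for recorded speech; the envelope rate and band edges are assumptions:

```python
# Toy modulation-energy comparison across delta and theta bands.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000
rng = np.random.default_rng(9)
speech = rng.normal(size=fs * 10)           # stand-in for a recorded utterance

# Broadband amplitude envelope, low-pass filtered and downsampled to 100 Hz
env = np.abs(hilbert(speech))
b, a = butter(2, 32 / (fs / 2))
env = filtfilt(b, a, env)[::fs // 100]
fs_env = 100

spec = np.abs(np.fft.rfft(env - env.mean())) ** 2
freqs = np.fft.rfftfreq(env.size, 1 / fs_env)

delta = spec[(freqs >= 0.5) & (freqs < 4)].sum()
theta = spec[(freqs >= 4) & (freqs < 8)].sum()
print(f"delta/theta modulation-energy ratio: {delta / theta:.2f}")
```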
Collapse
Affiliation(s)
- Jose Pérez-Navarro
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- University of the Basque Country UPV/EHU, Donostia-San Sebastián, Spain
| | - Marie Lallier
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
| | - Catherine Clark
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- University of the Basque Country UPV/EHU, Donostia-San Sebastián, Spain
| | - Sheila Flanagan
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, United Kingdom
| | - Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, United Kingdom
| |
Collapse
|
48
|
Wang S, Zhang X, Zhang J, Zong C. A synchronized multimodal neuroimaging dataset for studying brain language processing. Sci Data 2022; 9:590. [PMID: 36180444 PMCID: PMC9525723 DOI: 10.1038/s41597-022-01708-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Accepted: 08/22/2022] [Indexed: 11/15/2022] Open
Abstract
We present a synchronized multimodal neuroimaging dataset for studying brain language processing (SMN4Lang) that contains functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) data from the same 12 healthy volunteers while they listened to 6 hours of naturalistic stories, as well as high-resolution structural (T1, T2), diffusion MRI, and resting-state fMRI data for each participant. We also provide rich linguistic annotations for the stimuli, including word frequencies, syntactic tree structures, time-aligned characters and words, and various types of word and character embeddings. Quality assessment indicators verify that this is a high-quality neuroimaging dataset. The synchronized data were collected from the same group of participants, who listened to the story materials first during fMRI and then during MEG, making the dataset well suited to studying the dynamic processing of language comprehension, such as when and where different linguistic features are encoded in the brain. In addition, this dataset, comprising a large vocabulary from stories on various topics, can serve as a brain benchmark to evaluate and improve computational language models.
Measurement(s): functional brain measurement • magnetoencephalography
Technology Type(s): functional magnetic resonance imaging • magnetoencephalography
Factor Type(s): naturalistic stimuli listening
Sample Characteristic - Organism: human beings
Collapse
Affiliation(s)
- Shaonan Wang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China. .,School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China.
| | - Xiaohan Zhang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China.,School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Jiajun Zhang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China.,School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| | - Chengqing Zong
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China.,School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
| |
Collapse
|
49
|
Brown JA, Bidelman GM. Familiarity of Background Music Modulates the Cortical Tracking of Target Speech at the "Cocktail Party". Brain Sci 2022; 12:brainsci12101320. [PMID: 36291252 PMCID: PMC9599198 DOI: 10.3390/brainsci12101320] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 11/23/2022] Open
Abstract
The "cocktail party" problem-how a listener perceives speech in noisy environments-is typically studied using speech (multi-talker babble) or noise maskers. However, realistic cocktail party scenarios often include background music (e.g., coffee shops, concerts). Studies investigating music's effects on concurrent speech perception have predominantly used highly controlled synthetic music or shaped noise, which do not reflect naturalistic listening environments. Behaviorally, familiar background music and songs with vocals/lyrics inhibit concurrent speech recognition. Here, we investigated the neural bases of these effects. While recording multichannel EEG, participants listened to an audiobook while popular songs (or silence) played in the background at a 0 dB signal-to-noise ratio. Songs were either familiar or unfamiliar to listeners and featured either vocals or isolated instrumentals from the original audio recordings. Comprehension questions probed task engagement. We used temporal response functions (TRFs) to isolate cortical tracking to the target speech envelope and analyzed neural responses around 100 ms (i.e., auditory N1 wave). We found that speech comprehension was, expectedly, impaired during background music compared to silence. Target speech tracking was further hindered by the presence of vocals. When masked by familiar music, response latencies to speech were less susceptible to informational masking, suggesting concurrent neural tracking of speech was easier during music known to the listener. These differential effects of music familiarity were further exacerbated in listeners with less musical ability. Our neuroimaging results and their dependence on listening skills are consistent with early attentional-gain mechanisms where familiar music is easier to tune out (listeners already know the song's expectancies) and thus can allocate fewer attentional resources to the background music to better monitor concurrent speech material.
Collapse
Affiliation(s)
- Jane A. Brown
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN 38152, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
| | - Gavin M. Bidelman
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN 38152, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47408, USA
- Program in Neuroscience, Indiana University, Bloomington, IN 47405, USA
| |
Collapse
|
50
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of that sound. This is called neural tracking, and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), of the speech envelope, and of linguistic features. Each of these analyses provides a unique window into the human brain's hierarchical stages of speech processing. F0 tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary, but not sufficient, for speech intelligibility. Linguistic feature tracking (e.g., word or phoneme surprisal) relates to neural processes more directly linked to speech intelligibility. Together, these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
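To make the linguistic-feature analysis concrete, a surprisal regressor of the kind used for such tracking can be sketched as a sparse impulse train at word onsets scaled by each word's surprisal; the onset times and probabilities below are made up for illustration:

```python
# Toy linguistic-feature regressor: word surprisal as a sparse impulse train.
import numpy as np

fs = 64                                        # assumed feature sampling rate (Hz)
dur = 5.0
onsets = [0.3, 0.9, 1.6, 2.4, 3.1, 4.2]        # hypothetical word onsets (s)
probs = [0.20, 0.05, 0.50, 0.01, 0.30, 0.10]   # hypothetical word probabilities

feature = np.zeros(int(dur * fs))
for t, p in zip(onsets, probs):
    feature[int(t * fs)] = -np.log2(p)         # surprisal in bits

print("non-zero samples:", np.flatnonzero(feature))
print("surprisal values:", np.round(feature[feature > 0], 2))
```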
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|