1
|
Hausfeld L, Hamers IMH, Formisano E. FMRI speech tracking in primary and non-primary auditory cortex while listening to noisy scenes. Commun Biol 2024; 7:1217. [PMID: 39349723 PMCID: PMC11442455 DOI: 10.1038/s42003-024-06913-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Accepted: 09/17/2024] [Indexed: 10/04/2024] Open
Abstract
Invasive and non-invasive electrophysiological measurements during "cocktail-party"-like listening indicate that neural activity in the human auditory cortex (AC) "tracks" the envelope of relevant speech. However, due to limited coverage and/or spatial resolution, the distinct contribution of primary and non-primary areas remains unclear. Here, using 7-Tesla fMRI, we measured brain responses of participants attending to one speaker, in the presence and absence of another speaker. Through voxel-wise modeling, we observed envelope tracking in bilateral Heschl's gyrus (HG), right middle superior temporal sulcus (mSTS) and left temporo-parietal junction (TPJ), despite the signal's sluggish nature and slow temporal sampling. Neurovascular activity correlated positively (HG) or negatively (mSTS, TPJ) with the envelope. Further analyses comparing the similarity between spatial response patterns in the single speaker and concurrent speakers conditions and envelope decoding indicated that tracking in HG reflected both relevant and (to a lesser extent) non-relevant speech, while mSTS represented the relevant speech signal. Additionally, in mSTS, the similarity strength correlated with the comprehension of relevant speech. These results indicate that the fMRI signal tracks cortical responses and attention effects related to continuous speech and support the notion that primary and non-primary AC process ongoing speech in a push-pull of acoustic and linguistic information.
Collapse
Affiliation(s)
- Lars Hausfeld
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands.
- Maastricht Brain Imaging Centre, 6200 MD, Maastricht, The Netherlands.
| | - Iris M H Hamers
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Department of Biomedical Sciences of Cells & Systems, Section Cognitive Neurosciences, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Maastricht Brain Imaging Centre, 6200 MD, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology, Faculty of Science and Engineering, 6200 MD, Maastricht, The Netherlands
| |
Collapse
|
2
|
Baus C, Millan I, Chen XJ, Blanco-Elorrieta E. Exploring the Interplay Between Language Comprehension and Cortical Tracking: The Bilingual Test Case. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:484-496. [PMID: 38911463 PMCID: PMC11192516 DOI: 10.1162/nol_a_00141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 03/04/2024] [Indexed: 06/25/2024]
Abstract
Cortical tracking, the synchronization of brain activity to linguistic rhythms is a well-established phenomenon. However, its nature has been heavily contested: Is it purely epiphenomenal or does it play a fundamental role in speech comprehension? Previous research has used intelligibility manipulations to examine this topic. Here, we instead varied listeners' language comprehension skills while keeping the auditory stimulus constant. To do so, we tested 22 native English speakers and 22 Spanish/Catalan bilinguals learning English as a second language (SL) in an EEG cortical entrainment experiment and correlated the responses with the magnitude of the N400 component of a semantic comprehension task. As expected, native listeners effectively tracked sentential, phrasal, and syllabic linguistic structures. In contrast, SL listeners exhibited limitations in tracking sentential structures but successfully tracked phrasal and syllabic rhythms. Importantly, the amplitude of the neural entrainment correlated with the amplitude of the detection of semantic incongruities in SLs, showing a direct connection between tracking and the ability to understand speech. Together, these findings shed light on the interplay between language comprehension and cortical tracking, to identify neural entrainment as a fundamental principle for speech comprehension.
Collapse
Affiliation(s)
- Cristina Baus
- Department of Cognition, Development and Educational Psychology, University of Barcelona, Barcelona, Spain
- Institute of Neurosciences, University of Barcelona, Barcelona, Spain
| | | | | | - Esti Blanco-Elorrieta
- Department of Psychology, New York University, New York, NY, USA
- Department of Neural Science, New York University, New York, NY, USA
| |
Collapse
|
3
|
Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. [PMID: 38303522 DOI: 10.1111/ejn.16265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2023] [Revised: 11/02/2023] [Accepted: 01/12/2024] [Indexed: 02/03/2024]
Abstract
Linear models are becoming increasingly popular to investigate brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to reconstruct brain activity, or 'decoding' when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated when listening to speech stimuli but rarely when listening to music. Recent attempts at neural tracking of music show that the reconstruction performances are reduced compared with speech decoding. The present study investigates the performance of stimuli reconstruction and electroencephalogram prediction (decoding and encoding models) based on the cortical entrainment of temporal variations of the audio stimuli for both music and speech listening. Three hypotheses that may explain differences between speech and music stimuli reconstruction were tested to assess the importance of the speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing between speech and music listening, no differences were found in terms of reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite the differences in the underlying cortical mechanisms.
Collapse
Affiliation(s)
- Adèle Simon
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
| | - Søren Bech
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
| | - Gérard Loquet
- Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
| | - Jan Østergaard
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
| |
Collapse
|
4
|
Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. [PMID: 38687241 PMCID: PMC11059272 DOI: 10.1093/cercor/bhae155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 05/02/2024] Open
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that for this modulation, both of acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Collapse
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
| | - Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
| | - Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
| |
Collapse
|
5
|
Li Z, Zhang D. How does the human brain process noisy speech in real life? Insights from the second-person neuroscience perspective. Cogn Neurodyn 2024; 18:371-382. [PMID: 38699619 PMCID: PMC11061069 DOI: 10.1007/s11571-022-09924-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 11/20/2022] [Accepted: 12/19/2022] [Indexed: 01/07/2023] Open
Abstract
Comprehending speech with the existence of background noise is of great importance for human life. In the past decades, a large number of psychological, cognitive and neuroscientific research has explored the neurocognitive mechanisms of speech-in-noise comprehension. However, as limited by the low ecological validity of the speech stimuli and the experimental paradigm, as well as the inadequate attention on the high-order linguistic and extralinguistic processes, there remains much unknown about how the brain processes noisy speech in real-life scenarios. A recently emerging approach, i.e., the second-person neuroscience approach, provides a novel conceptual framework. It measures both of the speaker's and the listener's neural activities, and estimates the speaker-listener neural coupling with regarding of the speaker's production-related neural activity as a standardized reference. The second-person approach not only promotes the use of naturalistic speech but also allows for free communication between speaker and listener as in a close-to-life context. In this review, we first briefly review the previous discoveries about how the brain processes speech in noise; then, we introduce the principles and advantages of the second-person neuroscience approach and discuss its implications to unravel the linguistic and extralinguistic processes during speech-in-noise comprehension; finally, we conclude by proposing some critical issues and calls for more research interests in the second-person approach, which would further extend the present knowledge about how people comprehend speech in noise.
Collapse
Affiliation(s)
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, 100084 China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, 100084 China
| | - Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, 100084 China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, 100084 China
| |
Collapse
|
6
|
Ershaid H, Lizarazu M, McLaughlin D, Cooke M, Simantiraki O, Koutsogiannaki M, Lallier M. Contributions of listening effort and intelligibility to cortical tracking of speech in adverse listening conditions. Cortex 2024; 172:54-71. [PMID: 38215511 DOI: 10.1016/j.cortex.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 09/05/2023] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Cortical tracking of speech is vital for speech segmentation and is linked to speech intelligibility. However, there is no clear consensus as to whether reduced intelligibility leads to a decrease or an increase in cortical speech tracking, warranting further investigation of the factors influencing this relationship. One such factor is listening effort, defined as the cognitive resources necessary for speech comprehension, and reported to have a strong negative correlation with speech intelligibility. Yet, no studies have examined the relationship between speech intelligibility, listening effort, and cortical tracking of speech. The aim of the present study was thus to examine these factors in quiet and distinct adverse listening conditions. Forty-nine normal hearing adults listened to sentences produced casually, presented in quiet and two adverse listening conditions: cafeteria noise and reverberant speech. Electrophysiological responses were registered with electroencephalogram, and listening effort was estimated subjectively using self-reported scores and objectively using pupillometry. Results indicated varying impacts of adverse conditions on intelligibility, listening effort, and cortical tracking of speech, depending on the preservation of the speech temporal envelope. The more distorted envelope in the reverberant condition led to higher listening effort, as reflected in higher subjective scores, increased pupil diameter, and stronger cortical tracking of speech in the delta band. These findings suggest that using measures of listening effort in addition to those of intelligibility is useful for interpreting cortical tracking of speech results. Moreover, reading and phonological skills of participants were positively correlated with listening effort in the cafeteria condition, suggesting a special role of expert language skills in processing speech in this noisy condition. Implications for future research and theories linking atypical cortical tracking of speech and reading disorders are further discussed.
Collapse
Affiliation(s)
- Hadeel Ershaid
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Mikel Lizarazu
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Drew McLaughlin
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Martin Cooke
- Ikerbasque, Basque Science Foundation, Bilbao, Spain.
| | | | | | - Marie Lallier
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Ikerbasque, Basque Science Foundation, Bilbao, Spain.
| |
Collapse
|
7
|
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/29/2023]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role of information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects in human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
Collapse
Affiliation(s)
- Benedikt Zoefel
- Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
| | - Anne Kösem
- Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
| |
Collapse
|
8
|
Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. [PMID: 38358954 PMCID: PMC10868781 DOI: 10.1371/journal.pbio.3002498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 01/15/2024] [Indexed: 02/17/2024] Open
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that the long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech but do not affect speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted but would disappear when segregation cues, i.e., speech fine structure, were removed. These results strongly suggested that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of speech envelope, which can support reliable speech recognition.
Collapse
Affiliation(s)
- Jiaxin Gao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Honghua Chen
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Mingxuan Fang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
- The State key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China
| |
Collapse
|
9
|
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. [PMID: 38032934 PMCID: PMC10710032 DOI: 10.1073/pnas.2309166120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 10/21/2023] [Indexed: 12/02/2023] Open
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRFs analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD20742
- Department of Biology, University of Maryland, College Park, MD20742
- Institute for Systems Research, University of Maryland, College Park, MD20742
| |
Collapse
|
10
|
Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023; 21:e3002366. [PMID: 38091351 PMCID: PMC10718467 DOI: 10.1371/journal.pbio.3002366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Accepted: 10/06/2023] [Indexed: 12/18/2023] Open
Abstract
Models that predict brain responses to stimuli provide one measure of understanding of a sensory system and have many potential applications in science and engineering. Deep artificial neural networks have emerged as the leading such predictive models of the visual system but are less explored in audition. Prior work provided examples of audio-trained neural networks that produced good predictions of auditory cortical fMRI responses and exhibited correspondence between model stages and brain regions, but left it unclear whether these results generalize to other neural network models and, thus, how to further improve models in this domain. We evaluated model-brain correspondence for publicly available audio neural network models along with in-house models trained on 4 different tasks. Most tested models outpredicted standard spectromporal filter-bank models of auditory cortex and exhibited systematic model-brain correspondence: Middle stages best predicted primary auditory cortex, while deep stages best predicted non-primary cortex. However, some state-of-the-art models produced substantially worse brain predictions. Models trained to recognize speech in background noise produced better brain predictions than models trained to recognize speech in quiet, potentially because hearing in noise imposes constraints on biological auditory representations. The training task influenced the prediction quality for specific cortical tuning properties, with best overall predictions resulting from models trained on multiple tasks. The results generally support the promise of deep neural networks as models of audition, though they also indicate that current models do not explain auditory cortical responses in their entirety.
Collapse
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
- University of Rochester Medical Center, Rochester, New York, New York, United States of America
| | - Josh H. McDermott
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds, and Machines, MIT, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, Massachusetts, United States of America
| |
Collapse
|
11
|
Zhang X, Li J, Li Z, Hong B, Diao T, Ma X, Nolte G, Engel AK, Zhang D. Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension. Neuroimage 2023; 282:120404. [PMID: 37806465 DOI: 10.1016/j.neuroimage.2023.120404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 08/19/2023] [Accepted: 10/05/2023] [Indexed: 10/10/2023] Open
Abstract
Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via the natural language processing method as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms leading to speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
Collapse
Affiliation(s)
- Xinmiao Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Jiawei Li
- Department of Education and Psychology, Freie Universität Berlin, Berlin 14195, Federal Republic of Germany
| | - Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
| | - Tongxiang Diao
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
| | - Xin Ma
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
| | - Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
| | - Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
| | - Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China.
| |
Collapse
|
12
|
Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. [PMID: 37750333 DOI: 10.1093/cercor/bhad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 08/21/2023] [Accepted: 08/22/2023] [Indexed: 09/27/2023] Open
Abstract
Auditory attention decoding (AAD) was used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remained unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend one of the two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e. audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR could significantly enhance the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
Collapse
Affiliation(s)
- Bo Wang
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
| | - Xiran Xu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
| | - Yadong Niu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
| | - Chao Wu
- School of Nursing, Peking University, Beijing 100191, China
| | - Xihong Wu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
| | - Jing Chen
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
| |
Collapse
|
13
|
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. [PMID: 37677057 DOI: 10.1162/jocn_a_02044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Collapse
Affiliation(s)
- Sok Hui Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
- Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
| | - Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language
- IKERBASQUE, Basque Foundation for Science
| | - Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
| | - Michael J Crosse
- SEGOTIA, Galway, Ireland
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
| | - Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
| |
Collapse
|
14
|
Van Hirtum T, Somers B, Dieudonné B, Verschueren E, Wouters J, Francart T. Neural envelope tracking predicts speech intelligibility and hearing aid benefit in children with hearing loss. Hear Res 2023; 439:108893. [PMID: 37806102 DOI: 10.1016/j.heares.2023.108893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 09/01/2023] [Accepted: 09/27/2023] [Indexed: 10/10/2023]
Abstract
Early assessment of hearing aid benefit is crucial, as the extent to which hearing aids provide audible speech information predicts speech and language outcomes. A growing body of research has proposed neural envelope tracking as an objective measure of speech intelligibility, particularly for individuals unable to provide reliable behavioral feedback. However, its potential for evaluating speech intelligibility and hearing aid benefit in children with hearing loss remains unexplored. In this study, we investigated neural envelope tracking in children with permanent hearing loss through two separate experiments. EEG data were recorded while children listened to age-appropriate stories (Experiment 1) or an animated movie (Experiment 2) under aided and unaided conditions (using personal hearing aids) at multiple stimulus intensities. Neural envelope tracking was evaluated using a linear decoder reconstructing the speech envelope from the EEG in the delta band (0.5-4 Hz). Additionally, we calculated temporal response functions (TRFs) to investigate the spatio-temporal dynamics of the response. In both experiments, neural tracking increased with increasing stimulus intensity, but only in the unaided condition. In the aided condition, neural tracking remained stable across a wide range of intensities, as long as speech intelligibility was maintained. Similarly, TRF amplitudes increased with increasing stimulus intensity in the unaided condition, while in the aided condition significant differences were found in TRF latency rather than TRF amplitude. This suggests that decreasing stimulus intensity does not necessarily impact neural tracking. Furthermore, the use of personal hearing aids significantly enhanced neural envelope tracking, particularly in challenging speech conditions that would be inaudible when unaided. Finally, we found a strong correlation between neural envelope tracking and behaviorally measured speech intelligibility for both narrated stories (Experiment 1) and movie stimuli (Experiment 2). Altogether, these findings indicate that neural envelope tracking could be a valuable tool for predicting speech intelligibility benefits derived from personal hearing aids in hearing-impaired children. Incorporating narrated stories or engaging movies expands the accessibility of these methods even in clinical settings, offering new avenues for using objective speech measures to guide pediatric audiology decision-making.
Collapse
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Jan Wouters
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium.
| |
Collapse
|
15
|
An H, Lee J, Suh MW, Lim Y. Neural correlation of speech envelope tracking for background noise in normal hearing. Front Neurosci 2023; 17:1268591. [PMID: 37916182 PMCID: PMC10616241 DOI: 10.3389/fnins.2023.1268591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 10/04/2023] [Indexed: 11/03/2023] Open
Abstract
Everyday speech communication often occurs in environments with background noise, and the impact of noise on speech recognition can vary depending on factors such as noise type, noise intensity, and the listener's hearing ability. However, the extent to which neural mechanisms in speech understanding are influenced by different types and levels of noise remains unknown. This study aims to investigate whether individuals exhibit distinct neural responses and attention strategies depending on noise conditions. We recorded electroencephalography (EEG) data from 20 participants with normal hearing (13 males) and evaluated both neural tracking of speech envelopes and behavioral performance in speech understanding in the presence of varying types of background noise. Participants engaged in an EEG experiment consisting of two separate sessions. The first session involved listening to a 12-min story presented binaurally without any background noise. In the second session, speech understanding scores were measured using matrix sentences presented under speech-shaped noise (SSN) and Story noise background noise conditions at noise levels corresponding to sentence recognitions score (SRS). We observed differences in neural envelope correlation depending on noise type but not on its level. Interestingly, the impact of noise type on the variation in envelope tracking was more significant among participants with higher speech perception scores, while those with lower scores exhibited similarities in envelope correlation regardless of the noise condition. The findings suggest that even individuals with normal hearing could adopt different strategies to understand speech in challenging listening environments, depending on the type of noise.
Collapse
Affiliation(s)
- HyunJung An
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
| | - JeeWon Lee
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Republic of Korea
| | - Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Republic of Korea
| | - Yoonseob Lim
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul, Republic of Korea
| |
Collapse
|
16
|
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affects acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
| |
Collapse
|
17
|
Mohammadi Y, Graversen C, Østergaard J, Andersen OK, Reichenbach T. Phase-locking of Neural Activity to the Envelope of Speech in the Delta Frequency Band Reflects Differences between Word Lists and Sentences. J Cogn Neurosci 2023; 35:1301-1311. [PMID: 37379482 DOI: 10.1162/jocn_a_02016] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/30/2023]
Abstract
The envelope of a speech signal is tracked by neural activity in the cerebral cortex. The cortical tracking occurs mainly in two frequency bands, theta (4-8 Hz) and delta (1-4 Hz). Tracking in the faster theta band has been mostly associated with lower-level acoustic processing, such as the parsing of syllables, whereas the slower tracking in the delta band relates to higher-level linguistic information of words and word sequences. However, much regarding the more specific association between cortical tracking and acoustic as well as linguistic processing remains to be uncovered. Here, we recorded EEG responses to both meaningful sentences and random word lists in different levels of signal-to-noise ratios (SNRs) that lead to different levels of speech comprehension as well as listening effort. We then related the neural signals to the acoustic stimuli by computing the phase-locking value (PLV) between the EEG recordings and the speech envelope. We found that the PLV in the delta band increases with increasing SNR for sentences but not for the random word lists, showing that the PLV in this frequency band reflects linguistic information. When attempting to disentangle the effects of SNR, speech comprehension, and listening effort, we observed a trend that the PLV in the delta band might reflect listening effort rather than the other two variables, although the effect was not statistically significant. In summary, our study shows that the PLV in the delta band reflects linguistic information and might be related to listening effort.
Collapse
|
18
|
Tewarie PKB, Hindriks R, Lai YM, Sotiropoulos SN, Kringelbach M, Deco G. Non-reversibility outperforms functional connectivity in characterisation of brain states in MEG data. Neuroimage 2023; 276:120186. [PMID: 37268096 DOI: 10.1016/j.neuroimage.2023.120186] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Revised: 04/27/2023] [Accepted: 05/22/2023] [Indexed: 06/04/2023] Open
Abstract
Characterising brain states during tasks is common practice for many neuroscientific experiments using electrophysiological modalities such as electroencephalography (EEG) and magnetoencephalography (MEG). Brain states are often described in terms of oscillatory power and correlated brain activity, i.e. functional connectivity. It is, however, not unusual to observe weak task induced functional connectivity alterations in the presence of strong task induced power modulations using classical time-frequency representation of the data. Here, we propose that non-reversibility, or the temporal asymmetry in functional interactions, may be more sensitive to characterise task induced brain states than functional connectivity. As a second step, we explore causal mechanisms of non-reversibility in MEG data using whole brain computational models. We include working memory, motor, language tasks and resting-state data from participants of the Human Connectome Project (HCP). Non-reversibility is derived from the lagged amplitude envelope correlation (LAEC), and is based on asymmetry of the forward and reversed cross-correlations of the amplitude envelopes. Using random forests, we find that non-reversibility outperforms functional connectivity in the identification of task induced brain states. Non-reversibility shows especially better sensitivity to capture bottom-up gamma induced brain states across all tasks, but also alpha band associated brain states. Using whole brain computational models we find that asymmetry in the effective connectivity and axonal conduction delays play a major role in shaping non-reversibility across the brain. Our work paves the way for better sensitivity in characterising brain states during both bottom-up as well as top-down modulation in future neuroscientific experiments.
Collapse
Affiliation(s)
- Prejaas K B Tewarie
- Center for Brain and Cognition, Computational Neuroscience Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Spain; Clinical Neurophysiology Group, University of Twente, Enschede, The Netherlands; Department of Neurology, Amsterdam UMC, Amsterdam, the Netherlands; Sir Peter Mansfield Imaging Centre, School of Physics, University of Nottingham, Nottingham, United Kingdom.
| | - Rikkert Hindriks
- Department of Mathematics, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Yi Ming Lai
- Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, United Kingdom
| | - Stamatios N Sotiropoulos
- Sir Peter Mansfield Imaging Centre, School of Medicine, University of Nottingham, United Kingdom; NIHR Biomedical Research Centre, University of Nottingham, Nottingham University Hospitals NHS Trust, Nottingham, UK
| | - Morten Kringelbach
- Centre for Eudaimonia and Human Flourishing, Linacre College, University of Oxford, Oxford, UK; Center for Music in the Brain, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Department of Psychiatry, University of Oxford, Oxford, UK
| | - Gustavo Deco
- Center for Brain and Cognition, Computational Neuroscience Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Spain; Institució Catalana de la Recerca i Estudis Avançats (ICREA), Barcelona, Spain; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| |
Collapse
|
19
|
Abbasi O, Steingräber N, Chalas N, Kluger DS, Gross J. Spatiotemporal dynamics characterise spectral connectivity profiles of continuous speaking and listening. PLoS Biol 2023; 21:e3002178. [PMID: 37478152 DOI: 10.1371/journal.pbio.3002178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Accepted: 05/31/2023] [Indexed: 07/23/2023] Open
Abstract
Speech production and perception are fundamental processes of human cognition that both rely on intricate processing mechanisms that are still poorly understood. Here, we study these processes by using magnetoencephalography (MEG) to comprehensively map connectivity of regional brain activity within the brain and to the speech envelope during continuous speaking and listening. Our results reveal not only a partly shared neural substrate for both processes but also a dissociation in space, delay, and frequency. Neural activity in motor and frontal areas is coupled to succeeding speech in delta band (1 to 3 Hz), whereas coupling in the theta range follows speech in temporal areas during speaking. Neural connectivity results showed a separation of bottom-up and top-down signalling in distinct frequency bands during speaking. Here, we show that frequency-specific connectivity channels for bottom-up and top-down signalling support continuous speaking and listening. These findings further shed light on the complex interplay between different brain regions involved in speech production and perception.
Collapse
Affiliation(s)
- Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Nadine Steingräber
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
| | - Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
| |
Collapse
|
20
|
Karunathilake IMD, Dunlap JL, Perera J, Presacco A, Decruy L, Anderson S, Kuchinsky SE, Simon JZ. Effects of aging on cortical representations of continuous speech. J Neurophysiol 2023; 129:1359-1377. [PMID: 37096924 PMCID: PMC10202479 DOI: 10.1152/jn.00356.2022] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 04/04/2023] [Accepted: 04/20/2023] [Indexed: 04/26/2023] Open
Abstract
Understanding speech in a noisy environment is crucial in day-to-day interactions and yet becomes more challenging with age, even for healthy aging. Age-related changes in the neural mechanisms that enable speech-in-noise listening have been investigated previously; however, the extent to which age affects the timing and fidelity of encoding of target and interfering speech streams is not well understood. Using magnetoencephalography (MEG), we investigated how continuous speech is represented in auditory cortex in the presence of interfering speech in younger and older adults. Cortical representations were obtained from neural responses that time-locked to the speech envelopes with speech envelope reconstruction and temporal response functions (TRFs). TRFs showed three prominent peaks corresponding to auditory cortical processing stages: early (∼50 ms), middle (∼100 ms), and late (∼200 ms). Older adults showed exaggerated speech envelope representations compared with younger adults. Temporal analysis revealed both that the age-related exaggeration starts as early as ∼50 ms and that older adults needed a substantially longer integration time window to achieve their better reconstruction of the speech envelope. As expected, with increased speech masking envelope reconstruction for the attended talker decreased and all three TRF peaks were delayed, with aging contributing additionally to the reduction. Interestingly, for older adults the late peak was delayed, suggesting that this late peak may receive contributions from multiple sources. Together these results suggest that there are several mechanisms at play compensating for age-related temporal processing deficits at several stages but which are not able to fully reestablish unimpaired speech perception.NEW & NOTEWORTHY We observed age-related changes in cortical temporal processing of continuous speech that may be related to older adults' difficulty in understanding speech in noise. These changes occur in both timing and strength of the speech representations at different cortical processing stages and depend on both noise condition and selective attention. Critically, their dependence on noise condition changes dramatically among the early, middle, and late cortical processing stages, underscoring how aging differentially affects these stages.
Collapse
Affiliation(s)
- I M Dushyanthi Karunathilake
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
| | - Jason L Dunlap
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
| | - Janani Perera
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
| | - Alessandro Presacco
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
| | - Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
| | - Stefanie E Kuchinsky
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland, United States
| | - Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Department of Biology, University of Maryland, College Park, Maryland, United States
| |
Collapse
|
21
|
Bellur A, Thakkar K, Elhilali M. Explicit-memory multiresolution adaptive framework for speech and music separation. EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING 2023; 2023:20. [PMID: 37181589 PMCID: PMC10169896 DOI: 10.1186/s13636-023-00286-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 04/21/2023] [Indexed: 05/16/2023]
Abstract
The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of the input and uses memory (or priors) to guide the selection of a target sound from the input mixture. Moreover, feedback mechanisms refine the memory constructs resulting in further improvement of selectivity of a particular sound object amidst dynamic backgrounds. The present study proposes a unified end-to-end computational framework that mimics these principles for sound source separation applied to both speech and music mixtures. While the problems of speech enhancement and music separation have often been tackled separately due to constraints and specificities of each signal domain, the current work posits that common principles for sound source separation are domain-agnostic. In the proposed scheme, parallel and hierarchical convolutional paths map input mixtures onto redundant but distributed higher-dimensional subspaces and utilize the concept of temporal coherence to gate the selection of embeddings belonging to a target stream abstracted in memory. These explicit memories are further refined through self-feedback from incoming observations in order to improve the system's selectivity when faced with unknown backgrounds. The model yields stable outcomes of source separation for both speech and music mixtures and demonstrates benefits of explicit memory as a powerful representation of priors that guide information selection from complex inputs.
Collapse
Affiliation(s)
- Ashwin Bellur
- Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Karan Thakkar
- Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| | - Mounya Elhilali
- Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
| |
Collapse
|
22
|
Van Hirtum T, Somers B, Verschueren E, Dieudonné B, Francart T. Delta-band neural envelope tracking predicts speech intelligibility in noise in preschoolers. Hear Res 2023; 434:108785. [PMID: 37172414 DOI: 10.1016/j.heares.2023.108785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 04/24/2023] [Accepted: 05/05/2023] [Indexed: 05/15/2023]
Abstract
Behavioral tests are currently the gold standard in measuring speech intelligibility. However, these tests can be difficult to administer in young children due to factors such as motivation, linguistic knowledge and cognitive skills. It has been shown that measures of neural envelope tracking can be used to predict speech intelligibility and overcome these issues. However, its potential as an objective measure for speech intelligibility in noise remains to be investigated in preschool children. Here, we evaluated neural envelope tracking as a function of signal-to-noise ratio (SNR) in 14 5-year-old children. We examined EEG responses to natural, continuous speech presented at different SNRs ranging from -8 (very difficult) to 8 dB SNR (very easy). As expected delta band (0.5-4 Hz) tracking increased with increasing stimulus SNR. However, this increase was not strictly monotonic as neural tracking reached a plateau between 0 and 4 dB SNR, similarly to the behavioral speech intelligibility outcomes. These findings indicate that neural tracking in the delta band remains stable, as long as the acoustical degradation of the speech signal does not reflect significant changes in speech intelligibility. Theta band tracking (4-8 Hz), on the other hand, was found to be drastically reduced and more easily affected by noise in children, making it less reliable as a measure of speech intelligibility. By contrast, neural envelope tracking in the delta band was directly associated with behavioral measures of speech intelligibility. This suggests that neural envelope tracking in the delta band is a valuable tool for evaluating speech-in-noise intelligibility in preschoolers, highlighting its potential as an objective measure of speech in difficult-to-test populations.
Collapse
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium.
| | - Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| |
Collapse
|
23
|
Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990 DOI: 10.1016/j.heares.2023.108770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 04/06/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically more challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Apart from individual features forming a static listening condition, dynamic and irregular changes of auditory scenes-volatile listening environments-have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the speech tracking neural index, possibly when the bottom-up information is sufficient. However, when the listening environment was volatile and the listeners had to re-engage in new speech whenever auditory scenes altered, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the response to the left and right attention were significantly attenuated around 100-200 ms after the speech onset. These findings suggest that volatile listening environments could adversely affect the modulation effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Collapse
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
| | - Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
| | - Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
| | - Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea.
| |
Collapse
|
24
|
Willmore BDB, King AJ. Adaptation in auditory processing. Physiol Rev 2023; 103:1025-1058. [PMID: 36049112 PMCID: PMC9829473 DOI: 10.1152/physrev.00011.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Adaptation is an essential feature of auditory neurons, which reduces their responses to unchanging and recurring sounds and allows their response properties to be matched to the constantly changing statistics of sounds that reach the ears. As a consequence, processing in the auditory system highlights novel or unpredictable sounds and produces an efficient representation of the vast range of sounds that animals can perceive by continually adjusting the sensitivity and, to a lesser extent, the tuning properties of neurons to the most commonly encountered stimulus values. Together with attentional modulation, adaptation to sound statistics also helps to generate neural representations of sound that are tolerant to background noise and therefore plays a vital role in auditory scene analysis. In this review, we consider the diverse forms of adaptation that are found in the auditory system in terms of the processing levels at which they arise, the underlying neural mechanisms, and their impact on neural coding and perception. We also ask what the dynamics of adaptation, which can occur over multiple timescales, reveal about the statistical properties of the environment. Finally, we examine how adaptation to sound statistics is influenced by learning and experience and changes as a result of aging and hearing loss.
Collapse
Affiliation(s)
- Ben D. B. Willmore
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| | - Andrew J. King
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
25
|
Kösem A, Dai B, McQueen JM, Hagoort P. Neural tracking of speech envelope does not unequivocally reflect intelligibility. Neuroimage 2023; 272:120040. [PMID: 36935084 DOI: 10.1016/j.neuroimage.2023.120040] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 03/13/2023] [Accepted: 03/15/2023] [Indexed: 03/19/2023] Open
Abstract
During listening, brain activity tracks the rhythmic structures of speech signals. Here, we directly dissociated the contribution of neural envelope tracking in the processing of speech acoustic cues from that related to linguistic processing. We examined the neural changes associated with the comprehension of Noise-Vocoded (NV) speech using magnetoencephalography (MEG). Participants listened to NV sentences in a 3-phase training paradigm: (1) pre-training, where NV stimuli were barely comprehended, (2) training with exposure of the original clear version of speech stimulus, and (3) post-training, where the same stimuli gained intelligibility from the training phase. Using this paradigm, we tested if the neural responses of a speech signal was modulated by its intelligibility without any change in its acoustic structure. To test the influence of spectral degradation on neural envelope tracking independently of training, participants listened to two types of NV sentences (4-band and 2-band NV speech), but were only trained to understand 4-band NV speech. Significant changes in neural tracking were observed in the delta range in relation to the acoustic degradation of speech. However, we failed to find a direct effect of intelligibility on the neural tracking of speech envelope in both theta and delta ranges, in both auditory regions-of-interest and whole-brain sensor-space analyses. This suggests that acoustics greatly influence the neural tracking response to speech envelope, and that caution needs to be taken when choosing the control signals for speech-brain tracking analyses, considering that a slight change in acoustic parameters can have strong effects on the neural tracking response.
Collapse
Affiliation(s)
- Anne Kösem
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Lyon Neuroscience Research Center (CRNL), CoPhy Team, INSERM U1028, 69500 Bron, France.
| | - Bohan Dai
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| | - James M McQueen
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| | - Peter Hagoort
- Max Planck Institute for Psycholinguistics, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands
| |
Collapse
|
26
|
De Clercq P, Vanthornhout J, Vandermosten M, Francart T. Beyond linear neural envelope tracking: a mutual information approach. J Neural Eng 2023; 20. [PMID: 36812597 DOI: 10.1088/1741-2552/acbe1d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 02/22/2023] [Indexed: 02/24/2023]
Abstract
Objective.The human brain tracks the temporal envelope of speech, which contains essential cues for speech understanding. Linear models are the most common tool to study neural envelope tracking. However, information on how speech is processed can be lost since nonlinear relations are precluded. Analysis based on mutual information (MI), on the other hand, can detect both linear and nonlinear relations and is gradually becoming more popular in the field of neural envelope tracking. Yet, several different approaches to calculating MI are applied with no consensus on which approach to use. Furthermore, the added value of nonlinear techniques remains a subject of debate in the field. The present paper aims to resolve these open questions.Approach.We analyzed electroencephalography (EEG) data of participants listening to continuous speech and applied MI analyses and linear models.Main results.Comparing the different MI approaches, we conclude that results are most reliable and robust using the Gaussian copula approach, which first transforms the data to standard Gaussians. With this approach, the MI analysis is a valid technique for studying neural envelope tracking. Like linear models, it allows spatial and temporal interpretations of speech processing, peak latency analyses, and applications to multiple EEG channels combined. In a final analysis, we tested whether nonlinear components were present in the neural response to the envelope by first removing all linear components in the data. We robustly detected nonlinear components on the single-subject level using the MI analysis.Significance.We demonstrate that the human brain processes speech in a nonlinear way. Unlike linear models, the MI analysis detects such nonlinear relations, proving its added value to neural envelope tracking. In addition, the MI analysis retains spatial and temporal characteristics of speech processing, an advantage lost when using more complex (nonlinear) deep neural networks.
Collapse
Affiliation(s)
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
27
|
Aljarboa GS, Bell SL, Simpson DM. Detecting cortical responses to continuous running speech using EEG data from only one channel. Int J Audiol 2023; 62:199-208. [PMID: 35152811 DOI: 10.1080/14992027.2022.2035832] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
OBJECTIVE To explore the detection of cortical responses to continuous speech using a single EEG channel. Particularly, to compare detection rates and times using a cross-correlation approach and parameters extracted from the temporal response function (TRF). DESIGN EEG from 32-channels were recorded whilst presenting 25-min continuous English speech. Detection parameters were cross-correlation between speech and EEG (XCOR), peak value and power of the TRF filter (TRF-peak and TRF-power), and correlation between predicted TRF and true EEG (TRF-COR). A bootstrap analysis was used to determine response statistical significance. Different electrode configurations were compared: Using single channels Cz or Fz, or selecting channels with the highest correlation value. STUDY SAMPLE Seventeen native English-speaking subjects with mild-to-moderate hearing loss. RESULTS Significant cortical responses were detected from all subjects at Fz channel with XCOR and TRF-COR. Lower detection time was seen for XCOR (mean = 4.8 min) over TRF parameters (best TRF-COR, mean = 6.4 min), with significant time differences from XCOR to TRF-peak and TRF-power. Analysing multiple EEG channels and testing channels with the highest correlation between envelope and EEG reduced detection sensitivity compared to Fz alone. CONCLUSIONS Cortical responses to continuous speech can be detected from a single channel with recording times that may be suitable for clinical application.
Collapse
Affiliation(s)
- Ghadah S Aljarboa
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK.,Communication Sciences, Princess Nora bint Abdul Rahman University, Riyadh, Saudi Arabia
| | - Steve L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
| | - David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
| |
Collapse
|
28
|
Chen YP, Schmidt F, Keitel A, Rösch S, Hauswald A, Weisz N. Speech intelligibility changes the temporal evolution of neural speech tracking. Neuroimage 2023; 268:119894. [PMID: 36693596 DOI: 10.1016/j.neuroimage.2023.119894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 12/13/2022] [Accepted: 01/20/2023] [Indexed: 01/22/2023] Open
Abstract
Listening to speech with poor signal quality is challenging. Neural speech tracking of degraded speech has been used to advance the understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we exploited temporal response functions (TRFs), which has been used to describe the time course of speech tracking on a gradient from intelligible to unintelligible degraded speech. In addition, we used inter-related facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to endorse our findings in TRFs. Our TRF analysis yielded marked temporally differential effects of vocoding: ∼50-110 ms (M50TRF), ∼175-230 ms (M200TRF), and ∼315-380 ms (M350TRF). Reduction of intelligibility went along with large increases of early peak responses M50TRF, but strongly reduced responses in M200TRF. In the late responses M350TRF, the maximum response occurred for degraded speech that was still comprehensible then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural "tracking" measures and found that M50TRF and M200TRF play a differential role in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra and provides a better understanding of degraded speech processing.
Collapse
Affiliation(s)
- Ya-Ping Chen
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria.
| | - Fabian Schmidt
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
| | - Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, DD1 4HN Dundee, UK
| | - Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, 5020 Salzburg, Austria
| | - Anne Hauswald
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
| | - Nathan Weisz
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, 5020 Salzburg, Austria
| |
Collapse
|
29
|
Mischler G, Keshishian M, Bickel S, Mehta AD, Mesgarani N. Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex. Neuroimage 2023; 266:119819. [PMID: 36529203 PMCID: PMC10510744 DOI: 10.1016/j.neuroimage.2022.119819] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 11/28/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underly this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
Collapse
Affiliation(s)
- Gavin Mischler
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
| | - Menoua Keshishian
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
| | - Stephan Bickel
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
| | - Ashesh D Mehta
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
| | - Nima Mesgarani
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States.
| |
Collapse
|
30
|
Niesen M, Bourguignon M, Bertels J, Vander Ghinst M, Wens V, Goldman S, De Tiège X. Cortical tracking of lexical speech units in a multi-talker background is immature in school-aged children. Neuroimage 2023; 265:119770. [PMID: 36462732 DOI: 10.1016/j.neuroimage.2022.119770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 11/09/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022] Open
Abstract
Children have more difficulty perceiving speech in noise than adults. Whether this difficulty relates to an immature processing of prosodic or linguistic elements of the attended speech is still unclear. To address the impact of noise on linguistic processing per se, we assessed how babble noise impacts the cortical tracking of intelligible speech devoid of prosody in school-aged children and adults. Twenty adults and twenty children (7-9 years) listened to synthesized French monosyllabic words presented at 2.5 Hz, either randomly or in 4-word hierarchical structures wherein 2 words formed a phrase at 1.25 Hz, and 2 phrases formed a sentence at 0.625 Hz, with or without babble noise. Neuromagnetic responses to words, phrases and sentences were identified and source-localized. Children and adults displayed significant cortical tracking of words in all conditions, and of phrases and sentences only when words formed meaningful sentences. In children compared with adults, the cortical tracking was lower for all linguistic units in conditions without noise. In the presence of noise, the cortical tracking was similarly reduced for sentence units in both groups, but remained stable for phrase units. Critically, when there was noise, adults increased the cortical tracking of monosyllabic words in the inferior frontal gyri and supratemporal auditory cortices but children did not. This study demonstrates that the difficulties of school-aged children in understanding speech in a multi-talker background might be partly due to an immature tracking of lexical but not supra-lexical linguistic units.
Collapse
Affiliation(s)
- Maxime Niesen
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Otorhinolaryngology, 1070 Brussels, Belgium.
| | - Mathieu Bourguignon
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), UNI-ULB Neuroscience Institute, Laboratory of Neurophysiology and Movement Biomechanics, 1070 Brussels, Belgium.; BCBL, Basque Center on Cognition, Brain and Language, 20009 San Sebastian, Spain
| | - Julie Bertels
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), UNI-ULB Neuroscience Institute, Cognition and Computation group, ULBabyLab - Consciousness, Brussels, Belgium
| | - Marc Vander Ghinst
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Otorhinolaryngology, 1070 Brussels, Belgium
| | - Vincent Wens
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of translational Neuroimaging, 1070 Brussels, Belgium
| | - Serge Goldman
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of Nuclear Medicine, 1070 Brussels, Belgium
| | - Xavier De Tiège
- Université libre de Bruxelles (ULB), UNI - ULB Neurosciences Institute, Laboratoire de Neuroanatomie et de Neuroimagerie translationnelles (LN2T), 1070 Brussels, Belgium; Université libre de Bruxelles (ULB), Hôpital Universitaire de Bruxelles (HUB), CUB Hôpital Erasme, Department of translational Neuroimaging, 1070 Brussels, Belgium
| |
Collapse
|
31
|
Understanding why infant-directed speech supports learning: A dynamic attention perspective. DEVELOPMENTAL REVIEW 2022. [DOI: 10.1016/j.dr.2022.101047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
32
|
Liu Y, Luo C, Zheng J, Liang J, Ding N. Working memory asymmetrically modulates auditory and linguistic processing of speech. Neuroimage 2022; 264:119698. [PMID: 36270622 DOI: 10.1016/j.neuroimage.2022.119698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Revised: 10/11/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022] Open
Abstract
Working memory load can modulate speech perception. However, since speech perception and working memory are both complex functions, it remains elusive how each component of the working memory system interacts with each speech processing stage. To investigate this issue, we concurrently measure how the working memory load modulates neural activity tracking three levels of linguistic units, i.e., syllables, phrases, and sentences, using a multiscale frequency-tagging approach. Participants engage in a sentence comprehension task and the working memory load is manipulated by asking them to memorize either auditory verbal sequences or visual patterns. It is found that verbal and visual working memory load modulate speech processing in similar manners: Higher working memory load attenuates neural activity tracking of phrases and sentences but enhances neural activity tracking of syllables. Since verbal and visual WM load similarly influence the neural responses to speech, such influences may derive from the domain-general component of WM system. More importantly, working memory load asymmetrically modulates lower-level auditory encoding and higher-level linguistic processing of speech, possibly reflecting reallocation of attention induced by mnemonic load.
Collapse
Affiliation(s)
- Yiguang Liu
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
| | - Cheng Luo
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
| | - Jing Zheng
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| | - Junying Liang
- Department of Linguistics, School of International Studies, Zhejiang University, Hangzhou 310058, China
| | - Nai Ding
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China; Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou 310012, China.
| |
Collapse
|
33
|
Neurodevelopmental oscillatory basis of speech processing in noise. Dev Cogn Neurosci 2022; 59:101181. [PMID: 36549148 PMCID: PMC9792357 DOI: 10.1016/j.dcn.2022.101181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 10/31/2022] [Accepted: 11/25/2022] [Indexed: 11/27/2022] Open
Abstract
Humans' extraordinary ability to understand speech in noise relies on multiple processes that develop with age. Using magnetoencephalography (MEG), we characterize the underlying neuromaturational basis by quantifying how cortical oscillations in 144 participants (aged 5-27 years) track phrasal and syllabic structures in connected speech mixed with different types of noise. While the extraction of prosodic cues from clear speech was stable during development, its maintenance in a multi-talker background matured rapidly up to age 9 and was associated with speech comprehension. Furthermore, while the extraction of subtler information provided by syllables matured at age 9, its maintenance in noisy backgrounds progressively matured until adulthood. Altogether, these results highlight distinct behaviorally relevant maturational trajectories for the neuronal signatures of speech perception. In accordance with grain-size proposals, neuromaturational milestones are reached increasingly late for linguistic units of decreasing size, with further delays incurred by noise.
Collapse
|
34
|
Broderick MP, Zuk NJ, Anderson AJ, Lalor EC. More than words: Neurophysiological correlates of semantic dissimilarity depend on comprehension of the speech narrative. Eur J Neurosci 2022; 56:5201-5214. [PMID: 35993240 DOI: 10.1111/ejn.15805] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Revised: 08/15/2022] [Accepted: 08/18/2022] [Indexed: 12/14/2022]
Abstract
Speech comprehension relies on the ability to understand words within a coherent context. Recent studies have attempted to obtain electrophysiological indices of this process by modelling how brain activity is affected by a word's semantic dissimilarity to preceding words. Although the resulting indices appear robust and are strongly modulated by attention, it remains possible that, rather than capturing the contextual understanding of words, they may actually reflect word-to-word changes in semantic content without the need for a narrative-level understanding on the part of the listener. To test this, we recorded electroencephalography from subjects who listened to speech presented in either its original, narrative form, or after scrambling the word order by varying amounts. This manipulation affected the ability of subjects to comprehend the speech narrative but not the ability to recognise individual words. Neural indices of semantic understanding and low-level acoustic processing were derived for each scrambling condition using the temporal response function. Signatures of semantic processing were observed when speech was unscrambled or minimally scrambled and subjects understood the speech. The same markers were absent for higher scrambling levels as speech comprehension dropped. In contrast, word recognition remained high and neural measures related to envelope tracking did not vary significantly across scrambling conditions. This supports the previous claim that electrophysiological indices based on the semantic dissimilarity of words to their context reflect a listener's understanding of those words relative to that context. It also highlights the relative insensitivity of neural measures of low-level speech processing to speech comprehension.
Collapse
Affiliation(s)
- Michael P Broderick
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Nathaniel J Zuk
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Andrew J Anderson
- Del Monte Institute for Neuroscience, Department of Neuroscience, Department of Biomedical Engineering, University of Rochester, Rochester, New York, USA
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland.,Del Monte Institute for Neuroscience, Department of Neuroscience, Department of Biomedical Engineering, University of Rochester, Rochester, New York, USA
| |
Collapse
|
35
|
Verschueren E, Gillis M, Decruy L, Vanthornhout J, Francart T. Speech Understanding Oppositely Affects Acoustic and Linguistic Neural Tracking in a Speech Rate Manipulation Paradigm. J Neurosci 2022; 42:7442-7453. [PMID: 36041851 PMCID: PMC9525161 DOI: 10.1523/jneurosci.0259-22.2022] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 06/29/2022] [Accepted: 07/17/2022] [Indexed: 11/21/2022] Open
Abstract
When listening to continuous speech, the human brain can track features of the presented speech signal. It has been shown that neural tracking of acoustic features is a prerequisite for speech understanding and can predict speech understanding in controlled circumstances. However, the brain also tracks linguistic features of speech, which may be more directly related to speech understanding. We investigated acoustic and linguistic speech processing as a function of varying speech understanding by manipulating the speech rate. In this paradigm, acoustic and linguistic speech processing is affected simultaneously but in opposite directions: When the speech rate increases, more acoustic information per second is present. In contrast, the tracking of linguistic information becomes more challenging when speech is less intelligible at higher speech rates. We measured the EEG of 18 participants (4 male) who listened to speech at various speech rates. As expected and confirmed by the behavioral results, speech understanding decreased with increasing speech rate. Accordingly, linguistic neural tracking decreased with increasing speech rate, but acoustic neural tracking increased. This indicates that neural tracking of linguistic representations can capture the gradual effect of decreasing speech understanding. In addition, increased acoustic neural tracking does not necessarily imply better speech understanding. This suggests that, although more challenging to measure because of the low signal-to-noise ratio, linguistic neural tracking may be a more direct predictor of speech understanding.SIGNIFICANCE STATEMENT An increasingly popular method to investigate neural speech processing is to measure neural tracking. Although much research has been done on how the brain tracks acoustic speech features, linguistic speech features have received less attention. In this study, we disentangled acoustic and linguistic characteristics of neural speech tracking via manipulating the speech rate. A proper way of objectively measuring auditory and language processing paves the way toward clinical applications: An objective measure of speech understanding would allow for behavioral-free evaluation of speech understanding, which allows to evaluate hearing loss and adjust hearing aids based on brain responses. This objective measure would benefit populations from whom obtaining behavioral measures may be complex, such as young children or people with cognitive impairments.
Collapse
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Marlies Gillis
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
| | - Jonas Vanthornhout
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Tom Francart
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| |
Collapse
|
36
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
37
|
Andreeva IG, Ogorodnikova EA. Auditory Adaptation to Speech Signal Characteristics. J EVOL BIOCHEM PHYS+ 2022. [DOI: 10.1134/s0022093022050027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
38
|
Na Y, Joo H, Trang LT, Quan LDA, Woo J. Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses. Front Neurosci 2022; 16:906616. [PMID: 36061597 PMCID: PMC9433707 DOI: 10.3389/fnins.2022.906616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Accepted: 07/25/2022] [Indexed: 11/29/2022] Open
Abstract
Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which the auditory prosthesis improves the user’s speech comprehension. Although behavior-based speech intelligibility is the gold standard, precise evaluation is limited due to its subjectiveness. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four–channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, and 8-channel vocoder were used to set relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100%, with a step of 2.5%. Three scores, namely 30.0, 37.5, and 40.0%, were not collected. The speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained by a dataset of event-related potentials (ERP), correlation coefficients between the ERPs and ENVs, between the ERPs and PH onset, or between ERPs and the product of the multiplication of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. While the ENV models’ informative electrodes were located in the occipital area, the informative electrodes of the phoneme models, i.e., PH and PHENV, were based on the occlusion sensitivity map located in the language processing area. Of the models tested, the PHENV model obtained the best speech intelligibility prediction accuracy. This model may promote clinical prediction of speech intelligibility with a comfort speech intelligibility test.
Collapse
Affiliation(s)
- Youngmin Na
- Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
| | - Hyosung Joo
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
| | - Le Thi Trang
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
| | - Luong Do Anh Quan
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
| | - Jihwan Woo
- Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
- *Correspondence: Jihwan Woo,
| |
Collapse
|
39
|
Li Z, Hong B, Wang D, Nolte G, Engel AK, Zhang D. Speaker-listener neural coupling reveals a right-lateralized mechanism for non-native speech-in-noise comprehension. Cereb Cortex 2022; 33:3701-3714. [PMID: 35975617 DOI: 10.1093/cercor/bhac302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/14/2022] Open
Abstract
While the increasingly globalized world has brought more and more demands for non-native language communication, the prevalence of background noise in everyday life poses a great challenge to non-native speech comprehension. The present study employed an interbrain approach based on functional near-infrared spectroscopy (fNIRS) to explore how people adapt to comprehend non-native speech information in noise. A group of Korean participants who acquired Chinese as their non-native language was invited to listen to Chinese narratives at 4 noise levels (no noise, 2 dB, -6 dB, and - 9 dB). These narratives were real-life stories spoken by native Chinese speakers. Processing of the non-native speech was associated with significant fNIRS-based listener-speaker neural couplings mainly over the right hemisphere at both the listener's and the speaker's sides. More importantly, the neural couplings from the listener's right superior temporal gyrus, the right middle temporal gyrus, as well as the right postcentral gyrus were found to be positively correlated with their individual comprehension performance at the strongest noise level (-9 dB). These results provide interbrain evidence in support of the right-lateralized mechanism for non-native speech processing and suggest that both an auditory-based and a sensorimotor-based mechanism contributed to the non-native speech-in-noise comprehension.
Collapse
Affiliation(s)
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China.,Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China.,Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
| | - Daifa Wang
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China
| | - Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, 20246 Hamburg, Germany
| | - Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, 20246 Hamburg, Germany
| | - Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China.,Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| |
Collapse
|
40
|
Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci 2022; 16:828546. [PMID: 36003957 PMCID: PMC9393379 DOI: 10.3389/fnins.2022.828546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2021] [Accepted: 07/08/2022] [Indexed: 11/13/2022] Open
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
Collapse
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
| | - Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
| |
Collapse
|
41
|
Peter V, van Ommen S, Kalashnikova M, Mazuka R, Nazzi T, Burnham D. Language specificity in cortical tracking of speech rhythm at the mora, syllable, and foot levels. Sci Rep 2022; 12:13477. [PMID: 35931787 PMCID: PMC9356059 DOI: 10.1038/s41598-022-17401-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Accepted: 07/25/2022] [Indexed: 11/29/2022] Open
Abstract
Recent research shows that adults' neural oscillations track the rhythm of the speech signal. However, the extent to which this tracking is driven by the acoustics of the signal, or by language-specific processing remains unknown. Here adult native listeners of three rhythmically different languages (English, French, Japanese) were compared on their cortical tracking of speech envelopes synthesized in their three native languages, which allowed for coding at each of the three language's dominant rhythmic unit, respectively the foot (2.5 Hz), syllable (5 Hz), or mora (10 Hz) level. The three language groups were also tested with a sequence in a non-native language, Polish, and a non-speech vocoded equivalent, to investigate possible differential speech/nonspeech processing. The results first showed that cortical tracking was most prominent at 5 Hz (syllable rate) for all three groups, but the French listeners showed enhanced tracking at 5 Hz compared to the English and the Japanese groups. Second, across groups, there were no differences in responses for speech versus non-speech at 5 Hz (syllable rate), but there was better tracking for speech than for non-speech at 10 Hz (not the syllable rate). Together these results provide evidence for both language-general and language-specific influences on cortical tracking.
Collapse
Affiliation(s)
- Varghese Peter
- MARCS Institute for Brain Behaviour and Development, Western Sydney University, Penrith, NSW, Australia.
- School of Health and Behavioural Sciences, University of the Sunshine Coast, Sippy Downs, Australia.
| | - Sandrien van Ommen
- Integrative Neuroscience and Cognition Center, CNRS-Université Paris Cité, Paris, France
- Neurosciences Fondamentales, University of Geneva, Geneva, Switzerland
| | - Marina Kalashnikova
- MARCS Institute for Brain Behaviour and Development, Western Sydney University, Penrith, NSW, Australia
- BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Guipuzcoa, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Bizcaya, Spain
| | - Reiko Mazuka
- Laboratory for Language Development, RIKEN Center for Brain Science, Saitama, Japan
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Thierry Nazzi
- Integrative Neuroscience and Cognition Center, CNRS-Université Paris Cité, Paris, France
| | - Denis Burnham
- MARCS Institute for Brain Behaviour and Development, Western Sydney University, Penrith, NSW, Australia
| |
Collapse
|
42
|
David W, Gransier R, Wouters J. Evaluation of phase-locking to parameterized speech envelopes. Front Neurol 2022; 13:852030. [PMID: 35989900 PMCID: PMC9382131 DOI: 10.3389/fneur.2022.852030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Accepted: 06/29/2022] [Indexed: 12/04/2022] Open
Abstract
Humans rely on the temporal processing ability of the auditory system to perceive speech during everyday communication. The temporal envelope of speech is essential for speech perception, particularly envelope modulations below 20 Hz. In the literature, the neural representation of this speech envelope is usually investigated by recording neural phase-locked responses to speech stimuli. However, these phase-locked responses are not only associated with envelope modulation processing, but also with processing of linguistic information at a higher-order level when speech is comprehended. It is thus difficult to disentangle the responses into components from the acoustic envelope itself and the linguistic structures in speech (such as words, phrases and sentences). Another way to investigate neural modulation processing is to use sinusoidal amplitude-modulated stimuli at different modulation frequencies to obtain the temporal modulation transfer function. However, these transfer functions are considerably variable across modulation frequencies and individual listeners. To tackle the issues of both speech and sinusoidal amplitude-modulated stimuli, the recently introduced Temporal Speech Envelope Tracking (TEMPEST) framework proposed the use of stimuli with a distribution of envelope modulations. The framework aims to assess the brain's capability to process temporal envelopes in different frequency bands using stimuli with speech-like envelope modulations. In this study, we provide a proof-of-concept of the framework using stimuli with modulation frequency bands around the syllable and phoneme rate in natural speech. We evaluated whether the evoked phase-locked neural activity correlates with the speech-weighted modulation transfer function measured using sinusoidal amplitude-modulated stimuli in normal-hearing listeners. Since many studies on modulation processing employ different metrics and comparing their results is difficult, we included different power- and phase-based metrics and investigate how these metrics relate to each other. Results reveal a strong correspondence across listeners between the neural activity evoked by the speech-like stimuli and the activity evoked by the sinusoidal amplitude-modulated stimuli. Furthermore, strong correspondence was also apparent between each metric, facilitating comparisons between studies using different metrics. These findings indicate the potential of the TEMPEST framework to efficiently assess the neural capability to process temporal envelope modulations within a frequency band that is important for speech perception.
Collapse
Affiliation(s)
- Wouter David
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
| | | | | |
Collapse
|
43
|
Raghavendra S, Lee S, Chen F, Martin BA, Tan CT. Cortical Entrainment to Speech Produced by Cochlear Implant Users and Normal-Hearing Talkers. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:3577-3581. [PMID: 36085647 DOI: 10.1109/embc48229.2022.9871276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The perceived sound quality of speech produced by hard-of-hearing individuals greatly depends on the degree and configuration of their hearing loss. A cochlear implant (CI) may provide some compensation and auditory feedback to monitor/control speech production. However, to date, the speech produced by CI users is still different in quality from that produced by normal-hearing (NH) talkers. In this study, we attempted to address this difference by examining the cortical activity of NH listeners when listening to continuous speech produced by 8 CI talkers and 8 NH talkers. We utilized a discriminative model to decode and reconstruct the speech envelope from the single-trial electroencephalogram (EEG) recorded from scalp electrode in NH listeners when listening to continuous speech. The correlation coefficient between the reconstructed envelope and original speech envelope was computed as a metric to quantify the difference in response to the speech produced by CI and NH talkers. The same listeners were asked to rate the perceived sound quality of the speech as a behavioral sound quality assessment. Both behavioral perceived sound quality ratings and the cortical entrainment to speech envelope were higher for the speech set produced by NH talkers than for the speech set produced by CI talkers.
Collapse
|
44
|
Bai F, Meyer AS, Martin AE. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol 2022; 20:e3001713. [PMID: 35834569 PMCID: PMC9282610 DOI: 10.1371/journal.pbio.3001713] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Accepted: 06/14/2022] [Indexed: 11/19/2022] Open
Abstract
Human language stands out in the natural world as a biological signal that uses a structured system to combine the meanings of small linguistic units (e.g., words) into larger constituents (e.g., phrases and sentences). However, the physical dynamics of speech (or sign) do not stand in a one-to-one relationship with the meanings listeners perceive. Instead, listeners infer meaning based on their knowledge of the language. The neural readouts of the perceptual and cognitive processes underlying these inferences are still poorly understood. In the present study, we used scalp electroencephalography (EEG) to compare the neural response to phrases (e.g., the red vase) and sentences (e.g., the vase is red), which were close in semantic meaning and had been synthesized to be physically indistinguishable. Differences in structure were well captured in the reorganization of neural phase responses in delta (approximately <2 Hz) and theta bands (approximately 2 to 7 Hz),and in power and power connectivity changes in the alpha band (approximately 7.5 to 13.5 Hz). Consistent with predictions from a computational model, sentences showed more power, more power connectivity, and more phase synchronization than phrases did. Theta-gamma phase-amplitude coupling occurred, but did not differ between the syntactic structures. Spectral-temporal response function (STRF) modeling revealed different encoding states for phrases and sentences, over and above the acoustically driven neural response. Our findings provide a comprehensive description of how the brain encodes and separates linguistic structures in the dynamics of neural responses. They imply that phase synchronization and strength of connectivity are readouts for the constituent structure of language. The results provide a novel basis for future neurophysiological research on linguistic structure representation in the brain, and, together with our simulations, support time-based binding as a mechanism of structure encoding in neural dynamics.
Collapse
Affiliation(s)
- Fan Bai
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Antje S. Meyer
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Andrea E. Martin
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
| |
Collapse
|
45
|
Kachlicka M, Laffere A, Dick F, Tierney A. Slow phase-locked modulations support selective attention to sound. Neuroimage 2022; 252:119024. [PMID: 35231629 PMCID: PMC9133470 DOI: 10.1016/j.neuroimage.2022.119024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 02/16/2022] [Accepted: 02/19/2022] [Indexed: 11/16/2022] Open
Abstract
To make sense of complex soundscapes, listeners must select and attend to task-relevant streams while ignoring uninformative sounds. One possible neural mechanism underlying this process is alignment of endogenous oscillations with the temporal structure of the target sound stream. Such a mechanism has been suggested to mediate attentional modulation of neural phase-locking to the rhythms of attended sounds. However, such modulations are compatible with an alternate framework, where attention acts as a filter that enhances exogenously-driven neural auditory responses. Here we attempted to test several predictions arising from the oscillatory account by playing two tone streams varying across conditions in tone duration and presentation rate; participants attended to one stream or listened passively. Attentional modulation of the evoked waveform was roughly sinusoidal and scaled with rate, while the passive response did not. However, there was only limited evidence for continuation of modulations through the silence between sequences. These results suggest that attentionally-driven changes in phase alignment reflect synchronization of slow endogenous activity with the temporal structure of attended stimuli.
Collapse
Affiliation(s)
- Magdalena Kachlicka
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
| | - Aeron Laffere
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
| | - Fred Dick
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England; Division of Psychology & Language Sciences, UCL, Gower Street, London WC1E 6BT, England
| | - Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England.
| |
Collapse
|
46
|
Distracting Linguistic Information Impairs Neural Tracking of Attended Speech. CURRENT RESEARCH IN NEUROBIOLOGY 2022; 3:100043. [DOI: 10.1016/j.crneur.2022.100043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 04/27/2022] [Accepted: 05/24/2022] [Indexed: 11/20/2022] Open
|
47
|
Bröhl F, Keitel A, Kayser C. MEG Activity in Visual and Auditory Cortices Represents Acoustic Speech-Related Information during Silent Lip Reading. eNeuro 2022; 9:ENEURO.0209-22.2022. [PMID: 35728955 PMCID: PMC9239847 DOI: 10.1523/eneuro.0209-22.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Accepted: 06/06/2022] [Indexed: 11/21/2022] Open
Abstract
Speech is an intrinsically multisensory signal, and seeing the speaker's lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source localized MEG recordings that were obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals and unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex and is predictive of lip-reading performance. These findings suggest that when seeing the speaker's lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
Collapse
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
| | - Anne Keitel
- Psychology, University of Dundee, Dundee DD1 4HN, United Kingdom
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
| |
Collapse
|
48
|
Jessica Tan SH, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: The Relationship between Cortical Tracking of Continuous Auditory-Visual Speech and Gaze Behaviour in Infants, Children and Adults. Neuroimage 2022; 256:119217. [PMID: 35436614 DOI: 10.1016/j.neuroimage.2022.119217] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2021] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 11/24/2022] Open
Abstract
An auditory-visual speech benefit, the benefit that visual speech cues bring to auditory speech perception, is experienced from early on in infancy and continues to be experienced to an increasing degree with age. While there is both behavioural and neurophysiological evidence for children and adults, only behavioural evidence exists for infants - as no neurophysiological study has provided a comprehensive examination of the auditory-visual speech benefit in infants. It is also surprising that most studies on auditory-visual speech benefit do not concurrently report looking behaviour especially since the auditory-visual speech benefit rests on the assumption that listeners attend to a speaker's talking face and that there are meaningful individual differences in looking behaviour. To address these gaps, we simultaneously recorded electroencephalographic (EEG) and eye-tracking data of 5-month-olds, 4-year-olds and adults as they were presented with a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes. Cortical tracking analyses that involved forward encoding models of the speech envelope revealed that there was an auditory-visual speech benefit [i.e., AV > (A+V)], evident in 5-month-olds and adults but not 4-year-olds. Examination of cortical tracking accuracy in relation to looking behaviour, showed that infants' relative attention to the speaker's mouth (vs. eyes) was positively correlated with cortical tracking accuracy of VO speech, whereas adults' attention to the display overall was negatively correlated with cortical tracking accuracy of VO speech. This study provides the first neurophysiological evidence of auditory-visual speech benefit in infants and our results suggest ways in which current models of speech processing can be fine-tuned.
Collapse
Affiliation(s)
- S H Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University.
| | - Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language; IKERBASQUE, Basque Foundation for Science
| | | | - Michael J Crosse
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
| | - Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
| |
Collapse
|
49
|
Attaheri A, Panayiotou D, Phillips A, Ní Choisdealbha Á, Di Liberto GM, Rocha S, Brusini P, Mead N, Flanagan S, Olawole-Scott H, Goswami U. Cortical Tracking of Sung Speech in Adults vs Infants: A Developmental Analysis. Front Neurosci 2022; 16:842447. [PMID: 35495026 PMCID: PMC9039340 DOI: 10.3389/fnins.2022.842447] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Accepted: 02/23/2022] [Indexed: 11/28/2022] Open
Abstract
Here we duplicate a neural tracking paradigm, previously published with infants (aged 4 to 11 months), with adult participants, in order to explore potential developmental similarities and differences in entrainment. Adults listened and watched passively as nursery rhymes were sung or chanted in infant-directed speech. Whole-head EEG (128 channels) was recorded, and cortical tracking of the sung speech in the delta (0.5–4 Hz), theta (4–8 Hz) and alpha (8–12 Hz) frequency bands was computed using linear decoders (multivariate Temporal Response Function models, mTRFs). Phase-amplitude coupling (PAC) was also computed to assess whether delta and theta phases temporally organize higher-frequency amplitudes for adults in the same pattern as found in the infant brain. Similar to previous infant participants, the adults showed significant cortical tracking of the sung speech in both delta and theta bands. However, the frequencies associated with peaks in stimulus-induced spectral power (PSD) in the two populations were different. PAC was also different in the adults compared to the infants. PAC was stronger for theta- versus delta- driven coupling in adults but was equal for delta- versus theta-driven coupling in infants. Adults also showed a stimulus-induced increase in low alpha power that was absent in infants. This may suggest adult recruitment of other cognitive processes, possibly related to comprehension or attention. The comparative data suggest that while infant and adult brains utilize essentially the same cortical mechanisms to track linguistic input, the operation of and interplay between these mechanisms may change with age and language experience.
Collapse
Affiliation(s)
- Adam Attaheri
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
- *Correspondence: Adam Attaheri,
| | - Dimitris Panayiotou
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Alessia Phillips
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Áine Ní Choisdealbha
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Giovanni M. Di Liberto
- School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, Ecole Normale Supérieure, PSL Research University, Paris, France
| | - Sinead Rocha
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Perrine Brusini
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
- Institute of Population Health, University of Liverpool, Liverpool, United Kingdom
| | - Natasha Mead
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Sheila Flanagan
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Helen Olawole-Scott
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| | - Usha Goswami
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
50
|
Irsik VC, Johnsrude IS, Herrmann B. Neural Activity during Story Listening Is Synchronized across Individuals Despite Acoustic Masking. J Cogn Neurosci 2022; 34:933-950. [PMID: 35258555 DOI: 10.1162/jocn_a_01842] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Older people with hearing problems often experience difficulties understanding speech in the presence of background sound. As a result, they may disengage in social situations, which has been associated with negative psychosocial health outcomes. Measuring listening (dis)engagement during challenging listening situations has received little attention thus far. We recruit young, normal-hearing human adults (both sexes) and investigate how speech intelligibility and engagement during naturalistic story listening is affected by the level of acoustic masking (12-talker babble) at different signal-to-noise ratios (SNRs). In Experiment 1, we observed that word-report scores were above 80% for all but the lowest SNR (-3 dB SNR) we tested, at which performance dropped to 54%. In Experiment 2, we calculated intersubject correlation (ISC) using EEG data to identify dynamic spatial patterns of shared neural activity evoked by the stories. ISC has been used as a neural measure of participants' engagement with naturalistic materials. Our results show that ISC was stable across all but the lowest SNRs, despite reduced speech intelligibility. Comparing ISC and intelligibility demonstrated that word-report performance declined more strongly with decreasing SNR compared to ISC. Our measure of neural engagement suggests that individuals remain engaged in story listening despite missing words because of background noise. Our work provides a potentially fruitful approach to investigate listener engagement with naturalistic, spoken stories that may be used to investigate (dis)engagement in older adults with hearing impairment.
Collapse
Affiliation(s)
| | | | - Björn Herrmann
- The University of Western Ontario.,Rotman Research Institute, Toronto, ON, Canada.,University of Toronto
| |
Collapse
|