1
Liang M, Gerwien J, Gutschalk A. A listening advantage for native speech is reflected by attention-related activity in auditory cortex. Commun Biol 2025; 8:180. PMID: 39910341; PMCID: PMC11799217; DOI: 10.1038/s42003-025-07601-2.
Abstract
The listening advantage for native speech is well known, but the neural basis of the effect remains unknown. Here we test the hypothesis that attentional enhancement in auditory cortex is stronger for native speech, using magnetoencephalography. Chinese and German speech stimuli were recorded by a bilingual speaker and combined into a two-stream, cocktail-party scene, with consistent and inconsistent language combinations. A group of native speakers of Chinese and a group of native speakers of German performed a detection task in the cued target stream. Results show that attention enhances negative-going activity in the temporal response function deconvolved from the speech envelope. This activity is stronger when the target stream is in the native compared to the non-native language, and for inconsistent compared to consistent language stimuli. We interpret these findings as indicating that the stronger activity for native speech could reflect better top-down prediction of the native speech stream.
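For readers unfamiliar with the method, the sketch below illustrates the kind of envelope-to-TRF deconvolution the abstract describes: a ridge-regularized lagged regression from the speech envelope to one sensor. The sampling rate, lag window, regularization value and stand-in data are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal TRF sketch: deconvolve a temporal response function from a speech
# envelope and one MEG channel via ridge regression over time lags.
import numpy as np

def estimate_trf(envelope, meg, fs, tmin=-0.1, tmax=0.5, alpha=1.0):
    """Return (lag times in s, TRF weights) for one channel."""
    lags = np.arange(int(tmin * fs), int(tmax * fs))
    # Lagged design matrix: one column per lag of the envelope.
    X = np.zeros((len(envelope), len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:len(envelope) - lag]
        else:
            X[:lag, j] = envelope[-lag:]
    # Ridge solution: w = (X'X + alpha*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ meg)
    return lags / fs, w

fs = 100                                   # hypothetical sampling rate (Hz)
rng = np.random.default_rng(0)
env = rng.random(60 * fs)                  # 60 s stand-in envelope
meg = np.convolve(env, np.hanning(20), mode="same") + rng.normal(size=env.size)
times, trf = estimate_trf(env, meg, fs)
```

In this framing, the attention-modulated negative-going activity the study reports would appear as a deflection in `trf` at the corresponding lags.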
Affiliation(s)
- Meng Liang
- Department of Neurology, University of Heidelberg, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany
- Johannes Gerwien
- Institute of German as a Foreign Language Philology, University of Heidelberg, Plöck 55, 69117 Heidelberg, Germany
- Alexander Gutschalk
- Department of Neurology, University of Heidelberg, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany
2
Iotzov I, Parra LC. Effects of Noise and Reward on Pupil Size and Electroencephalographic Speech Tracking in a Word-Detection Task. Eur J Neurosci 2025; 61:e70009. PMID: 39939282; DOI: 10.1111/ejn.70009.
Abstract
Speech is hard to understand when there is background noise. Speech intelligibility and listening effort both affect our ability to understand speech, but the relative contribution of these factors is hard to disentangle. Previous studies suggest that speech intelligibility can be assessed with EEG speech tracking, and listening effort via pupil size. However, these measures may be confounded, because poor intelligibility may require greater effort. To address this, we developed a novel word-detection paradigm that allows for a rapid behavioural assessment of speech processing. In this paradigm, words appear on the screen during continuous speech, similar to closed captioning. In two listening experiments with a total of 51 participants, we manipulated intelligibility by changing signal-to-noise ratios (SNRs) and modulated effort by varying monetary reward. Increasing SNR improved detection performance along with EEG speech tracking. Additionally, pupil size increased with increasing SNR. Surprisingly, when we modulated both reward and SNR, reward modulated only pupil size, whereas SNR modulated only EEG speech tracking. We interpret this as the effects of arousal and listening effort on pupil size, and of intelligibility on EEG speech tracking. The experimental paradigm developed here may be beneficial when assessing hearing devices in terms of speech intelligibility and listening effort.
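As a toy illustration of the study's two dependent measures, the snippet below computes a simplified speech-tracking value (a zero-lag stimulus-response correlation; real analyses use lagged TRF-style models) and a baseline-corrected pupil dilation. All rates and data are hypothetical stand-ins.

```python
# Toy versions of the two physiological measures described above.
import numpy as np

def speech_tracking(envelope, eeg):
    """Pearson correlation between the speech envelope and one EEG channel."""
    return np.corrcoef(envelope, eeg)[0, 1]

def pupil_dilation(pupil, fs, baseline_s=1.0):
    """Mean pupil size after trial onset, relative to a pre-trial baseline."""
    n_base = int(baseline_s * fs)
    return pupil[n_base:].mean() - pupil[:n_base].mean()

rng = np.random.default_rng(0)
fs_eeg, fs_eye = 250, 60                   # hypothetical sampling rates
print(speech_tracking(rng.random(60 * fs_eeg), rng.random(60 * fs_eeg)))
print(pupil_dilation(rng.random(10 * fs_eye), fs_eye))
```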
Affiliation(s)
- Ivan Iotzov
- Department of Biomedical Engineering, City College of New York, New York, New York, USA
- Lucas C Parra
- Department of Biomedical Engineering, City College of New York, New York, New York, USA
3
Karunathilake ID, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. bioRxiv 2024:2024.02.02.578603. PMID: 38352332; PMCID: PMC10862830; DOI: 10.1101/2024.02.02.578603.
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are enhanced by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression toward higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic features is bilateral or right lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers: late, linguistic-level responses derived from TRF components modulated by linguistic content, suggesting that these markers index speech comprehension rather than mere speech perception.
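The feature hierarchy the abstract describes is typically operationalized as a bank of TRF predictors at different levels: a continuous acoustic envelope plus impulse regressors at word onsets, optionally scaled by linguistic values such as surprisal. A minimal sketch, with hypothetical word onsets and random stand-in values:

```python
# Build a multi-level predictor set for a TRF analysis. Each column family
# (acoustic, word onset, surprisal) yields its own TRF when fit jointly.
import numpy as np

def impulse_regressor(onsets_s, values, fs, n_samples):
    """Place one scaled impulse per word onset on the M/EEG time axis."""
    x = np.zeros(n_samples)
    for t, v in zip(onsets_s, values):
        x[int(t * fs)] = v
    return x

fs, n = 100, 60 * 100
envelope = np.random.default_rng(1).random(n)        # acoustic level
onsets = np.arange(0.5, 59.5, 0.4)                   # hypothetical word onsets
word_onset = impulse_regressor(onsets, np.ones_like(onsets), fs, n)
surprisal = impulse_regressor(
    onsets, np.random.default_rng(2).random(onsets.size), fs, n)
X = np.column_stack([envelope, word_onset, surprisal])  # joint design matrix
```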
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
4
Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. PMID: 38303522; DOI: 10.1111/ejn.16265.
Abstract
Linear models are becoming increasingly popular for investigating brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to predict brain activity, or 'decoding', when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated for speech stimuli but rarely for music. Recent attempts at neural tracking of music show that reconstruction performance is reduced compared with speech decoding. The present study investigates the performance of stimulus reconstruction and electroencephalogram prediction (decoding and encoding models) based on cortical entrainment to the temporal variations of the audio stimuli, for both music and speech listening. Three hypotheses that may explain differences between speech and music reconstruction were tested, to assess the importance of speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing for speech and music listening, no differences were found in the reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite differences in the underlying cortical mechanisms.
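A compact sketch of the encoding/decoding distinction tested here, using ridge regression on random stand-in data (the lag structure and preprocessing of a real pipeline are omitted for brevity):

```python
# Forward ('encoding') and backward ('decoding') linear models.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
envelope = rng.random((6000, 1))           # audio envelope, one feature
eeg = envelope @ rng.random((1, 32)) + rng.normal(size=(6000, 32))  # 32 ch.

env_tr, env_te, eeg_tr, eeg_te = train_test_split(envelope, eeg, shuffle=False)

# Encoding: predict brain activity from the stimulus feature.
enc = Ridge(alpha=1.0).fit(env_tr, eeg_tr)
pred_eeg = enc.predict(env_te)

# Decoding: reconstruct the stimulus from all channels at once.
dec = Ridge(alpha=1.0).fit(eeg_tr, env_tr)
pred_env = dec.predict(eeg_te)

r = np.corrcoef(pred_env.ravel(), env_te.ravel())[0, 1]  # reconstruction accuracy
```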
Affiliation(s)
- Adèle Simon
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
- Søren Bech
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Research Department, Bang & Olufsen A/S, Struer, Denmark
- Gérard Loquet
- Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Jan Østergaard
- Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
5
Panela RA, Copelli F, Herrmann B. Reliability and generalizability of neural speech tracking in younger and older adults. Neurobiol Aging 2024; 134:165-180. PMID: 38103477; DOI: 10.1016/j.neurobiolaging.2023.11.007.
Abstract
Neural tracking of spoken speech is considered a potential clinical biomarker for speech-processing difficulties, but the reliability of neural speech tracking is unclear. Here, younger and older adults listened to stories in two sessions while electroencephalography was recorded, to investigate the reliability and generalizability of neural speech tracking. Speech-tracking amplitude was larger for older than for younger adults, consistent with an age-related loss of inhibition. The reliability of neural speech tracking was moderate (ICC ∼0.5-0.75) and tended to be higher for older adults. However, reliability was lower for speech tracking than for neural responses to noise bursts (ICC >0.8), which we used as a benchmark for maximum reliability. Neural speech tracking generalized moderately across different stories (ICC ∼0.5-0.6); generalization appeared greatest for audiobook-like stories spoken by the same person. Hence, a variety of stories could possibly be used for clinical assessments. Overall, the current data are important for developing a biomarker of speech processing but suggest that further work is needed to raise reliability to clinical standards.
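Reliability estimates of this kind are intraclass correlations over a subjects-by-sessions matrix of tracking measures. A minimal sketch, assuming ICC(2,1) as the agreement measure (the paper's exact ICC variant is not restated in this summary) and random stand-in data:

```python
# ICC(2,1), the two-way random-effects, single-measure agreement ICC
# (Shrout & Fleiss), for a (n_subjects, k_sessions) score matrix.
import numpy as np

def icc_2_1(scores):
    n, k = scores.shape
    grand = scores.mean()
    ms_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
                    - scores.mean(axis=0, keepdims=True) + grand)
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                 + k * (ms_cols - ms_err) / n)

tracking = np.random.default_rng(3).normal(size=(30, 2))  # 30 subjects, 2 sessions
print(icc_2_1(tracking))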
Affiliation(s)
- Ryan A Panela
- Rotman Research Institute, Baycrest Academy for Research and Education, North York, ON M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON M5S 1A1, Canada
- Francesca Copelli
- Rotman Research Institute, Baycrest Academy for Research and Education, North York, ON M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON M5S 1A1, Canada
- Björn Herrmann
- Rotman Research Institute, Baycrest Academy for Research and Education, North York, ON M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON M5S 1A1, Canada
6
Gillis M, Vanthornhout J, Francart T. Heard or Understood? Neural Tracking of Language Features in a Comprehensible Story, an Incomprehensible Story and a Word List. eNeuro 2023; 10:ENEURO.0075-23.2023. PMID: 37451862; DOI: 10.1523/eneuro.0075-23.2023.
Abstract
Speech comprehension is a complex neural process that relies on the activation and integration of multiple brain regions. In the current study, we evaluated whether speech comprehension can be investigated by neural tracking, the phenomenon in which brain responses time-lock to the rhythm of specific features in continuous speech. These features can be acoustic, i.e., acoustic tracking, or derived from the content of the speech using language properties, i.e., language tracking. We evaluated whether neural tracking of speech differs between a comprehensible story, an incomprehensible story, and a word list, using the neural responses to speech of 19 participants (six men). No significant difference in acoustic tracking was found. However, significant language tracking was found only for the comprehensible story. The most prominent effect was found for word surprisal, a language feature at the word level. The neural response to word surprisal showed a prominent negativity between 300 and 400 ms, similar to the N400 in evoked-response paradigms. This N400 was significantly more negative when the story was comprehended, i.e., when words could be integrated into the context of previous words. These results show that language tracking can capture the effect of speech comprehension.
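Word surprisal, the key predictor here, is the negative log probability of a word given its context. The toy sketch below computes it from an add-alpha-smoothed bigram model over a stand-in corpus; the study's actual language model is not specified in this summary.

```python
# Toy word-surprisal computation: -log2 P(word | previous word).
import math
from collections import Counter

corpus = "the cat sat on the mat and the cat slept".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(prev, word, vocab_size, alpha=1.0):
    """Bigram surprisal in bits, with add-alpha smoothing."""
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return -math.log2(p)

story = "the cat sat on the mat".split()
values = [surprisal(p, w, len(unigrams)) for p, w in zip(story, story[1:])]
```

These per-word values would then be placed as scaled impulses at word onsets, as in the predictor sketch under entry 3, to estimate the N400-like TRF component.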
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
7
Xiu B, Paul BT, Chen JM, Le TN, Lin VY, Dimitrijevic A. Neural responses to naturalistic audiovisual speech are related to listening demand in cochlear implant users. Front Hum Neurosci 2022; 16:1043499. DOI: 10.3389/fnhum.2022.1043499.
Abstract
There is a weak relationship between clinical and self-reported speech perception outcomes in cochlear implant (CI) listeners. Such poor correspondence may be due to differences between clinical and “real-world” listening environments and stimuli. Speech in the real world is often accompanied by visual cues and background environmental noise, and generally occurs in a conversational context, all factors that could affect listening demand. Thus, our objectives were to determine whether brain responses to naturalistic speech could index speech perception and listening demand in CI users. Accordingly, we recorded high-density electroencephalography (EEG) while CI users listened to/watched a naturalistic stimulus (the television show “The Office”). We used continuous EEG to quantify “speech neural tracking” (i.e., temporal response functions, TRFs) to the show’s soundtrack and 8–12 Hz (alpha) brain rhythms commonly related to listening effort. Background noise was presented at three different signal-to-noise ratios (SNRs), +5, +10, and +15 dB, to vary the difficulty of following the television show, mimicking a natural noisy environment. The task also included an audio-only (no video) condition. After each condition, participants subjectively rated listening demand and the degree of words and conversations they felt they understood. Fifteen CI users reported progressively higher listening demand, and lower word and conversation understanding, with increasing background noise. Listening demand and conversation understanding in the audio-only condition were comparable to those in the highest-noise condition (+5 dB). Increasing background noise affected speech neural tracking at a group level, in addition to eliciting strong individual differences. Mixed-effects modeling showed that listening demand and conversation understanding were correlated with early cortical speech tracking, such that high demand and low conversation understanding occurred with lower-amplitude TRFs. In the high-noise condition, greater listening demand was negatively correlated with parietal alpha power, where higher demand was related to lower alpha power. No significant correlations were observed between TRF/alpha measures and clinical speech perception scores. These results are similar to previous findings showing little relationship between clinical speech perception and quality of life in CI users. However, physiological responses to complex natural speech may provide an objective measure of aspects of quality-of-life measures such as self-perceived listening demand.
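The 8-12 Hz alpha-power measure used alongside the TRFs is straightforward to compute; a minimal sketch using Welch's method on a random stand-in channel (the sampling rate and signal are assumptions):

```python
# Alpha-band (8-12 Hz) power from one parietal EEG channel via Welch's method.
import numpy as np
from scipy.signal import welch

fs = 250
eeg = np.random.default_rng(4).normal(size=60 * fs)   # 60 s stand-in channel
freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)        # 0.5 Hz resolution
alpha_power = psd[(freqs >= 8) & (freqs <= 12)].mean()
```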
8
Caucheteux C, Gramfort A, King JR. Deep language algorithms predict semantic comprehension from brain activity. Sci Rep 2022; 12:16327. PMID: 36175483; PMCID: PMC9522791; DOI: 10.1038/s41598-022-20460-9.
Abstract
Deep language algorithms, like GPT-2, have demonstrated remarkable abilities to process text, and now constitute the backbone of automatic translation, summarization and dialogue. However, whether these models encode information that relates to human comprehension remains controversial. Here, we show that the representations of GPT-2 not only map onto brain responses to spoken stories, but also predict the extent to which subjects understand the corresponding narratives. To this end, we analyze 101 subjects recorded with functional magnetic resonance imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2's activations. Finally, we show that this mapping reliably correlates ([Formula: see text]) with subjects' comprehension scores as assessed for each story. This effect peaks in the angular, medial temporal and supramarginal gyri, and is best accounted for by the long-distance dependencies generated in the deep layers of GPT-2. Overall, this study shows how deep language models help clarify the brain computations underlying language comprehension.
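Schematically, the mapping analysis regresses voxel responses onto network activations and scores each voxel by out-of-sample correlation. The sketch below uses cross-validated ridge regression on random stand-in data; dimensions and hyperparameters are illustrative assumptions, not the study's.

```python
# Linear mapping from deep-network activations to fMRI voxel responses,
# scored by cross-validated prediction-observation correlation per voxel.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
activations = rng.normal(size=(400, 64))   # TRs x network features (stand-in)
bold = activations @ rng.normal(size=(64, 200)) + rng.normal(size=(400, 200))

pred = cross_val_predict(RidgeCV(alphas=np.logspace(-2, 4, 7)),
                         activations, bold, cv=5)
# One mapping score per voxel; in the study, these scores are what correlate
# with per-story comprehension across subjects.
scores = [np.corrcoef(pred[:, v], bold[:, v])[0, 1] for v in range(bold.shape[1])]
```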
Affiliation(s)
- Charlotte Caucheteux
- Meta AI Research, Paris, France
- Université Paris-Saclay, Inria, CEA, Palaiseau, France
- Jean-Rémi King
- Meta AI Research, Paris, France
- École normale supérieure, PSL University, CNRS, Paris, France