1
Wilroth J, Alickovic E, Skoglund MA, Signoret C, Rönnberg J, Enqvist M. Improving Tracking of Selective Attention in Hearing Aid Users: The Role of Noise Reduction and Nonlinearity Compensation. eNeuro 2025; 12:ENEURO.0275-24.2025. PMID: 39880674; PMCID: PMC11839092; DOI: 10.1523/eneuro.0275-24.2025.
Abstract
Hearing impairment (HI) disrupts social interaction by hindering the ability to follow conversations in noisy environments. While hearing aids (HAs) with noise reduction (NR) partially address this, the "cocktail-party problem" persists, where individuals struggle to attend to specific voices amidst background noise. This study investigated how NR and an advanced signal processing method for compensating for nonlinearities in electroencephalography (EEG) signals can improve neural speech processing in HI listeners. Participants wore HAs with NR either activated or deactivated while focusing on target speech amidst competing masker speech and background noise. Analysis focused on temporal response functions (TRFs) to assess neural tracking of the relevant target and masker speech. Results revealed enhanced neural responses (N1 and P2) to target speech, particularly in frontal and central scalp regions, when NR was activated. Additionally, a novel method that compensates for nonlinearities in the EEG data improved the signal-to-noise ratio (SNR), potentially revealing more precise neural tracking of relevant speech; this effect was most prominent in the left-frontal scalp region. Importantly, NR activation significantly improved the effectiveness of this method, leading to stronger responses and reduced variance in the EEG data. This study provides valuable insights into the neural mechanisms underlying NR benefits and introduces a promising EEG analysis approach sensitive to NR effects, paving the way for potential improvements in HAs.
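The N1/P2 comparison described above reduces to locating signed peaks of an estimated TRF within canonical latency windows. A minimal sketch of that step (numpy only; the simulated TRF and the latency windows are illustrative assumptions, not the authors' pipeline):

```python
import numpy as np

def trf_peak(trf, times, t_min, t_max, polarity):
    """Return (latency, amplitude) of the most extreme sample of the
    given polarity inside a latency window of a single-channel TRF."""
    mask = (times >= t_min) & (times <= t_max)
    segment = trf[mask]
    idx = np.argmax(polarity * segment)  # largest positive or negative deflection
    return times[mask][idx], segment[idx]

# Illustrative TRF: lags spanning 0-500 ms for one frontal channel.
times = np.linspace(0.0, 0.5, 64)
rng = np.random.default_rng(0)
trf = (-1.5 * np.exp(-((times - 0.1) / 0.03) ** 2)   # N1-like trough near 100 ms
       + 1.0 * np.exp(-((times - 0.2) / 0.05) ** 2)  # P2-like peak near 200 ms
       + 0.05 * rng.standard_normal(times.size))

n1_lat, n1_amp = trf_peak(trf, times, 0.07, 0.15, polarity=-1)  # N1: negative
p2_lat, p2_amp = trf_peak(trf, times, 0.15, 0.30, polarity=+1)  # P2: positive
print(f"N1: {n1_amp:.2f} at {n1_lat*1e3:.0f} ms, P2: {p2_amp:.2f} at {p2_lat*1e3:.0f} ms")
```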
Affiliation(s)
- Johanna Wilroth
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping 581 83, Sweden
- Emina Alickovic
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping 581 83, Sweden
- Eriksholm Research Centre, Snekkersten DK-3070, Denmark
- Martin A Skoglund
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping 581 83, Sweden
- Eriksholm Research Centre, Snekkersten DK-3070, Denmark
- Carine Signoret
- Disability Research Division, Linnaeus Centre HEAD, Department of Behavioural Sciences and Learning, Linköping University, Linköping 581 83, Sweden
- Jerker Rönnberg
- Disability Research Division, Linnaeus Centre HEAD, Department of Behavioural Sciences and Learning, Linköping University, Linköping 581 83, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping 581 83, Sweden
2
Kulasingham JP, Innes-Brown H, Enqvist M, Alickovic E. Level-Dependent Subcortical Electroencephalography Responses to Continuous Speech. eNeuro 2024; 11:ENEURO.0135-24.2024. PMID: 39142822; DOI: 10.1523/eneuro.0135-24.2024.
Abstract
The auditory brainstem response (ABR) is a measure of subcortical activity in response to auditory stimuli. The wave V peak of the ABR depends on the stimulus intensity level and has been widely used for clinical hearing assessment. Conventional methods estimate the ABR by averaging electroencephalography (EEG) responses to short, unnatural stimuli such as clicks. Recent work has moved toward more ecologically relevant continuous speech stimuli, using linear deconvolution models called temporal response functions (TRFs). Investigating whether the TRF waveform changes with stimulus intensity is a crucial step toward the use of natural speech stimuli in hearing assessments involving subcortical responses. Here, we develop methods to estimate level-dependent subcortical TRFs using EEG data collected from 21 participants listening to continuous speech presented at four different intensity levels. We find that level-dependent changes can be detected in the wave V peak of the subcortical TRF for almost all participants, and that these changes are consistent with level-dependent changes in click-ABR wave V. We also investigate the most suitable peripheral auditory model for generating predictors for level-dependent subcortical TRFs and find that simple gammatone filterbanks perform best. Additionally, around 6 min of data may be sufficient for detecting level-dependent effects and wave V peaks above the noise floor for speech segments with higher intensity. Finally, we show a proof of concept that level-dependent subcortical TRFs can be detected even from the inherent intensity fluctuations of natural continuous speech.
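A gammatone-filterbank predictor of the kind this abstract favours can be sketched in a few lines: filter the stimulus with unit-energy gammatone impulse responses, half-wave rectify each band, and average across bands. The band count, spacing, and normalisation below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy.signal import fftconvolve

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of a 4th-order gammatone filter at centre frequency fc."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000 + 1)           # Glasberg & Moore ERB
    b = 1.019 * erb                               # gammatone bandwidth
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))          # unit-energy normalisation

def gammatone_predictor(audio, fs, fcs):
    """Half-wave-rectified gammatone filterbank output, averaged across bands --
    one simple 'peripheral' predictor for subcortical TRF estimation."""
    bands = [np.maximum(fftconvolve(audio, gammatone_ir(fc, fs), mode="same"), 0.0)
             for fc in fcs]
    return np.mean(bands, axis=0)

fs = 16000
audio = np.random.default_rng(1).standard_normal(fs)  # stand-in for 1 s of speech
fcs = np.geomspace(250, 8000, 31)                     # log-spaced centre frequencies
predictor = gammatone_predictor(audio, fs, fcs)
print(predictor.shape)  # (16000,) -- same length as the stimulus
```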
Affiliation(s)
- Joshua P Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden
- Emina Alickovic
- Automatic Control, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark
3
Gehmacher Q, Schubert J, Schmidt F, Hartmann T, Reisinger P, Rösch S, Schwarz K, Popov T, Chait M, Weisz N. Eye movements track prioritized auditory features in selective attention to natural speech. Nat Commun 2024; 15:3692. PMID: 38693186; PMCID: PMC11063150; DOI: 10.1038/s41467-024-48126-2.
Abstract
Over the last decades, cognitive neuroscience has identified a distributed set of brain regions that are critical for attention. Strong anatomical overlap with brain regions critical for oculomotor processes suggests a joint network for attention and eye movements. However, the role of this shared network in complex, naturalistic environments remains understudied. Here, we investigated eye movements in relation to (un)attended sentences of natural speech. Combining simultaneously recorded eye tracking and magnetoencephalographic data with temporal response functions, we show that gaze tracks attended speech, a phenomenon we term ocular speech tracking. Ocular speech tracking even differentiates a target from a distractor in a multi-speaker context and is further related to intelligibility. Moreover, we provide evidence for its contribution to neural differences in speech processing, emphasizing the need to consider oculomotor activity in future research and in the interpretation of neural effects in auditory cognition.
Affiliation(s)
- Quirin Gehmacher
- Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Juliane Schubert
- Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Fabian Schmidt
- Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Thomas Hartmann
- Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Patrick Reisinger
- Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Sebastian Rösch
- Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University Salzburg, 5020, Salzburg, Austria
- Tzvetan Popov
- Methods of Plasticity Research, Department of Psychology, University of Zurich, CH-8050, Zurich, Switzerland
- Department of Psychology, University of Konstanz, DE-78464 Konstanz, Germany
- Maria Chait
- Ear Institute, University College London, London, UK
- Nathan Weisz
- Paris-Lodron-University of Salzburg, Department of Psychology, Centre for Cognitive Neuroscience, Salzburg, Austria
- Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
4
Kulasingham JP, Bachmann FL, Eskelund K, Enqvist M, Innes-Brown H, Alickovic E. Predictors for estimating subcortical EEG responses to continuous speech. PLoS One 2024; 19:e0297826. PMID: 38330068; PMCID: PMC10852227; DOI: 10.1371/journal.pone.0297826.
Abstract
Perception of sounds and speech involves structures in the auditory brainstem that rapidly process ongoing auditory stimuli. The role of these structures in speech processing can be investigated by measuring their electrical activity with scalp-mounted electrodes. However, typical analysis methods involve averaging neural responses to many short repetitive stimuli that bear little relevance to daily listening environments. Recently, subcortical responses to more ecologically relevant continuous speech were detected using linear encoding models. These methods estimate the temporal response function (TRF), a regression model that minimises the error between the measured neural signal and a prediction generated from a stimulus-derived predictor. Using predictors that model the highly non-linear peripheral auditory system may improve linear TRF estimation accuracy and peak detection. Here, we compare predictors from both simple and complex peripheral auditory models for estimating brainstem TRFs on electroencephalography (EEG) data from 24 participants listening to continuous speech. We also investigate the data length required for estimating subcortical TRFs, and find that around 12 minutes of data is sufficient for clear wave V peaks (>3 dB SNR) to be seen in nearly all participants. Interestingly, predictors derived from simple filterbank-based models of the peripheral auditory system yield TRF wave V peak SNRs that are not significantly different from those estimated using a complex model of the auditory nerve, provided that the nonlinear effects of adaptation in the auditory system are appropriately modelled. Crucially, computing predictors with these simpler models is more than 50 times faster than with the complex model. This work paves the way for efficient modelling and detection of subcortical processing of continuous speech, which may lead to improved diagnostic metrics for hearing impairment and assistive hearing technology.
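One common way to quantify a ">3 dB SNR" criterion for wave V is to compare peak power inside the expected wave V latency window with mean power in a window where no response is expected. A hedged sketch of that computation (the window choices and simulated TRF are illustrative, not the paper's):

```python
import numpy as np

def peak_snr_db(trf, times, signal_win=(0.005, 0.009), noise_win=(0.020, 0.040)):
    """SNR (dB) of a subcortical TRF peak: peak power inside the expected
    wave V latency window relative to mean power in a later noise window."""
    sig = trf[(times >= signal_win[0]) & (times <= signal_win[1])]
    noise = trf[(times >= noise_win[0]) & (times <= noise_win[1])]
    return 10 * np.log10(np.max(sig ** 2) / np.mean(noise ** 2))

times = np.arange(0, 0.05, 1 / 4096)                   # 0-50 ms at 4096 Hz
rng = np.random.default_rng(2)
trf = 0.1 * rng.standard_normal(times.size)
trf += 1.2 * np.exp(-((times - 0.007) / 0.0007) ** 2)  # wave-V-like peak near 7 ms

snr = peak_snr_db(trf, times)
print(f"wave V SNR: {snr:.1f} dB", "-> detectable" if snr > 3 else "-> below criterion")
```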
Affiliation(s)
- Joshua P. Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Eriksholm Research Centre, Snekkersten, Denmark
5
Bachmann FL, Kulasingham JP, Eskelund K, Enqvist M, Alickovic E, Innes-Brown H. Extending Subcortical EEG Responses to Continuous Speech to the Sound-Field. Trends Hear 2024; 28:23312165241246596. PMID: 38738341; PMCID: PMC11092544; DOI: 10.1177/23312165241246596.
Abstract
The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment; it is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, brainstem responses to continuous speech presented via earphones have recently been detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal-hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrate that subcortical responses to continuous speech can be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16 minutes of data was sufficient for the TRFs of all participants to show clear wave V peaks for both earphone and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent between the earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16 minutes) to achieve clear wave V peaks than earphone TRFs (12 minutes), possibly due to the effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
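The simpler of the two peripheral accounts mentioned above, stimulus rectification, amounts to half-wave rectifying the raw audio and resampling it to the EEG rate before TRF estimation. A minimal sketch under those assumptions (the sampling rates below are illustrative, not the study's):

```python
import numpy as np
from scipy.signal import resample_poly

def rectified_predictor(audio, fs_audio, fs_eeg):
    """Simplest peripheral 'model' used for subcortical TRFs: half-wave
    rectification of the raw stimulus, resampled to the EEG rate."""
    rectified = np.maximum(audio, 0.0)
    return resample_poly(rectified, up=fs_eeg, down=fs_audio)

fs_audio, fs_eeg = 44100, 4410   # illustrative rates with an integer ratio
audio = np.random.default_rng(3).standard_normal(fs_audio)  # 1 s stand-in stimulus
predictor = rectified_predictor(audio, fs_audio, fs_eeg)
print(predictor.shape)  # (4410,)
```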
Affiliation(s)
- Joshua P. Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
6
Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, Resnik P, Simon JZ. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 2023; 12:e85012. PMID: 38018501; PMCID: PMC10783870; DOI: 10.7554/elife.85012.
Abstract
Even though human experience unfolds continuously in time, it is not strictly linear; instead, it entails cascading processes building hierarchical cognitive structures. For instance, during speech perception, humans transform a continuously varying acoustic signal into phonemes, words, and meaning, and these levels all have distinct but interdependent temporal structures. Time-lagged regression using temporal response functions (TRFs) has recently emerged as a promising tool for disentangling electrophysiological brain responses related to such complex models of perception. Here, we introduce the Eelbrain Python toolkit, which makes this kind of analysis easy and accessible. We demonstrate its use, using continuous speech as a sample paradigm, with a freely available EEG dataset of audiobook listening. A companion GitHub repository provides the complete source code for the analysis, from raw data to group-level statistics. More generally, we advocate a hypothesis-driven approach in which the experimenter specifies a hierarchy of time-continuous representations that are hypothesized to have contributed to brain responses, and uses those as predictor variables for the electrophysiological signal. This is analogous to a multiple regression problem, but with the addition of a time dimension. TRF analysis decomposes the brain signal into distinct responses associated with the different predictor variables by estimating a multivariate TRF (mTRF), quantifying the influence of each predictor on brain responses as a function of time(-lags). This allows asking two questions about the predictor variables: (1) Is there a significant neural representation corresponding to this predictor variable? And if so, (2) what are the temporal characteristics of the neural response associated with it? Thus, different predictor variables can be systematically combined and evaluated to jointly model neural processing at multiple hierarchical levels. We discuss applications of this approach, including the potential for linking algorithmic/representational theories at different cognitive levels to brain responses through computational models with appropriate linking hypotheses.
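The time-lagged regression this abstract describes can be made concrete with plain numpy: build a design matrix of shifted copies of the predictor and solve a regularised least-squares problem. This is a generic ridge sketch of the idea, not the Eelbrain API itself (Eelbrain estimates TRFs with a boosting algorithm):

```python
import numpy as np

def lagged_design(x, lags):
    """Stack time-shifted copies of predictor x: one column per lag."""
    X = np.zeros((x.size, lags.size))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = x[: x.size - lag]
        else:
            X[:lag, j] = x[-lag:]
    return X

def ridge_trf(x, y, fs, tmin, tmax, alpha=1.0):
    """Estimate a TRF by regularised time-lagged regression of EEG y
    on predictor x, for lags tmin..tmax (seconds)."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lagged_design(x, lags)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(lags.size), X.T @ y)
    return lags / fs, w

# Illustrative data: EEG = delayed, scaled envelope + noise.
fs = 128
rng = np.random.default_rng(4)
env = np.abs(rng.standard_normal(60 * fs))        # stand-in speech envelope
delay = int(0.1 * fs)                             # ~100 ms simulated response
eeg = 0.8 * np.roll(env, delay) + rng.standard_normal(env.size)

times, trf = ridge_trf(env, eeg, fs, tmin=0.0, tmax=0.4, alpha=1e3)
print(f"peak lag: {times[np.argmax(trf)]*1e3:.0f} ms")  # close to the simulated delay
```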
Affiliation(s)
- Proloy Das
- Stanford University, Stanford, United States
- Philip Resnik
- University of Maryland, College Park, United States
7
Mesik J, Wojtczak M. The effects of data quantity on performance of temporal response function analyses of natural speech processing. Front Neurosci 2023; 16:963629. PMID: 36711133; PMCID: PMC9878558; DOI: 10.3389/fnins.2022.963629.
Abstract
In recent years, temporal response function (TRF) analyses of neural activity recordings evoked by continuous naturalistic stimuli have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit to either individual data (subject-specific analyses) or group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially depending on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can exhibit high performance on small amounts of test data (2-8 min) if they are trained on a sufficiently large data set. As such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, the demonstrations in this work should aid new users of TRF analyses and, in combination with other tools such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.
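"Prediction accuracy" in this literature usually means the Pearson correlation between TRF-predicted and measured EEG on held-out data. The sketch below illustrates the monotonic data-quantity effect on simulated data; the dimensions, noise level, and regularisation strength are arbitrary assumptions, not the paper's settings:

```python
import numpy as np

def prediction_accuracy(X, y, w):
    """Pearson r between TRF-predicted and measured EEG -- the usual
    'prediction accuracy' score for TRF models."""
    return np.corrcoef(X @ w, y)[0, 1]

def fit_ridge(X, y, alpha):
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Illustrative: accuracy on a fixed test set as training quantity grows.
rng = np.random.default_rng(5)
n, k = 20000, 40                                  # samples, lagged features
X = rng.standard_normal((n, k))
w_true = rng.standard_normal(k)
y = X @ w_true + 5.0 * rng.standard_normal(n)     # weak SNR, as in real EEG
X_test, y_test = X[-4000:], y[-4000:]             # held-out evaluation data

for n_train in (500, 2000, 8000, 16000):
    w = fit_ridge(X[:n_train], y[:n_train], alpha=10.0)
    print(n_train, f"r = {prediction_accuracy(X_test, y_test, w):.3f}")
```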
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
8
Brown JA, Bidelman GM. Familiarity of Background Music Modulates the Cortical Tracking of Target Speech at the "Cocktail Party". Brain Sci 2022; 12:1320. PMID: 36291252; PMCID: PMC9599198; DOI: 10.3390/brainsci12101320.
Abstract
The "cocktail party" problem-how a listener perceives speech in noisy environments-is typically studied using speech (multi-talker babble) or noise maskers. However, realistic cocktail party scenarios often include background music (e.g., coffee shops, concerts). Studies investigating music's effects on concurrent speech perception have predominantly used highly controlled synthetic music or shaped noise, which do not reflect naturalistic listening environments. Behaviorally, familiar background music and songs with vocals/lyrics inhibit concurrent speech recognition. Here, we investigated the neural bases of these effects. While recording multichannel EEG, participants listened to an audiobook while popular songs (or silence) played in the background at a 0 dB signal-to-noise ratio. Songs were either familiar or unfamiliar to listeners and featured either vocals or isolated instrumentals from the original audio recordings. Comprehension questions probed task engagement. We used temporal response functions (TRFs) to isolate cortical tracking to the target speech envelope and analyzed neural responses around 100 ms (i.e., auditory N1 wave). We found that speech comprehension was, expectedly, impaired during background music compared to silence. Target speech tracking was further hindered by the presence of vocals. When masked by familiar music, response latencies to speech were less susceptible to informational masking, suggesting concurrent neural tracking of speech was easier during music known to the listener. These differential effects of music familiarity were further exacerbated in listeners with less musical ability. Our neuroimaging results and their dependence on listening skills are consistent with early attentional-gain mechanisms where familiar music is easier to tune out (listeners already know the song's expectancies) and thus can allocate fewer attentional resources to the background music to better monitor concurrent speech material.
Affiliation(s)
- Jane A. Brown
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN 38152, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
- Gavin M. Bidelman
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN 38152, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47408, USA
- Program in Neuroscience, Indiana University, Bloomington, IN 47405, USA
9
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. PMID: 36137861; DOI: 10.1016/j.heares.2022.108607.
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking, and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), of the speech envelope, and of linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0 tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary, but not sufficient, for speech intelligibility. Linguistic feature tracking (e.g., word or phoneme surprisal) relates to neural processes more directly linked to speech intelligibility. Together, these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
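For intuition about the f0 feature, a frame-wise autocorrelation pitch track is the simplest stand-in: pick, per frame, the lag with maximal autocorrelation inside the plausible f0 range. This sketch is purely illustrative and is not the specific f0 representation used in the studies reviewed here:

```python
import numpy as np

def f0_track(audio, fs, fmin=75.0, fmax=300.0, frame=0.04, hop=0.01):
    """Frame-wise autocorrelation pitch track -- a simple stand-in for
    the f0 feature used in f0-tracking analyses."""
    n, h = int(frame * fs), int(hop * fs)
    lo, hi = int(fs / fmax), int(fs / fmin)           # lag range for fmax..fmin
    f0s = []
    for start in range(0, audio.size - n, h):
        x = audio[start:start + n] - audio[start:start + n].mean()
        ac = np.correlate(x, x, mode="full")[n - 1:]  # autocorrelation, lags >= 0
        lag = lo + np.argmax(ac[lo:hi])               # best period in range
        f0s.append(fs / lag)
    return np.array(f0s)

fs = 16000
t = np.arange(fs) / fs
audio = np.sign(np.sin(2 * np.pi * 150 * t))          # 150 Hz pulse-train stand-in
print(f0_track(audio, fs)[:5])                        # ~150 Hz in every frame
```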
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium