1. EskandariNasab M, Raeisi Z, Lashaki RA, Najafi H. A GRU-CNN model for auditory attention detection using microstate and recurrence quantification analysis. Sci Rep 2024; 14:8861. PMID: 38632246; PMCID: PMC11024110; DOI: 10.1038/s41598-024-58886-y.
Abstract
Attention, as a cognitive ability, plays a crucial role in perception, helping humans concentrate on specific objects in the environment while discarding others. In this paper, auditory attention detection (AAD) is investigated using different dynamic features extracted from multichannel electroencephalography (EEG) signals while listeners attend to a target speaker in the presence of a competing talker. To this aim, microstate and recurrence quantification analysis are utilized to extract features that reflect changes in brain state during cognitive tasks. An optimized feature set is then determined through significant-feature selection based on classification performance. The classifier model is developed by hybrid sequential learning that combines Gated Recurrent Units (GRU) and a Convolutional Neural Network (CNN) into a unified framework for accurate attention detection. The proposed AAD method shows that the selected feature set yields the most discriminative features for the classification process, and it achieves the best performance compared with state-of-the-art AAD approaches from the literature across various measures. The current study is the first to validate the use of microstate and recurrence quantification parameters to differentiate auditory attention using reinforcement learning without access to stimuli.
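The recurrence-quantification side of this approach can be illustrated with a minimal numpy sketch. This is not the authors' pipeline: the distance threshold `eps`, the toy signal, and the single recurrence-rate feature are illustrative assumptions.

```python
import numpy as np

def recurrence_matrix(x, eps):
    """Binary recurrence plot of a 1-D signal: R[i, j] = 1 when
    samples i and j are closer than the threshold eps."""
    d = np.abs(x[:, None] - x[None, :])   # pairwise distances
    return (d < eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent points, excluding the trivial diagonal."""
    n = R.shape[0]
    off_diag = R.sum() - n                # the diagonal is always recurrent
    return off_diag / (n * n - n)

# toy oscillatory signal standing in for one EEG channel
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.standard_normal(200)
R = recurrence_matrix(x, eps=0.2)
rr = recurrence_rate(R)                   # one scalar RQA feature per signal
```

Features such as `rr` (and richer RQA measures like determinism or laminarity) would then be fed, alongside microstate parameters, into a classifier.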
Affiliation(s)
- Zahra Raeisi
- Department of Computer Science, Fairleigh Dickinson University, Vancouver Campus, Vancouver, Canada
- Reza Ahmadi Lashaki
- Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Hamidreza Najafi
- Biomedical Engineering Department, School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
2. Ha J, Baek SC, Lim Y, Chung JH. Validation of cost-efficient EEG experimental setup for neural tracking in an auditory attention task. Sci Rep 2023; 13:22682. PMID: 38114579; PMCID: PMC10730561; DOI: 10.1038/s41598-023-49990-6.
Abstract
When individuals listen to speech, their neural activity phase-locks to its slow temporal rhythm, a phenomenon commonly referred to as "neural tracking". The neural tracking mechanism allows for the detection of an attended sound source in a multi-talker situation by decoding neural signals obtained by electroencephalography (EEG), known as auditory attention decoding (AAD). Neural tracking with AAD can be utilized as an objective measurement tool in diverse clinical contexts, and it has potential applications in neuro-steered hearing devices. To effectively utilize this technology, it is essential to enhance the accessibility of the EEG experimental setup and analysis. The aim of this study was to develop a cost-efficient neural tracking system and validate the feasibility of neural tracking measurement by conducting an AAD task, using offline and real-time decoder models, outside a soundproof environment. We devised a neural tracking system capable of conducting AAD experiments using an OpenBCI and Arduino board. Nine participants were recruited to assess the performance of AAD using the developed system, in which competing speech signals were presented in an experimental setting without soundproofing. The offline decoder model demonstrated an average performance of 90%, and the real-time decoder model achieved 78%. The present study demonstrates the feasibility of implementing neural tracking and AAD with cost-effective devices in a practical environment.
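Offline AAD decoders of this kind are typically linear backward (stimulus-reconstruction) models. A minimal sketch on toy data, assuming a ridge-regularized least-squares decoder rather than the authors' exact OpenBCI pipeline (channel count, regularization, and the synthetic mixing are illustrative):

```python
import numpy as np

def fit_decoder(eeg, envelope, lam=1e-3):
    """Ridge-regularized least squares mapping EEG channels -> envelope."""
    A = eeg.T @ eeg + lam * np.eye(eeg.shape[1])
    return np.linalg.solve(A, eeg.T @ envelope)

def decode_attention(eeg, w, env_a, env_b):
    """Attended speaker = candidate envelope best correlated with the
    envelope reconstructed from EEG."""
    rec = eeg @ w
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return ("A" if r_a > r_b else "B"), (r_a, r_b)

# toy data: 16-channel EEG linearly mixes the attended envelope plus noise
rng = np.random.default_rng(1)
env_a = rng.standard_normal(1000)
env_b = rng.standard_normal(1000)
eeg = np.outer(env_a, rng.standard_normal(16)) + 0.5 * rng.standard_normal((1000, 16))

w = fit_decoder(eeg, env_a)                       # train on attended speech
label, (r_a, r_b) = decode_attention(eeg, w, env_a, env_b)
```

In a real system, training and decoding would use separate data segments; the point here is only the shape of the decode step.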
Affiliation(s)
- Jiyeon Ha
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, 60322, Frankfurt am Main, Germany
- Yoonseob Lim
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Jae Ho Chung
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul, 04763, Korea
- Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Hanyang University, 222-Wangshimni-ro, Seongdong-gu, Seoul, 133-792, Korea
3. Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. PMID: 37750333; DOI: 10.1093/cercor/bhad325.
Abstract
Auditory attention decoding (AAD) was used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remained unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend one of the two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e. audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR could significantly enhance the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
Affiliation(s)
- Bo Wang
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Xiran Xu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yadong Niu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Chao Wu
- School of Nursing, Peking University, Beijing 100191, China
- Xihong Wu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
- Jing Chen
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
4. An H, Lee J, Suh MW, Lim Y. Neural correlation of speech envelope tracking for background noise in normal hearing. Front Neurosci 2023; 17:1268591. PMID: 37916182; PMCID: PMC10616241; DOI: 10.3389/fnins.2023.1268591.
Abstract
Everyday speech communication often occurs in environments with background noise, and the impact of noise on speech recognition can vary depending on factors such as noise type, noise intensity, and the listener's hearing ability. However, the extent to which the neural mechanisms of speech understanding are influenced by different types and levels of noise remains unknown. This study aims to investigate whether individuals exhibit distinct neural responses and attention strategies depending on noise conditions. We recorded electroencephalography (EEG) data from 20 participants with normal hearing (13 males) and evaluated both neural tracking of speech envelopes and behavioral performance in speech understanding in the presence of varying types of background noise. Participants engaged in an EEG experiment consisting of two separate sessions. The first session involved listening to a 12-min story presented binaurally without any background noise. In the second session, speech understanding scores were measured using matrix sentences presented under speech-shaped noise (SSN) and story-noise background conditions, at noise levels corresponding to the sentence recognition score (SRS). We observed differences in neural envelope correlation depending on noise type but not on its level. Interestingly, the impact of noise type on the variation in envelope tracking was more pronounced among participants with higher speech perception scores, while those with lower scores exhibited similar envelope correlations regardless of the noise condition. The findings suggest that even individuals with normal hearing may adopt different strategies to understand speech in challenging listening environments, depending on the type of noise.
Affiliation(s)
- HyunJung An
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- JeeWon Lee
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Republic of Korea
- Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Republic of Korea
- Yoonseob Lim
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul, Republic of Korea
5. Yasmin S, Irsik VC, Johnsrude IS, Herrmann B. The effects of speech masking on neural tracking of acoustic and semantic features of natural speech. Neuropsychologia 2023; 186:108584. PMID: 37169066; DOI: 10.1016/j.neuropsychologia.2023.108584.
Abstract
Listening environments contain background sounds that mask speech and lead to communication challenges. Sensitivity to slow acoustic fluctuations in speech can help segregate speech from background noise. Semantic context can also facilitate speech perception in noise, for example, by enabling prediction of upcoming words. However, not much is known about how different degrees of background masking affect the neural processing of acoustic and semantic features during naturalistic speech listening. In the current electroencephalography (EEG) study, participants listened to engaging, spoken stories masked at different levels of multi-talker babble to investigate how neural activity in response to acoustic and semantic features changes with acoustic challenges, and how such effects relate to speech intelligibility. The pattern of neural response amplitudes associated with both acoustic and semantic speech features across masking levels was U-shaped, such that amplitudes were largest for moderate masking levels. This U-shape may be due to increased attentional focus when speech comprehension is challenging, but manageable. The latency of the neural responses increased linearly with increasing background masking, and neural latency change associated with acoustic processing most closely mirrored the changes in speech intelligibility. Finally, tracking responses related to semantic dissimilarity remained robust until severe speech masking (-3 dB SNR). The current study reveals that neural responses to acoustic features are highly sensitive to background masking and decreasing speech intelligibility, whereas neural responses to semantic features are relatively robust, suggesting that individuals track the meaning of the story well even in moderate background sound.
Affiliation(s)
- Sonia Yasmin
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Vanessa C Irsik
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Ingrid S Johnsrude
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada; School of Communication and Speech Disorders, The University of Western Ontario, London, ON, N6A 5B7, Canada
- Björn Herrmann
- Rotman Research Institute, Baycrest, Toronto, ON, M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON, M5S 1A1, Canada
6. Karunathilake IMD, Dunlap JL, Perera J, Presacco A, Decruy L, Anderson S, Kuchinsky SE, Simon JZ. Effects of aging on cortical representations of continuous speech. J Neurophysiol 2023; 129:1359-1377. PMID: 37096924; PMCID: PMC10202479; DOI: 10.1152/jn.00356.2022.
Abstract
Understanding speech in a noisy environment is crucial in day-to-day interactions and yet becomes more challenging with age, even for healthy aging. Age-related changes in the neural mechanisms that enable speech-in-noise listening have been investigated previously; however, the extent to which age affects the timing and fidelity of encoding of target and interfering speech streams is not well understood. Using magnetoencephalography (MEG), we investigated how continuous speech is represented in auditory cortex in the presence of interfering speech in younger and older adults. Cortical representations were obtained from neural responses that time-locked to the speech envelopes, using speech envelope reconstruction and temporal response functions (TRFs). TRFs showed three prominent peaks corresponding to auditory cortical processing stages: early (∼50 ms), middle (∼100 ms), and late (∼200 ms). Older adults showed exaggerated speech envelope representations compared with younger adults. Temporal analysis revealed both that the age-related exaggeration starts as early as ∼50 ms and that older adults needed a substantially longer integration time window to achieve their better reconstruction of the speech envelope. As expected, with increased speech masking, envelope reconstruction for the attended talker decreased and all three TRF peaks were delayed, with aging contributing additionally to the reduction. Interestingly, for older adults the late peak was delayed further, suggesting that this late peak may receive contributions from multiple sources. Together these results suggest that several mechanisms are at play compensating for age-related temporal processing deficits at several stages, but that they cannot fully reestablish unimpaired speech perception.
New & Noteworthy: We observed age-related changes in cortical temporal processing of continuous speech that may be related to older adults' difficulty in understanding speech in noise. These changes occur in both the timing and the strength of speech representations at different cortical processing stages and depend on both noise condition and selective attention. Critically, their dependence on noise condition changes dramatically among the early, middle, and late cortical processing stages, underscoring how aging differentially affects these stages.
Affiliation(s)
- I M Dushyanthi Karunathilake
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Jason L Dunlap
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Janani Perera
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Alessandro Presacco
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Stefanie E Kuchinsky
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland, United States
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Department of Biology, University of Maryland, College Park, Maryland, United States
7. Makov S, Pinto D, Har-Shai Yahav P, Miller LM, Zion Golumbic E. "Unattended, distracting or irrelevant": Theoretical implications of terminological choices in auditory selective attention research. Cognition 2023; 231:105313. PMID: 36344304; DOI: 10.1016/j.cognition.2022.105313.
Abstract
For seventy years, auditory selective attention research has focused on studying the cognitive mechanisms of prioritizing the processing of a 'main' task-relevant stimulus in the presence of 'other' stimuli. However, a closer look at this body of literature reveals deep empirical inconsistencies and theoretical confusion regarding the extent to which this 'other' stimulus is processed. We argue that many key debates regarding attention arise, at least in part, from inappropriate terminological choices for experimental variables that may not accurately map onto the cognitive constructs they are meant to describe. Here we critically review the most common or disruptive terminological ambiguities, differentiate between methodology-based and theory-derived terms, and unpack the theoretical assumptions underlying different terminological choices. In particular, we offer an in-depth analysis of the terms 'unattended' and 'distractor' and demonstrate how their use can lead to conflicting theoretical inferences. We also offer a framework for thinking about terminology in a more precise way, in the hope of fostering more productive debates and promoting more nuanced and accurate cognitive models of selective attention.
Affiliation(s)
- Shiri Makov
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
- Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
- Paz Har-Shai Yahav
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
- Lee M Miller
- The Center for Mind and Brain, University of California, Davis, CA, United States of America; Department of Neurobiology, Physiology, & Behavior, University of California, Davis, CA, United States of America; Department of Otolaryngology / Head and Neck Surgery, University of California, Davis, CA, United States of America
- Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
8. Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. PMID: 36137861; DOI: 10.1016/j.heares.2022.108607.
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
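Envelope tracking of the kind reviewed here is commonly estimated with a temporal response function fitted by lagged ridge regression (a forward model from stimulus to EEG). A hedged sketch on synthetic data; the lag range, the regularization `lam`, and the toy 5-sample neural delay are assumptions, not settings from the article:

```python
import numpy as np

def lag_matrix(stim, lags):
    """Stack time-lagged copies of a stimulus feature (e.g. the envelope)."""
    n = len(stim)
    X = np.zeros((n, len(lags)))
    for k, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, k] = stim[:n - lag]
        else:
            X[:lag, k] = stim[-lag:]
    return X

def fit_trf(stim, eeg_chan, lags, lam=1.0):
    """Temporal response function: ridge regression from lagged
    stimulus to one EEG channel."""
    X = lag_matrix(stim, lags)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ eeg_chan)

# toy example: the 'brain' responds to the envelope with a 5-sample delay
rng = np.random.default_rng(2)
env = rng.standard_normal(2000)
eeg = np.roll(env, 5) + 0.3 * rng.standard_normal(2000)

lags = list(range(0, 16))
trf = fit_trf(env, eeg, lags)
peak_lag = int(np.argmax(trf))   # recovers the simulated 5-sample delay
```

The same machinery applies to f0 or linguistic features by swapping the stimulus representation in `lag_matrix`.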
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
9. Xu Z, Bai Y, Zhao R, Zheng Q, Ni G, Ming D. Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction. Hear Res 2022; 422:108552. PMID: 35714555; DOI: 10.1016/j.heares.2022.108552.
Abstract
In the cocktail-party scenario, the human auditory system extracts information from a specific speaker of interest and ignores others. Many studies have focused on auditory attention decoding (AAD), but the stimulation materials have mainly been non-tonal languages. We used a tonal language (Mandarin) as the speech stimulus and constructed a Long Short-Term Memory (LSTM) architecture for speech envelope reconstruction based on electroencephalogram (EEG) data. The correlation coefficient between the reconstructed and candidate envelopes was calculated to determine the subject's auditory attention. The proposed LSTM architecture outperformed the linear models. The average decoding accuracy in cross-subject and inter-subject cases varied from 63.02% to 74.29%, with the highest accuracy of 89.1% in a decision window of 0.15 s. In addition, the beta-band rhythm was found to play an essential role in distinguishing attention from non-attention states. These results provide a new AAD architecture to help develop neuro-steered hearing devices, especially for tonal languages.
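The windowed correlation decision described here can be sketched as follows, assuming the envelope has already been reconstructed from EEG (by the LSTM in the paper; here a noisy copy stands in for it). The window length, sampling rate, and toy signals are illustrative, not the paper's settings:

```python
import numpy as np

def window_decisions(rec, env_a, env_b, win):
    """Per-window AAD decisions: in each decision window, attention is
    assigned to the candidate envelope that correlates best with the
    envelope reconstructed from EEG."""
    labels = []
    for s in range(0, len(rec) - win + 1, win):
        seg = slice(s, s + win)
        r_a = np.corrcoef(rec[seg], env_a[seg])[0, 1]
        r_b = np.corrcoef(rec[seg], env_b[seg])[0, 1]
        labels.append("A" if r_a > r_b else "B")
    return labels

# toy example: the reconstruction is a noisy copy of envelope A
rng = np.random.default_rng(4)
fs = 100                                  # assumed envelope sampling rate, Hz
env_a = rng.standard_normal(10 * fs)      # 10 s of candidate envelopes
env_b = rng.standard_normal(10 * fs)
rec = env_a + 0.8 * rng.standard_normal(10 * fs)

win = int(0.5 * fs)                       # 0.5-s decision window
labels = window_decisions(rec, env_a, env_b, win)
accuracy = labels.count("A") / len(labels)
```

Shorter windows give faster decisions at the cost of noisier correlations, which is why reported accuracy depends strongly on decision-window length.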
Affiliation(s)
- Zihao Xu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Ran Zhao
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Qi Zheng
- Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
10. Shahsavari Baboukani P, Graversen C, Alickovic E, Østergaard J. Speech to noise ratio improvement induces nonlinear parietal phase synchrony in hearing aid users. Front Neurosci 2022; 16:932959. PMID: 36017182; PMCID: PMC9396236; DOI: 10.3389/fnins.2022.932959.
Abstract
Objectives: Comprehension of speech in adverse listening conditions is challenging for hearing-impaired (HI) individuals. Noise reduction (NR) schemes in hearing aids (HAs) have demonstrated the capability to help HI listeners overcome these challenges. The objective of this study was to investigate the effect of NR processing (inactive, where the NR feature was switched off, vs. active, where the NR feature was switched on) on correlates of listening effort across two background noise levels [+3 dB signal-to-noise ratio (SNR) and +8 dB SNR], using a phase synchrony analysis of electroencephalogram (EEG) signals.
Design: The EEG was recorded while 22 HI participants fitted with HAs performed a continuous speech-in-noise (SiN) task in the presence of background noise and a competing talker. The phase synchrony within eight regions of interest (ROIs) and four conventional EEG bands was computed using a multivariate phase synchrony measure.
Results: The results demonstrated that the activation of NR in HAs affects EEG phase synchrony in the parietal ROI differently at low SNR than at high SNR. The relationship between listening-task conditions and phase synchrony in the parietal ROI was nonlinear.
Conclusion: We showed that the activation of NR schemes in HAs can nonlinearly reduce correlates of listening effort as estimated by EEG-based phase synchrony. We contend that investigation of phase synchrony within ROIs can reflect the effects of HAs in HI individuals in ecological listening conditions.
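Phase synchrony is often quantified with a phase-locking value (PLV). The study used a multivariate measure over ROIs, but a minimal bivariate numpy sketch conveys the idea; the signals, frequency, and phase offset below are illustrative assumptions:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (numpy-only Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2          # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1
    return np.fft.ifft(X * h)

def plv(x, y):
    """Phase-locking value between two signals: |mean(exp(i * dphi))|."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

t = np.linspace(0, 2, 1000, endpoint=False)
a = np.sin(2 * np.pi * 10 * t)                # 10 Hz oscillation
b = np.sin(2 * np.pi * 10 * t + np.pi / 4)    # same frequency, fixed phase shift
rng = np.random.default_rng(3)
c = rng.standard_normal(1000)                 # unrelated broadband noise

high = plv(a, b)   # near 1: constant phase difference
low = plv(a, c)    # much lower: no stable phase relation
```

In practice the signals would first be band-pass filtered into the EEG band of interest before estimating the phase.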
Affiliation(s)
- Payam Shahsavari Baboukani
- Department of Electronic Systems, Aalborg University, Aalborg, Denmark
- Carina Graversen
- Integrative Neuroscience, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Department of Health Science and Technology, Center for Neuroplasticity and Pain (CNAP), Aalborg University, Aalborg, Denmark
- Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Jan Østergaard
- Department of Electronic Systems, Aalborg University, Aalborg, Denmark
11. A neuroscience-inspired spiking neural network for EEG-based auditory spatial attention detection. Neural Netw 2022; 152:555-565. DOI: 10.1016/j.neunet.2022.05.003.
12. Hearing aid noise reduction lowers the sustained listening effort during continuous speech in noise: A combined pupillometry and EEG study. Ear Hear 2021; 42:1590-1601. PMID: 33950865; DOI: 10.1097/aud.0000000000001050.
Abstract
Objectives: The investigation of auditory cognitive processes has recently moved from strictly controlled, trial-based paradigms toward the presentation of continuous speech. This also allows the investigation of listening effort on larger time scales (i.e., sustained listening effort). Here, we investigated the modulation of sustained listening effort by a noise reduction algorithm, as applied in hearing aids, in a listening scenario with noisy continuous speech. The investigated directional noise reduction algorithm mainly suppresses noise from the background.
Design: We recorded pupil size and the EEG in 22 participants with hearing loss who listened to audio news clips in the presence of background multi-talker babble noise. We estimated how noise reduction (off, on) and signal-to-noise ratio (SNR; +3 dB, +8 dB) affect pupil size, power in the parietal EEG alpha band (i.e., parietal alpha power), and behavioral performance.
Results: Our results show that noise reduction reduces pupil size, while there was no significant effect of SNR. Importantly, we found interactions of SNR and noise reduction suggesting that noise reduction reduces pupil size predominantly at the lower SNR. Parietal alpha power showed a similar yet nonsignificant pattern, with increased power under easier conditions. In line with the participants' reports that one of the two presented talkers was more intelligible, we found reduced pupil size, increased parietal alpha power, and better performance when people listened to the more intelligible talker.
Conclusions: We show that the modulation of sustained listening effort (e.g., by hearing aid noise reduction), as indicated by pupil size and parietal alpha power, can be studied under more ecologically valid conditions. Concluded mainly from pupil size, we demonstrate that hearing aid noise reduction lowers sustained listening effort. Our study approximates real-world listening scenarios and evaluates the benefit of the signal processing found in a modern hearing aid.
Collapse
|
13
|
Islam MN, Sulaiman N, Farid FA, Uddin J, Alyami SA, Rashid M, P.P. Abdul Majeed A, Moni MA. Diagnosis of hearing deficiency using EEG based AEP signals: CWT and improved-VGG16 pipeline. PeerJ Comput Sci 2021; 7:e638. [PMID: 34712786 PMCID: PMC8507488 DOI: 10.7717/peerj-cs.638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2021] [Accepted: 06/21/2021] [Indexed: 05/14/2023]
Abstract
Hearing deficiency is the world's most common sensory impairment and impedes human communication and learning. Early and precise hearing diagnosis using electroencephalogram (EEG) is regarded as the optimal strategy to deal with this issue. Among a wide range of EEG control signals, the most relevant modality for hearing loss diagnosis is the auditory evoked potential (AEP), which is produced in the brain's cortex area through an auditory stimulus. This study aims to develop a robust intelligent auditory sensation system utilizing a pre-trained deep learning framework by analyzing and evaluating the functional reliability of hearing based on the AEP response. First, the raw AEP data are transformed into time-frequency images through the wavelet transformation. Then, lower-level functionality is eliminated using a pre-trained network. Here, an improved-VGG16 architecture has been designed by removing some convolutional layers and adding new layers in the fully connected block. Subsequently, the higher levels of the neural network architecture are fine-tuned using the labelled time-frequency images. Finally, the proposed method's performance has been validated on a reputable, publicly available AEP dataset recorded from sixteen subjects while they heard specific auditory stimuli in the left or right ear. The proposed method outperforms state-of-the-art studies by improving the classification accuracy to 96.87% (from 57.375%), which indicates that the proposed improved-VGG16 architecture can effectively deal with the AEP response in early hearing loss diagnosis.
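The time-frequency front end described in this abstract (wavelet transformation of raw AEP data into images) can be sketched independently of the VGG16 back end. Below is a minimal, illustrative Morlet-wavelet scalogram in NumPy; the function name, the `w` parameter, and the normalisation are assumptions for illustration, not the paper's implementation, which may use a different wavelet family and feeds the resulting images to a fine-tuned VGG16.

```python
import numpy as np

def morlet_scalogram(x, fs, freqs, w=6.0):
    """Continuous wavelet transform magnitude (Morlet) of a 1-D signal.

    Returns an array of shape (len(freqs), len(x)) that can be rendered as a
    time-frequency image for a CNN classifier.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = (np.arange(n) - n // 2) / fs  # time axis centred on the window
    out = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        s = w / (2 * np.pi * f)  # Gaussian width (s) tied to centre frequency
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * s ** 2))
        wavelet /= np.sqrt(np.abs(wavelet).sum())  # crude amplitude normalisation
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out
```

A 10 Hz test tone, for example, should concentrate its energy in the 10 Hz row of the returned scalogram.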
Collapse
Affiliation(s)
- Md Nahidul Islam
- Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Norizam Sulaiman
- Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Fahmid Al Farid
- Faculty of Computing and Informatics, Multimedia University, Malaysia
| | - Jia Uddin
- Technology Studies Department, Endicott College, Woosong University, Daejeon, South Korea
| | - Salem A. Alyami
- Department of Mathematics and Statistics, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
| | - Mamunur Rashid
- Faculty of Electrical and Electronics Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Anwar P.P. Abdul Majeed
- Innovative Manufacturing, Mechatronics and Sports Laboratory, Faculty of Manufacturing and Mechatronic Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
- Centre for Software Development & Integrated Computing, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
| | - Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, Australia
| |
Collapse
|
14
|
Drgas S, Blaszak M, Przekoracka-Krawczyk A. The Combination of Neural Tracking and Alpha Power Lateralization for Auditory Attention Detection. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:3603-3616. [PMID: 34403288 DOI: 10.1044/2021_jslhr-20-00608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose The acoustic source that a listener attends to in a mixture can be identified with a certain accuracy on the basis of the neural response recorded during listening, and various phenomena may be used to detect attention. For example, neural tracking (NT) and alpha power lateralization (APL) may be utilized to obtain information concerning attention. However, these methods of auditory attention detection (AAD) are typically tested in different experimental setups, which makes it impossible to compare their accuracy. The aim of this study is to compare the accuracy of AAD based on NT, APL, and their combination for a dichotic natural speech listening task. Method Thirteen adult listeners were presented with dichotic speech stimuli and instructed to attend to one of them. The electroencephalogram of the subjects was recorded continuously during the experiment using a set of 32 active electrodes. The accuracy of AAD was evaluated for trial lengths of 50, 25, and 12.5 s. AAD was tested for various parameters of the NT- and APL-based modules. Results The obtained results suggest that NT of natural running speech provides accuracy similar to APL. A statistically significant improvement in the accuracy of AAD using the combined method was observed not only for the longest duration of test samples (50 s, p = .005) but also for shorter ones (25 s, p = .011). Conclusions It seems that the combination of standard NT and APL significantly increases the effectiveness of accurate identification of the tracked signal perceived by a listener under dichotic conditions. It has been demonstrated that, under certain conditions, the combination of NT and APL may provide a benefit for AAD in cocktail party scenarios.
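The APL side of this comparison rests on contrasting alpha-band (roughly 8-12 Hz) power over the two hemispheres. Below is a minimal sketch of such a lateralization index, assuming Welch PSD estimation and a simple (P_right - P_left) / (P_right + P_left) contrast; the channel grouping, band edges, and sign convention are illustrative choices, not the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import welch

def alpha_lateralization_index(eeg_left, eeg_right, fs, band=(8.0, 12.0)):
    """Alpha-power lateralization index from two hemispheric channel groups.

    eeg_left / eeg_right: arrays of shape (n_channels, n_samples).
    Returns a value in [-1, 1]; positive when alpha power is higher over the
    right-hemisphere group.
    """
    def band_power(x):
        # Welch PSD per channel, then average the power inside the alpha band
        f, psd = welch(x, fs=fs, nperseg=min(x.shape[-1], 2 * int(fs)))
        mask = (f >= band[0]) & (f <= band[1])
        return psd[..., mask].mean()

    p_l = band_power(np.asarray(eeg_left, dtype=float))
    p_r = band_power(np.asarray(eeg_right, dtype=float))
    return (p_r - p_l) / (p_r + p_l)
```

An AAD decision could then threshold this index (possibly combined with an NT correlation score, as the abstract's combined method does with its own classifier).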
Collapse
Affiliation(s)
- Szymon Drgas
- Institute of Automation and Robotics, Poznań University of Technology, Poland
| | - Magdalena Blaszak
- Department of Medical Physics and Radiospectroscopy, Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
- Vision and Neuroscience Laboratory, NanoBioMedical Centre, Adam Mickiewicz University, Poznań, Poland
| | - Anna Przekoracka-Krawczyk
- Vision and Neuroscience Laboratory, NanoBioMedical Centre, Adam Mickiewicz University, Poznań, Poland
- Laboratory of Vision Science and Optometry, Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
| |
Collapse
|
15
|
Geravanchizadeh M, Zakeri S. Ear-EEG-based binaural speech enhancement (ee-BSE) using auditory attention detection and audiometric characteristics of hearing-impaired subjects. J Neural Eng 2021; 18. [PMID: 34289464 DOI: 10.1088/1741-2552/ac16b4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2020] [Accepted: 07/21/2021] [Indexed: 11/11/2022]
Abstract
Objective. Speech perception in cocktail party scenarios has been the concern of a group of researchers who are involved with the design of hearing-aid devices. Approach. In this paper, a new unified ear-EEG-based binaural speech enhancement system is introduced for hearing-impaired (HI) listeners. The proposed model, which is based on auditory attention detection (AAD) and individual hearing threshold (HT) characteristics, has four main processing stages. In the binaural processing stage, a system based on the deep neural network is trained to estimate auditory ratio masks for each of the speakers in the mixture signal. In the EEG processing stage, AAD is employed to select one ratio mask corresponding to the attended speech. Here, the same EEG data is also used to predict the HTs of listeners who participated in the EEG recordings. The third stage, called insertion gain computation, concerns the calculation of a special amplification gain based on individual HTs. Finally, in the selection-resynthesis-amplification stage, the attended speech signals of the target are resynthesized based on the selected auditory mask and then are amplified using the computed insertion gain. Main results. The detection of the attended speech and the HTs is achieved by classifiers that are trained with features extracted from the scalp EEG or the ear EEG signals. The results of evaluating AAD and HT detection show high detection accuracies. The systematic evaluations of the proposed system yield substantial intelligibility and quality improvements for the HI and normal-hearing audiograms. Significance. The AAD method determines the direction of attention from single-trial EEG signals without access to audio signals of the speakers. The amplification procedure could be adjusted for each subject based on the individual HTs. The present model has the potential to be considered as an important processing tool to personalize neuro-steered hearing aids.
Collapse
Affiliation(s)
- Masoud Geravanchizadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 51666-15813, Iran
| | - Sahar Zakeri
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz 51666-15813, Iran
| |
Collapse
|
16
|
Cai S, Li P, Su E, Xie L. Auditory Attention Detection via Cross-Modal Attention. Front Neurosci 2021; 15:652058. [PMID: 34366770 PMCID: PMC8333999 DOI: 10.3389/fnins.2021.652058] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 06/24/2021] [Indexed: 11/13/2022] Open
Abstract
Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activity. However, previous AAD approaches perform poorly on short signal segments; more advanced decoding strategies are therefore needed to realize robust real-time AAD. In this study, we propose a novel approach, i.e., cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we hope to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracy values of 82.8, 86.4, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model, but also outperforms the state-of-the-art non-linear approaches. These results and data visualizations suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features in order to improve AAD performance.
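The core operation of a cross-modal attention mechanism such as CMAA is scaled dot-product attention, in which features of one modality act as queries over the other. Below is a minimal single-head NumPy sketch, assuming pre-computed EEG and audio feature matrices that share a common feature dimension d; the actual model additionally learns query/key/value projections and stacks multiple attention layers, which this sketch omits.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(eeg_feats, audio_feats):
    """Single-head scaled dot-product attention: EEG frames attend to audio frames.

    eeg_feats: (T_eeg, d), audio_feats: (T_audio, d).
    Returns (T_eeg, d) audio context vectors and the (T_eeg, T_audio) weights.
    """
    d = eeg_feats.shape[-1]
    scores = eeg_feats @ audio_feats.T / np.sqrt(d)  # similarity of each EEG frame to each audio frame
    weights = softmax(scores, axis=-1)               # each row is a distribution over audio frames
    context = weights @ audio_feats                  # attention-weighted audio summary per EEG frame
    return context, weights
```

In a full model, the fused context vectors would feed a classifier that decides which of the competing speakers is attended.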
Collapse
Affiliation(s)
| | | | | | - Longhan Xie
- Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, China
| |
Collapse
|
17
|
Lunner T, Alickovic E, Graversen C, Ng EHN, Wendt D, Keidser G. Three New Outcome Measures That Tap Into Cognitive Processes Required for Real-Life Communication. Ear Hear 2021; 41 Suppl 1:39S-47S. [PMID: 33105258 PMCID: PMC7676869 DOI: 10.1097/aud.0000000000000941] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Accepted: 07/11/2020] [Indexed: 11/29/2022]
Abstract
To increase the ecological validity of outcomes from laboratory evaluations of hearing and hearing devices, it is desirable to introduce more realistic outcome measures in the laboratory. This article presents and discusses three outcome measures that have been designed to go beyond traditional speech-in-noise measures to better reflect realistic everyday challenges. The outcome measures reviewed are: the Sentence-final Word Identification and Recall (SWIR) test that measures working memory performance while listening to speech in noise at ceiling performance; a neural tracking method that produces a quantitative measure of selective speech attention in noise; and pupillometry that measures changes in pupil dilation to assess listening effort while listening to speech in noise. According to evaluation data, the SWIR test provides a sensitive measure in situations where speech perception performance might be unaffected. Similarly, pupil dilation has also shown sensitivity in situations where traditional speech-in-noise measures are insensitive. Changes in working memory capacity and effort mobilization were found at positive signal-to-noise ratios (SNR), that is, at SNRs that might reflect everyday situations. Using stimulus reconstruction, it has been demonstrated that neural tracking is a robust method for determining to what degree a listener is attending to a specific talker in a typical cocktail party situation. Using both established and commercially available noise reduction schemes, data have further shown that all three measures are sensitive to variation in SNR. In summary, the new outcome measures seem suitable for testing hearing and hearing devices under more realistic and demanding everyday conditions than traditional speech-in-noise tests.
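The stimulus-reconstruction approach mentioned above is typically a backward model: a ridge regression from time-lagged EEG to the attended speech envelope, with attention decided by which candidate envelope correlates best with the reconstruction. The sketch below is a generic NumPy illustration; the lag handling, regularization value, and correlation-based decision rule are common choices in this literature, not this article's exact method.

```python
import numpy as np

def lag_matrix(eeg, lags):
    """Stack time-lagged copies of EEG channels.

    eeg: (n_samples, n_channels) -> (n_samples, n_channels * len(lags)),
    zero-padded at the edges.
    """
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j * c:(j + 1) * c] = eeg[:n - lag]
        else:
            X[:lag, j * c:(j + 1) * c] = eeg[-lag:]
    return X

def train_backward_model(eeg, envelope, lags, alpha=1.0):
    """Ridge-regression decoder mapping lagged EEG to the speech envelope."""
    X = lag_matrix(eeg, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(eeg, env_a, env_b, w, lags):
    """Reconstruct the envelope from EEG and pick the better-matching talker."""
    rec = lag_matrix(eeg, lags) @ w
    r = [np.corrcoef(rec, e)[0, 1] for e in (env_a, env_b)]
    return int(np.argmax(r)), r
```

With real data, the decoder is trained on attended-speech trials and evaluated per decision window; longer windows give more reliable correlations, which is exactly the latency trade-off discussed in several entries on this page.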
Collapse
Affiliation(s)
- Thomas Lunner
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
- Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
- Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
| | - Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
| | | | - Elaine Hoi Ning Ng
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
- Oticon A/S, Kongebakken, Denmark
| | - Dorothea Wendt
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
| | - Gitte Keidser
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
| |
Collapse
|
18
|
Potential of Augmented Reality Platforms to Improve Individual Hearing Aids and to Support More Ecologically Valid Research. Ear Hear 2021; 41 Suppl 1:140S-146S. [PMID: 33105268 PMCID: PMC7676615 DOI: 10.1097/aud.0000000000000961] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
An augmented reality (AR) platform combines several technologies in a system that can render individual “digital objects” that can be manipulated for a given purpose. In the audio domain, these may, for example, be generated by speaker separation, noise suppression, and signal enhancement. Access to the “digital objects” could be used to augment auditory objects that the user wants to hear better. Such AR platforms in conjunction with traditional hearing aids may contribute to closing the gap for people with hearing loss through multimodal sensor integration, leveraging extensive current artificial intelligence research, and machine-learning frameworks. This could take the form of an attention-driven signal enhancement and noise suppression platform, together with context awareness, which would improve the interpersonal communication experience in complex real-life situations. In that sense, an AR platform could serve as a frontend to current and future hearing solutions. The AR device would enhance the signals to be attended, but the hearing amplification would still be handled by hearing aids. In this article, suggestions are made about why AR platforms may offer ideal affordances to compensate for hearing loss, and how research-focused AR platforms could help toward better understanding of the role of hearing in everyday life.
Collapse
|
19
|
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021; 18. [PMID: 33957606 DOI: 10.1088/1741-2552/abfeba] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]
Abstract
Objective. Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments. Approach. Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals, based on corresponding EEG signals recorded while subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions. Main results. Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly, with classification accuracies that approximated or exceeded 80%, based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder. Significance. This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggest that this computational model may be an effective approach for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.
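The first step of this segmented approach, splitting speech into higher- and lower-RMS-level segments with a -10 dB relative threshold, can be sketched as follows. The frame length and the frame-wise labeling are assumptions about how the threshold is applied; the paper itself classifies segments from EEG with an SVM, which is not reproduced here.

```python
import numpy as np

def rms_level_segments(speech, frame_len, threshold_db=-10.0):
    """Label each frame of a speech signal as higher- or lower-RMS-level
    relative to the utterance-level RMS (threshold in dB, default -10 dB).

    Returns a boolean array, True = higher-RMS-level frame.
    """
    speech = np.asarray(speech, dtype=float)
    n_frames = len(speech) // frame_len
    frames = speech[:n_frames * frame_len].reshape(n_frames, frame_len)
    frame_rms = np.sqrt((frames ** 2).mean(axis=1))
    global_rms = np.sqrt((speech ** 2).mean())
    # Frame level relative to the whole-utterance RMS, in dB
    rel_db = 20 * np.log10(frame_rms / global_rms + 1e-12)
    return rel_db >= threshold_db
```

Separate backward decoders would then be trained on the EEG samples aligned with each segment class, and their reconstructions combined for the final attention decision.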
Collapse
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
| |
Collapse
|
20
|
Vandecappelle S, Deckers L, Das N, Ansari AH, Bertrand A, Francart T. EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 2021; 10:e56481. [PMID: 33929315 PMCID: PMC8143791 DOI: 10.7554/elife.56481] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Accepted: 04/28/2021] [Indexed: 01/16/2023] Open
Abstract
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
Collapse
Affiliation(s)
- Servaas Vandecappelle
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Lucas Deckers
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Neetha Das
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Amir Hossein Ansari
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Alexander Bertrand
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Tom Francart
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
| |
Collapse
|
22
|
Geirnaert S, Francart T, Bertrand A. Fast EEG-Based Decoding Of The Directional Focus Of Auditory Attention Using Common Spatial Patterns. IEEE Trans Biomed Eng 2021; 68:1557-1568. [PMID: 33095706 DOI: 10.1109/tbme.2020.3033446] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Noise reduction algorithms in current hearing devices lack information about the sound source a user attends to when multiple sources are present. To resolve this issue, they can be complemented with auditory attention decoding (AAD) algorithms, which decode the attention using electroencephalography (EEG) sensors. State-of-the-art AAD algorithms employ a stimulus reconstruction approach, in which the envelope of the attended source is reconstructed from the EEG and correlated with the envelopes of the individual sources. This approach, however, performs poorly on short signal segments, while longer segments yield impractically long detection delays when the user switches attention. METHODS We propose decoding the directional focus of attention using filterbank common spatial pattern filters (FB-CSP) as an alternative AAD paradigm, which does not require access to the clean source envelopes. RESULTS The proposed FB-CSP approach outperforms both the stimulus reconstruction approach on short signal segments and a convolutional neural network approach on the same task. We achieve a high accuracy (80% for [Formula: see text] windows and 70% for quasi-instantaneous decisions), which is sufficient to reach minimal expected switch durations below [Formula: see text]. We also demonstrate that the decoder can adapt to unlabeled data from an unseen subject and works with only a subset of EEG channels located around the ear to emulate a wearable EEG setup. CONCLUSION The proposed FB-CSP method provides fast and accurate decoding of the directional focus of auditory attention. SIGNIFICANCE The high accuracy on very short data segments is a major step forward towards practical neuro-steered hearing devices.
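The CSP core of the FB-CSP method can be written as a generalized eigendecomposition of the two class covariance matrices (e.g., attend-left vs. attend-right trials). Below is a single-band simplification; the full method applies this per filterbank band and feeds log-variance features to a classifier, and the dimensions and filter-selection rule here are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=6):
    """Common spatial patterns for two classes.

    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    Returns spatial filters (n_channels, n_filters) that maximize the
    variance ratio between the two classes.
    """
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenvalue problem: ca @ w = lambda * (ca + cb) @ w
    vals, vecs = eigh(ca, ca + cb)
    order = np.argsort(vals)
    # Filters from both ends of the spectrum discriminate best
    pick = np.concatenate([order[:n_filters // 2],
                           order[-(n_filters - n_filters // 2):]])
    return vecs[:, pick]

def csp_features(trial, W):
    """Normalized log-variance features of one trial (n_channels, n_samples)."""
    z = W.T @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())
```

These log-variance features are what a downstream classifier (LDA in many CSP pipelines) would use to decide the directional focus of attention per decision window.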
Collapse
|
23
|
Alickovic E, Ng EHN, Fiedler L, Santurette S, Innes-Brown H, Graversen C. Effects of Hearing Aid Noise Reduction on Early and Late Cortical Representations of Competing Talkers in Noise. Front Neurosci 2021; 15:636060. [PMID: 33841081 PMCID: PMC8032942 DOI: 10.3389/fnins.2021.636060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Accepted: 02/26/2021] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVES Previous research using non-invasive (magnetoencephalography, MEG) and invasive (electrocorticography, ECoG) neural recordings has demonstrated the progressive and hierarchical representation and processing of complex multi-talker auditory scenes in the auditory cortex. Early responses (<85 ms) in primary-like areas appear to represent the individual talkers with almost equal fidelity and are independent of attention in normal-hearing (NH) listeners. However, late responses (>85 ms) in higher-order non-primary areas selectively represent the attended talker with significantly higher fidelity than unattended talkers in NH and hearing-impaired (HI) listeners. Motivated by these findings, the objective of this study was to investigate the effect of a noise reduction scheme (NR) in a commercial hearing aid (HA) on the representation of complex multi-talker auditory scenes in distinct hierarchical stages of the auditory cortex by using high-density electroencephalography (EEG). DESIGN We addressed this issue by investigating early (<85 ms) and late (>85 ms) EEG responses recorded in 34 HI subjects fitted with HAs. The HA noise reduction (NR) was either on or off while the participants listened to a complex auditory scene. Participants were instructed to attend to one of two simultaneous talkers in the foreground while multi-talker babble noise played in the background (+3 dB SNR). After each trial, a two-choice question about the content of the attended speech was presented. RESULTS Using a stimulus reconstruction approach, our results suggest that the attention-related enhancement of neural representations of target and masker talkers located in the foreground, as well as suppression of the background noise in distinct hierarchical stages is significantly affected by the NR scheme. 
We found that the NR scheme contributed to the enhancement of the foreground and of the entire acoustic scene in the early responses, and that this enhancement was driven by better representation of the target speech. In the late responses, the target talker was selectively represented in HI listeners; use of the NR scheme enhanced the representations of the target and masker speech in the foreground and suppressed the representation of the noise in the background. There was also a significant effect of EEG time window on the strength of the cortical representation of the target and masker. CONCLUSION Together, our analyses of the early and late responses obtained from HI listeners support the existing view of hierarchical processing in the auditory cortex. Our findings demonstrate the benefits of a NR scheme on the representation of complex multi-talker auditory scenes in different areas of the auditory cortex in HI listeners.
Collapse
Affiliation(s)
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| | - Elaine Hoi Ning Ng
- Centre for Applied Audiology Research, Oticon A/S, Smørum, Denmark
- Department of Behavioral Sciences and Learning, Linkoping University, Linkoping, Sweden
| | - Lorenz Fiedler
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
| | - Sébastien Santurette
- Centre for Applied Audiology Research, Oticon A/S, Smørum, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| | | | | |
Collapse
|
24
|
EEG-based diagnostics of the auditory system using cochlear implant electrodes as sensors. Sci Rep 2021; 11:5383. [PMID: 33686155 PMCID: PMC7940426 DOI: 10.1038/s41598-021-84829-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Accepted: 02/18/2021] [Indexed: 01/31/2023] Open
Abstract
The cochlear implant is one of the most successful medical prostheses, allowing deaf and severely hearing-impaired persons to hear again by electrically stimulating the auditory nerve. A trained audiologist adjusts the stimulation settings for good speech understanding, known as "fitting" the implant. This process is based on subjective feedback from the user, making it time-consuming and challenging, especially in paediatric or communication-impaired populations. Furthermore, fittings only happen during infrequent sessions at a clinic, and therefore cannot take into account variable factors that affect the user's hearing, such as physiological changes and different listening environments. Objective audiometry, in which brain responses evoked by auditory stimulation are collected and analysed, removes the need for active patient participation. However, recording of brain responses still requires expensive equipment that is cumbersome to use. An elegant solution is to record the neural signals using the implant itself. We demonstrate for the first time the recording of continuous electroencephalographic (EEG) signals from the implanted intracochlear electrode array in human subjects, using auditory evoked potentials originating from different brain regions. This was done using a temporary recording set-up with a percutaneous connector used for research purposes. Furthermore, we show that the response morphologies and amplitudes depend crucially on the recording electrode configuration. The integration of an EEG system into cochlear implants paves the way towards chronic neuro-monitoring of hearing-impaired patients in their everyday environment, and neuro-steered hearing prostheses, which can autonomously adjust their output based on neural feedback.
25
Mikkelsen KB, Tabar YR, Christensen CB, Kidmose P. EEGs Vary Less Between Lab and Home Locations Than They Do Between People. Front Comput Neurosci 2021; 15:565244. [PMID: 33679356] [PMCID: PMC7928278] [DOI: 10.3389/fncom.2021.565244] [Received: 05/24/2020] [Accepted: 01/13/2021]
Abstract
Given the rapid development of lightweight EEG devices over the past decade, it is reasonable to ask to what extent neuroscience could now be taken outside the lab. In this study, we designed an EEG paradigm well suited for deployment “in the wild.” The paradigm was tested in repeated recordings of 20 subjects on eight different occasions (four in the laboratory, four in the subject's own home). By calculating the inter-subject, intra-subject, and inter-location variance, we find that the inter-location variation for this paradigm is considerably smaller than the inter-subject variation. We believe the paradigm is representative of a large group of other relevant paradigms. Given the positive results of this study, we therefore expect limited problems in moving research paradigms that would benefit from less controlled environments out of the laboratory.
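The variance comparison described in this abstract can be made concrete with a minimal sketch (a hypothetical helper, not the authors' analysis code): given one scalar EEG summary per subject and per recording location, split the variance into inter-subject and inter-location components and compare them.

```python
import numpy as np

def variance_components(x):
    """Split variance of a (subjects x locations) feature matrix into
    inter-subject and inter-location parts.

    Hypothetical input: x[i, j] is one scalar EEG summary (e.g. an ERP
    amplitude) for subject i recorded at location j.
    """
    grand = x.mean()
    subj_means = x.mean(axis=1)   # one mean per subject
    loc_means = x.mean(axis=0)    # one mean per location
    inter_subject = np.mean((subj_means - grand) ** 2)
    inter_location = np.mean((loc_means - grand) ** 2)
    return inter_subject, inter_location
```

With a feature that differs strongly between subjects but not between locations, the inter-subject component dominates, which is the pattern the study reports.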
Affiliation(s)
- Kaare B Mikkelsen
- Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
- Yousef R Tabar
- Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
- Preben Kidmose
- Department of Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
26
Baek SC, Chung JH, Lim Y. Implementation of an Online Auditory Attention Detection Model with Electroencephalography in a Dichotomous Listening Experiment. Sensors 2021; 21:531. [PMID: 33451041] [PMCID: PMC7828508] [DOI: 10.3390/s21020531] [Received: 11/18/2020] [Revised: 01/07/2021] [Accepted: 01/09/2021]
Abstract
Auditory attention detection (AAD) is the tracking of the sound source to which a listener is attending, based on neural signals. Despite expectations that AAD will be applicable in real life, most AAD research has been conducted on pre-recorded electroencephalograms (EEGs), which is far from an online implementation. In the present study, we propose an online AAD model and implement it on a streaming EEG. The proposed model was devised by introducing a sliding window into the linear decoder model and was simulated on two datasets obtained from separate experiments to evaluate its feasibility. After simulation, the online model was constructed and evaluated on the streaming EEG of an individual, acquired during a dichotomous listening experiment. Our model was able to detect the transient direction of a participant's attention on the order of one second during the experiment and showed up to 70% average detection accuracy. We expect that the proposed online model could be applied to develop adaptive hearing aids or neurofeedback training for auditory attention and speech perception.
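The sliding-window decision rule described above can be sketched as follows (a toy illustration under assumed array shapes, not the authors' implementation; the decoder weights would come from an offline training stage):

```python
import numpy as np

def sliding_window_aad(eeg, env_left, env_right, decoder, win, hop):
    """Toy sliding-window attention decoder.

    eeg       : (time, channels) preprocessed EEG
    env_left  : (time,) envelope of the left speech stream
    env_right : (time,) envelope of the right speech stream
    decoder   : (channels,) linear backward-model weights, trained offline
    win, hop  : window length and hop size in samples
    Returns one 'L'/'R' decision per window.
    """
    decisions = []
    for start in range(0, eeg.shape[0] - win + 1, hop):
        seg = slice(start, start + win)
        recon = eeg[seg] @ decoder                     # reconstructed envelope
        r_l = np.corrcoef(recon, env_left[seg])[0, 1]  # similarity to left
        r_r = np.corrcoef(recon, env_right[seg])[0, 1] # similarity to right
        decisions.append('L' if r_l > r_r else 'R')
    return decisions
```

Shorter windows give faster (here, second-scale) decisions at the cost of noisier correlation estimates, which is the accuracy/latency trade-off online AAD has to manage.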
Affiliation(s)
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea;
- Jae Ho Chung
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul 04763, Korea
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, Korea
- Correspondence: (J.H.C.); (Y.L.); Tel.: +82-2-31-560-2298 (J.H.C.); +82-2-958-6641 (Y.L.)
- Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, Korea
- Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology, Seoul 02792, Korea
27
Wang L, Wu EX, Chen F. Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions. Front Hum Neurosci 2020; 14:557534. [PMID: 33132874] [PMCID: PMC7576187] [DOI: 10.3389/fnhum.2020.557534] [Received: 04/30/2020] [Accepted: 09/09/2020]
Abstract
The attended speech stream can be detected robustly, even in adverse auditory scenarios, through auditory attentional modulation, and can be decoded using electroencephalographic (EEG) data. Speech segmentation based on relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions, and high-RMS-level segments contain crucial information for speech perception. Hence, this study investigated the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects listened to the attended speech stream in mixed speech narrated concurrently by two Mandarin speakers. The temporal response function was used to identify the attended speech from EEG responses tracking the temporal envelopes of intact speech and of high-RMS-level speech segments alone, respectively. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope of high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for identifying and tracking attended speech in the presence of background noise. This study also showed that, with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures across a wide range of SNRs.
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong
- Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
28
Alickovic E, Lunner T, Wendt D, Fiedler L, Hietkamp R, Ng EHN, Graversen C. Neural Representation Enhanced for Speech and Reduced for Background Noise With a Hearing Aid Noise Reduction Scheme During a Selective Attention Task. Front Neurosci 2020; 14:846. [PMID: 33071722] [PMCID: PMC7533612] [DOI: 10.3389/fnins.2020.00846] [Received: 04/05/2020] [Accepted: 07/20/2020]
Abstract
Objectives: Selectively attending to a target talker while ignoring multiple interferers (competing talkers and background noise) is more difficult for hearing-impaired (HI) individuals than for normal-hearing (NH) listeners. Such tasks also become more difficult as background noise levels increase. To overcome these difficulties, hearing aids (HAs) offer noise reduction (NR) schemes. The objective of this study was to investigate the effect of NR processing (inactive, where the NR feature was switched off, vs. active, where the NR feature was switched on) on the neural representation of speech envelopes across two background noise levels [+3 dB signal-to-noise ratio (SNR) and +8 dB SNR] using a stimulus reconstruction (SR) method.
Design: To explore how NR processing supports listeners' selective auditory attention, we recruited 22 HI participants fitted with HAs. To investigate the interplay between NR schemes, background noise, and the neural representation of speech envelopes, we used electroencephalography (EEG). The participants were instructed to listen to a target talker in front while ignoring a competing talker in front in the presence of multi-talker background babble noise.
Results: The neural representation of the attended speech envelope was enhanced by the active NR scheme at both background noise levels. The neural representation of the attended speech envelope at the lower (+3 dB) SNR was shifted, by approximately 5 dB, toward that at the higher (+8 dB) SNR when the NR scheme was turned on. The neural representation of the ignored speech envelope was modulated by the NR scheme and was mostly enhanced in the conditions with more background noise. The neural representation of the background noise was reduced by the NR scheme, and significantly so in the conditions with more background noise. The neural representation of the net sum of the ignored acoustic scene (ignored talker and background babble) was not modulated by the NR scheme but was significantly reduced in the conditions with a lower level of background noise. Taken together, the active NR scheme enhanced the neural representation of both the attended and the ignored speakers and reduced the neural representation of background noise, while the net sum of the ignored acoustic scene was not enhanced.
Conclusion: Altogether, our results support the hypothesis that NR schemes in HAs enhance the neural representation of speech and reduce the neural representation of background noise during a selective attention task. We contend that these results provide a neural index that could be useful for assessing the effects of HAs on auditory and cognitive processing in HI populations.
Affiliation(s)
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Linkoping University, Linköping, Sweden
- Thomas Lunner
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Linkoping University, Linköping, Sweden
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Department of Behavioral Sciences and Learning, Linkoping University, Linköping, Sweden
- Dorothea Wendt
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Lorenz Fiedler
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Elaine Hoi Ning Ng
- Department of Behavioral Sciences and Learning, Linkoping University, Linköping, Sweden
- Oticon A/S, Smørum, Denmark
29
Cortical Tracking of Speech in Delta Band Relates to Individual Differences in Speech in Noise Comprehension in Older Adults. Ear Hear 2020; 42:343-354. [PMID: 32826508] [DOI: 10.1097/aud.0000000000000923]
Abstract
Objectives: Understanding speech in adverse listening environments is challenging for older adults. Individual differences in pure-tone averages and working memory are known to be critical indicators of speech-in-noise comprehension. Recent studies have suggested that tracking of the speech envelope in cortical oscillations <8 Hz may be an important mechanism for speech comprehension, segmenting speech into words and phrases (delta, 1 to 4 Hz) or phonemes and syllables (theta, 4 to 8 Hz). The purpose of this study was to investigate the extent to which individual differences in pure-tone averages, working memory, and cortical tracking of the speech envelope relate to speech-in-noise comprehension in older adults.
Design: Cortical tracking of continuous speech was assessed using electroencephalography in older adults (60 to 80 years). Participants listened to speech in quiet and in the presence of noise (time-reversed speech) and answered comprehension questions. Participants completed Forward Digit Span and Backward Digit Span as measures of working memory, and pure-tone averages were collected. An index of reduction in noise (RIN) was calculated by normalizing the difference between raw cortical tracking in quiet and in noise.
Results: Comprehension question performance was greater for speech in quiet than for speech in noise. The relationship between RIN and speech-in-noise comprehension was assessed while controlling for the effects of individual differences in pure-tone averages and working memory. Delta-band RIN correlated with speech-in-noise comprehension, while theta-band RIN did not.
Conclusions: Cortical tracking by delta oscillations is robust to the effects of noise. These findings demonstrate that the magnitude of delta-band RIN relates to individual differences in speech-in-noise comprehension in older adults. Delta-band RIN may serve as a neural metric of speech-in-noise comprehension beyond the effects of pure-tone averages and working memory.
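The abstract does not spell out the exact normalization behind the RIN index; a common choice for a "normalized difference" is the symmetric form below, shown here only to make the construction concrete (an assumption on our part, not the paper's definition):

```python
def reduction_in_noise(tracking_quiet, tracking_noise):
    """Reduction-in-noise (RIN) index as a normalized difference.

    Assumes the symmetric (quiet - noise) / (quiet + noise) form, which is
    bounded in [-1, 1] for positive tracking values; the paper may
    normalize differently.
    """
    return (tracking_quiet - tracking_noise) / (tracking_quiet + tracking_noise)
```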
30
Seifi Ala T, Graversen C, Wendt D, Alickovic E, Whitmer WM, Lunner T. An Exploratory Study of EEG Alpha Oscillation and Pupil Dilation in Hearing-Aid Users During Effortful Listening to Continuous Speech. PLoS One 2020; 15:e0235782. [PMID: 32649733] [PMCID: PMC7351195] [DOI: 10.1371/journal.pone.0235782] [Received: 02/20/2020] [Accepted: 06/17/2020]
Abstract
Individuals with hearing loss allocate cognitive resources to comprehend noisy speech in everyday life, for instance when they are exposed to ongoing speech and need to sustain their attention for a rather long period of time, which requires listening effort. Two well-established physiological methods that are sensitive to changes in listening effort are pupillometry and electroencephalography (EEG). However, these measurements have mainly been used for momentary, evoked, or episodic effort. The aim of this study was to investigate how sustained effort manifests in pupillometry and EEG, using continuous speech with varying signal-to-noise ratio (SNR). Eight hearing-aid users participated in this exploratory study and performed a continuous speech-in-noise task. The speech material consisted of 30-second continuous streams presented from loudspeakers to the right and left of the listener (±30° azimuth) in the presence of 4-talker background noise (+180° azimuth). The participants were instructed to attend either to the right or the left speaker and to ignore the other, in randomized order, under two SNR conditions: 0 dB and -5 dB (the difference between the target and the competing talker). The effects of SNR on listening effort were explored objectively using pupillometry and EEG. The results showed larger mean pupil dilation and decreased EEG alpha power over the parietal lobe during the more effortful condition. This study demonstrates that both measures are sensitive to changes in SNR during continuous speech.
Affiliation(s)
- Tirdad Seifi Ala
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Hearing Sciences–Scottish Section, Division of Clinical Neuroscience, University of Nottingham, Glasgow, Scotland, United Kingdom
- Dorothea Wendt
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Linköping University, Linköping, Sweden
- William M. Whitmer
- Hearing Sciences–Scottish Section, Division of Clinical Neuroscience, University of Nottingham, Glasgow, Scotland, United Kingdom
- Thomas Lunner
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
31
Jaeger M, Mirkovic B, Bleichner MG, Debener S. Decoding the Attended Speaker From EEG Using Adaptive Evaluation Intervals Captures Fluctuations in Attentional Listening. Front Neurosci 2020; 14:603. [PMID: 32612507] [PMCID: PMC7308709] [DOI: 10.3389/fnins.2020.00603] [Received: 11/06/2019] [Accepted: 05/15/2020]
Abstract
Listeners differ in their ability to attend to a speech stream in the presence of a competing sound. Differences in speech intelligibility in noise cannot be fully explained by hearing ability, which suggests the involvement of additional cognitive factors. A better understanding of the temporal fluctuations in the ability to pay selective auditory attention to a desired speech stream may help explain this variability. To better understand the temporal dynamics of selective auditory attention, we developed an online auditory attention decoding (AAD) processing pipeline based on speech envelope tracking in the electroencephalogram (EEG). Participants attended to one audiobook story while a second one had to be ignored. Online AAD was applied to track attention toward the target speech signal. Individual temporal attention profiles were computed by combining an established AAD method with an adaptive staircase procedure. The individual decoding performance over time was analyzed and linked to behavioral performance as well as subjective ratings of listening effort, motivation, and fatigue. The grand-average attended-speaker decoding profile derived in the online experiment indicated performance above chance level. Parameters describing individual AAD performance in each testing block indicated that significant differences in decoding performance over time were closely related to behavioral performance in the selective listening task. Further, an exploratory analysis indicated that subjects with poor decoding performance reported higher listening effort and fatigue than good performers. Taken together, our results show that online EEG-based AAD in a complex listening situation is feasible. Adaptive attended-speaker decoding profiles over time could be used as an objective measure of behavioral performance and listening effort. The developed online processing pipeline could also serve as a basis for future EEG-based near real-time auditory neurofeedback systems.
Affiliation(s)
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Neurophysiology of Everyday Life Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
32
Decruy L, Lesenfants D, Vanthornhout J, Francart T. Top-down modulation of neural envelope tracking: The interplay with behavioral, self-report and neural measures of listening effort. Eur J Neurosci 2020; 52:3375-3393. [PMID: 32306466] [DOI: 10.1111/ejn.14753] [Received: 10/22/2019] [Revised: 04/09/2020] [Accepted: 04/11/2020]
Abstract
When listening to natural speech, our brain activity tracks the slow amplitude modulations of speech, also called the speech envelope. Moreover, recent research has demonstrated that this neural envelope tracking can be affected by top-down processes. The present study was designed to examine whether neural envelope tracking is modulated by the effort that a person expends during listening. Five measures were included to quantify listening effort: two behavioral measures based on a novel dual-task paradigm, a self-report effort measure, and two neural measures related to phase synchronization and alpha power. Electroencephalography responses to sentences, presented at a wide range of subject-specific signal-to-noise ratios, were recorded in thirteen young, normal-hearing adults. A comparison of the five measures revealed different effects of listening effort as a function of speech understanding. Reaction times on the primary task and self-reported effort decreased with increasing speech understanding. In contrast, reaction times on the secondary task and alpha power showed peak-shaped behavior, with the highest effort at intermediate speech-understanding levels. With regard to neural envelope tracking, we found that reaction times on the secondary task and self-reported effort explained a small part of the variability in theta-band envelope tracking. Speech understanding was found to strongly modulate neural envelope tracking; more specifically, our results demonstrated a robust increase in envelope tracking with increasing speech understanding. The present study provides new insights into the relations among different effort measures and highlights the potential of neural envelope tracking as an objective measure of speech understanding in young, normal-hearing adults.
Affiliation(s)
- Lien Decruy
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
- Damien Lesenfants
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
- Jonas Vanthornhout
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
- Tom Francart
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
33
The interplay of top-down focal attention and the cortical tracking of speech. Sci Rep 2020; 10:6922. [PMID: 32332791] [PMCID: PMC7181730] [DOI: 10.1038/s41598-020-63587-3] [Received: 10/16/2019] [Accepted: 04/02/2020]
Abstract
Many active neuroimaging paradigms rely on the assumption that the participant sustains attention to a task. However, in practice, there will be momentary distractions, potentially influencing the results. We investigated the effect of focal attention, objectively quantified using a measure of brain signal entropy, on cortical tracking of the speech envelope. The latter is a measure of neural processing of naturalistic speech. We let participants listen to 44 minutes of natural speech, while their electroencephalogram was recorded, and quantified both entropy and cortical envelope tracking. Focal attention affected the later brain responses to speech, between 100 and 300 ms latency. By only taking into account periods with higher attention, the measured cortical speech tracking improved by 47%. This illustrates the impact of the participant’s active engagement in the modeling of the brain-speech response and the importance of accounting for it. Our results suggest a cortico-cortical loop that initiates during the early-stages of the auditory processing, then propagates through the parieto-occipital and frontal areas, and finally impacts the later-latency auditory processes in a top-down fashion. The proposed framework could be transposed to other active electrophysiological paradigms (visual, somatosensory, etc) and help to control the impact of participants’ engagement on the results.
34
Geirnaert S, Francart T, Bertrand A. An Interpretable Performance Metric for Auditory Attention Decoding Algorithms in a Context of Neuro-Steered Gain Control. IEEE Trans Neural Syst Rehabil Eng 2019; 28:307-317. [PMID: 31715568] [DOI: 10.1109/tnsre.2019.2952724]
Abstract
In a multi-speaker scenario, a hearing aid lacks information about which speaker the user intends to attend to, and therefore often mistakenly treats that speaker as noise while enhancing an interfering speaker. Recently, it has been shown that it is possible to decode the attended speaker from brain activity, e.g., recorded by electroencephalography sensors. While numerous such auditory attention decoding (AAD) algorithms have appeared in the literature, their performance is generally evaluated in a non-uniform manner. Furthermore, AAD algorithms typically introduce a trade-off between AAD accuracy and the time needed to make an AAD decision, which hampers objective benchmarking, as it remains unclear which point in each algorithm's trade-off space is optimal in a context of neuro-steered gain control. To this end, we present an interpretable performance metric for evaluating AAD algorithms, based on an adaptive gain control system steered by AAD decisions. Such a system can be modeled as a Markov chain, from which the minimal expected switch duration (MESD) can be calculated and interpreted as the expected time required to switch the operation of the hearing aid after an attention switch of the user, thereby resolving the trade-off between AAD accuracy and decision time. Furthermore, we show that the MESD calculation provides an automatic and theoretically founded procedure to optimize the number of gain levels and the decision time in an AAD-based adaptive gain control system.
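The Markov-chain idea behind the MESD can be illustrated with a simplified sketch: model the gain levels as a random walk that steps up on a correct AAD decision (probability p) and down otherwise, then solve the standard first-hitting-time equations for the expected number of decisions needed to climb from the lowest to the highest gain level. This illustrates only the modeling idea, not the paper's exact MESD definition, which additionally optimizes over decision times and gain-level counts.

```python
import numpy as np

def expected_switch_steps(p, n_states):
    """Expected number of AAD decisions to walk a gain-control chain
    from the lowest to the highest gain state.

    States move up with probability p (a correct AAD decision) and down
    with 1 - p, reflecting at the bottom state. Multiply the result by
    the per-decision time to convert steps into seconds.
    """
    n = n_states
    a = np.zeros((n, n))
    b = np.ones(n)
    for i in range(n - 1):
        a[i, i] = 1.0
        a[i, i + 1] -= p                  # step up on a correct decision
        a[i, max(i - 1, 0)] -= 1.0 - p    # step down (reflect at state 0)
    a[n - 1, n - 1] = 1.0                 # top state is the target
    b[n - 1] = 0.0
    return np.linalg.solve(a, b)[0]       # expected steps from the bottom
```

As expected, a higher AAD accuracy p yields a shorter expected climb, which is exactly the accuracy/decision-time trade-off the MESD collapses into one number.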
35
Vanthornhout J, Decruy L, Francart T. Effect of Task and Attention on Neural Tracking of Speech. Front Neurosci 2019; 13:977. [PMID: 31607841] [PMCID: PMC6756133] [DOI: 10.3389/fnins.2019.00977] [Received: 05/24/2019] [Accepted: 08/30/2019]
Abstract
EEG-based measures of neural tracking of natural running speech are becoming increasingly popular to investigate neural processing of speech and have applications in audiology. When the stimulus is a single speaker, it is usually assumed that the listener actively attends to and understands the stimulus. However, as the level of attention of the listener is inherently variable, we investigated how this affected neural envelope tracking. Using a movie as a distractor, we varied the level of attention while we estimated neural envelope tracking. We varied the intelligibility level by adding stationary noise. We found a significant difference in neural envelope tracking between the condition with maximal attention and the movie condition. This difference was most pronounced in the right-frontal region of the brain. The degree of neural envelope tracking was highly correlated with the stimulus signal-to-noise ratio, even in the movie condition. This could be due to residual neural resources to passively attend to the stimulus. When envelope tracking is used to measure speech understanding objectively, this means that the procedure can be made more enjoyable and feasible by letting participants watch a movie during stimulus presentation.
Affiliation(s)
- Lien Decruy
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Tom Francart
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
36
Lesenfants D, Vanthornhout J, Verschueren E, Decruy L, Francart T. Predicting individual speech intelligibility from the cortical tracking of acoustic- and phonetic-level speech representations. Hear Res 2019; 380:1-9. [DOI: 10.1016/j.heares.2019.05.006] [Received: 11/12/2018] [Revised: 05/20/2019] [Accepted: 05/21/2019]
37
Ciccarelli G, Nolan M, Perricone J, Calamia PT, Haro S, O'Sullivan J, Mesgarani N, Quatieri TF, Smalt CJ. Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods. Sci Rep 2019; 9:11538. [PMID: 31395905] [PMCID: PMC6687829] [DOI: 10.1038/s41598-019-47795-0] [Received: 01/11/2019] [Accepted: 07/24/2019]
Abstract
Auditory attention decoding (AAD) through a brain-computer interface has had a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticograph recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet one, despite having only one third as many EEG channels. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
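The second stage of the traditional two-stage pipeline contrasted in this paper is simple enough to sketch directly: given an envelope reconstructed from EEG in stage one, correlate it with each candidate talker's envelope and pick the best match. A hedged illustration (function and variable names are ours, not from the paper; the paper's end-to-end network replaces this explicit correlation step with learned layers):

```python
import numpy as np

def decode_attention(recon_env, env_a, env_b):
    """Stage two of the classic two-stage AAD approach: select the
    talker whose speech envelope correlates best with the envelope
    that stage one reconstructed from the listener's EEG."""
    r_a = float(np.corrcoef(recon_env, env_a)[0, 1])
    r_b = float(np.corrcoef(recon_env, env_b)[0, 1])
    return ("A" if r_a >= r_b else "B"), r_a, r_b
```

In practice this decision is made per trial window, and decoding accuracy is the fraction of windows in which the correct talker wins the correlation contest.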
Affiliation(s)
- Gregory Ciccarelli
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Michael Nolan
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Joseph Perricone
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Paul T Calamia
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Stephanie Haro
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA; Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Thomas F Quatieri
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA; Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- Christopher J Smalt
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
38.
Mirkovic B, Debener S, Schmidt J, Jaeger M, Neher T. Effects of directional sound processing and listener's motivation on EEG responses to continuous noisy speech: Do normal-hearing and aided hearing-impaired listeners differ? Hear Res 2019; 377:260-270. [PMID: 31003037] [DOI: 10.1016/j.heares.2019.04.005]
Abstract
OBJECTIVE It has been suggested that the next major advancement in hearing aid (HA) technology needs to include cognitive feedback from the user to control HA functionality. In order to enable automatic brainwave-steered HA adjustments, attentional processes underlying speech-in-noise perception in aided hearing-impaired individuals need to be better understood. Here, we addressed the influence of two important factors for the listening performance of HA users - hearing aid processing and motivation - by analysing ongoing neural responses during long-term listening to continuous noisy speech. METHODS Sixteen normal-hearing (NH) and 15 linearly aided hearing-impaired (aHI) participants listened to an audiobook recording embedded in realistic speech babble noise at individually adjusted signal-to-noise ratios (SNRs). A HA simulator was used for simulating a directional microphone setting as well as for providing individual amplification. To assess listening performance behaviourally, participants answered questions about the contents of the audiobook. We manipulated (1) the participants' motivation by offering a monetary reward for good listening performance in one half of the measurements and (2) the SNR by engaging/disengaging the directional microphone setting. During the speech-in-noise task, electroencephalography (EEG) signals were recorded using wireless, mobile hardware. EEG correlates of listening performance were investigated using EEG impulse responses, as estimated using the cross-correlation between the recorded EEG signal and the temporal envelope of the audiobook at the output of the HA simulator. RESULTS At the behavioural level, we observed better performance for the NH listeners than for the aHI listeners. Furthermore, the directional microphone setting led to better performance for both participant groups, and when the directional microphone setting was disengaged, motivation also improved the performance of the aHI participants. Analysis of the EEG impulse responses showed faster N1P2 responses for both groups and larger N2 peak amplitudes for the aHI group when the directional microphone setting was activated, but no physiological correlates of motivation. SIGNIFICANCE The results of this study indicate that motivation plays an important role for speech understanding in noise. In terms of neuro-steered HAs, our results suggest that the latency of attentional processes is influenced by HA-induced stimulus changes, which can potentially be used for inferring benefit from noise suppression processing automatically. Further research is necessary to identify the neural correlates of motivation as an exclusive top-down process and to combine such features with HA-driven ones for online HA adjustments.
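The EEG impulse-response estimator described in the methods (cross-correlating the recorded EEG with the temporal envelope of the audiobook) can be sketched as follows. This is a simplified single-channel version with an illustrative lag range, not the study's analysis code:

```python
import numpy as np

def impulse_response(eeg_ch, envelope, max_lag=32):
    """Normalized cross-correlation between one EEG channel and the
    speech envelope at lags 0..max_lag samples (EEG trailing the
    stimulus). Peaks in this function are the basis for component
    measures such as N1P2 latency and N2 amplitude."""
    e = (envelope - envelope.mean()) / envelope.std()
    x = (eeg_ch - eeg_ch.mean()) / eeg_ch.std()
    n = len(e)
    return np.array([np.dot(e[:n - lag], x[lag:]) / (n - lag)
                     for lag in range(max_lag + 1)])
```

The lag of the peak (converted from samples to milliseconds via the sampling rate) gives the response latency; its height gives the amplitude.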
Affiliation(s)
- Bojana Mirkovic
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129 Oldenburg, Germany; Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Stefan Debener
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129 Oldenburg, Germany; Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Julia Schmidt
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129 Oldenburg, Germany
- Manuela Jaeger
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129 Oldenburg, Germany
- Tobias Neher
- Institute of Clinical Research, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
39.
Aroudi A, Mirkovic B, De Vos M, Doclo S. Impact of Different Acoustic Components on EEG-Based Auditory Attention Decoding in Noisy and Reverberant Conditions. IEEE Trans Neural Syst Rehabil Eng 2019; 27:652-663. [DOI: 10.1109/tnsre.2019.2903404]
40.
Alickovic E, Lunner T, Gustafsson F, Ljung L. A Tutorial on Auditory Attention Identification Methods. Front Neurosci 2019; 13:153. [PMID: 30941002] [PMCID: PMC6434370] [DOI: 10.3389/fnins.2019.00153]
Abstract
Auditory attention identification methods attempt to identify the sound source of a listener's interest by analyzing measurements of electrophysiological data. We present a tutorial on the numerous techniques that have been developed in recent decades, and we present an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on the use of linear relations between electrophysiological and audio data. The way in which these relations are computed differs. For example, canonical correlation analysis (CCA) finds a linear subset of electrophysiological data that best correlates to audio data and a similar subset of audio data that best correlates to electrophysiological data. Model-based (encoding and decoding) approaches focus on either of these two sets. We investigate the similarities and differences between these linear model philosophies. We focus on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. The specific focus is on sparsity-driven adaptive encoding models and comparing the methodology in state-of-the-art models found in the auditory literature. Furthermore, we outline the main signal processing pipeline for how to identify the attended sound source in a cocktail party environment from the raw electrophysiological data with all the necessary steps, complemented with the necessary MATLAB code and the relevant references for each step. Our main aim is to compare the methodology of the available methods, and provide numerical illustrations to some of them to get a feeling for their potential. A thorough performance comparison is outside the scope of this tutorial.
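The CCA step this tutorial discusses has a compact linear-algebra formulation: after centering, the canonical correlations between two data blocks are the singular values of the product of their orthonormalized bases. A numpy-only sketch (the tutorial's own pipeline is in MATLAB; shapes and names here are illustrative):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between block X (n_samples x p), e.g.
    lagged EEG, and block Y (n_samples x q), e.g. audio features:
    center each block, take an orthonormal basis of its column space
    via SVD, and read the correlations off the singular values of
    the product of the two bases."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Ux = np.linalg.svd(Xc, full_matrices=False)[0]
    Uy = np.linalg.svd(Yc, full_matrices=False)[0]
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)
```

In an AAD setting, a larger first canonical correlation between the EEG block and one talker's audio features is evidence that this talker is the attended one.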
Affiliation(s)
- Emina Alickovic
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Thomas Lunner
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Hearing Systems, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Swedish Institute for Disability Research, Linnaeus Centre HEAD, Linkoping University, Linkoping, Sweden
- Fredrik Gustafsson
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Lennart Ljung
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden