1
Fan C, Zhang H, Huang W, Xue J, Tao J, Yi J, Lv Z, Wu X. DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection. Neural Netw 2024; 179:106580. [PMID: 39096751] [DOI: 10.1016/j.neunet.2024.106580]
Abstract
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images, which makes it challenging to handle EEG signals with their non-Euclidean characteristics. To address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD that does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve accuracies of 90.0% and 79.6% on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the proposed DGSD method not only outperforms the best reproducible baseline in detection performance but also reduces the number of trainable parameters by a factor of approximately 100.
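To make the graph-convolution idea above concrete, here is a minimal sketch (in PyTorch) of a single dynamical graph-convolution layer over EEG channels, in which the channel adjacency matrix is a trainable parameter rather than a fixed montage. The layer sizes, initialization and names are illustrative assumptions; this is not the authors' DGSD implementation, and the self-distillation losses are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphConv(nn.Module):
    """One dynamical graph-convolution layer over EEG channels: the adjacency
    matrix is learned from data, so the channel graph adapts during training."""

    def __init__(self, n_channels: int, in_feats: int, out_feats: int):
        super().__init__()
        # trainable channel-by-channel adjacency (the "dynamical" graph)
        self.adj = nn.Parameter(0.01 * torch.randn(n_channels, n_channels))
        self.lin = nn.Linear(in_feats, out_feats)

    def forward(self, x):
        # x: (batch, n_channels, in_feats) per-channel EEG features
        a = F.softmax(F.relu(self.adj), dim=-1)   # row-normalized adjacency
        x = torch.einsum("ij,bjf->bif", a, x)     # aggregate neighboring channels
        return F.relu(self.lin(x))

# toy usage: 64-channel EEG with 128 temporal samples treated as features
layer = DynamicGraphConv(n_channels=64, in_feats=128, out_feats=32)
out = layer(torch.randn(8, 64, 128))              # -> (8, 64, 32)
```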
Affiliation(s)
- Cunhang Fan
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Hongyu Zhang
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Wei Huang
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Jun Xue
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Jianhua Tao
- Department of Automation, Tsinghua University, Beijing 100190, China
- Jiangyan Yi
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Zhao Lv
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Xiaopei Wu
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China.
2
Puffay C, Vanthornhout J, Gillis M, Clercq PD, Accou B, Hamme HV, Francart T. Classifying coherent versus nonsense speech perception from EEG using linguistic speech features. Sci Rep 2024; 14:18922. [PMID: 39143297] [PMCID: PMC11324895] [DOI: 10.1038/s41598-024-69568-0]
Abstract
When a person listens to natural speech, the relation between features of the speech signal and the corresponding evoked electroencephalogram (EEG) is indicative of neural processing of the speech signal. Using linguistic representations of speech, we investigate the differences in neural processing between speech in a native language and speech in a foreign language that is not understood. We conducted experiments using three stimuli: a comprehensible language, an incomprehensible language, and randomly shuffled words from a comprehensible language, while recording the EEG signal of native Dutch-speaking participants. We modeled the neural tracking of linguistic features of the speech signals using a deep-learning model in a match-mismatch task that relates EEG signals to speech, while accounting for lexical segmentation features reflecting acoustic processing. The deep learning model effectively classifies coherent versus nonsense languages. We also observed significant differences in tracking patterns between comprehensible and incomprehensible speech stimuli within the same language. This demonstrates the potential of deep learning frameworks for objectively measuring speech understanding.
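As an illustration of the match-mismatch setup described above, the sketch below builds (EEG segment, matched feature, mismatched feature) triplets, where the mismatched candidate is the same feature stream taken from a later, non-overlapping time point; a classifier then has to tell which candidate is time-aligned with the EEG. Window length, shift and array shapes are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def make_match_mismatch_triplets(eeg, feature, win=320, shift=640):
    """Build (EEG, matched, mismatched) triplets for a match-mismatch task.
    eeg: (n_samples, n_channels); feature: (n_samples, n_features), time-aligned."""
    triplets = []
    for start in range(0, len(eeg) - win - shift, win):
        eeg_seg = eeg[start:start + win]                          # EEG window
        matched = feature[start:start + win]                      # aligned feature
        mismatched = feature[start + shift:start + shift + win]   # shifted feature
        triplets.append((eeg_seg, matched, mismatched))
    return triplets

rng = np.random.default_rng(0)
eeg = rng.standard_normal((64_000, 64))       # e.g. 1000 s of 64-channel EEG at 64 Hz
feature = rng.standard_normal((64_000, 1))    # e.g. a word-onset/surprisal stream
triplets = make_match_mismatch_triplets(eeg, feature)
```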
Affiliation(s)
- Corentin Puffay
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium.
- Department of Electrical engineering (ESAT), KU Leuven, PSI, Leuven, Belgium.
- Marlies Gillis
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
- Bernd Accou
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
- Department of Electrical engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Hugo Van Hamme
- Department of Electrical engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Tom Francart
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium.
3
Mahjoory K, Bahmer A, Henry MJ. Convolutional neural networks can identify brain interactions involved in decoding spatial auditory attention. PLoS Comput Biol 2024; 20:e1012376. [PMID: 39116183] [PMCID: PMC11335149] [DOI: 10.1371/journal.pcbi.1012376]
Abstract
Human listeners have the ability to direct their attention to a single speaker in a multi-talker environment. The neural correlates of selective attention can be decoded from a single trial of electroencephalography (EEG) data. In this study, leveraging source-reconstructed and anatomically resolved EEG data as inputs, we sought to employ a convolutional neural network (CNN) as an interpretable model to uncover task-specific interactions between brain regions, rather than simply to utilize it as a black-box decoder. To this end, our CNN model was specifically designed to learn pairwise interaction representations for 10 cortical regions from five-second inputs. By exclusively utilizing these features for decoding, our model was able to attain a median accuracy of 77.56% for within-participant and 65.14% for cross-participant classification. Through ablation analysis together with dissecting the features of the models and applying cluster analysis, we were able to discern the presence of alpha-band-dominated inter-hemisphere interactions, as well as alpha- and beta-band-dominant interactions that were either hemisphere-specific or were characterized by a contrasting pattern between the right and left hemispheres. These interactions were more pronounced in parietal and central regions for within-participant decoding, but in parietal, central, and partly frontal regions for cross-participant decoding. These findings demonstrate that our CNN model can effectively utilize features known to be important in auditory attention tasks and suggest that the application of domain-knowledge-inspired CNNs on source-reconstructed EEG data can offer a novel computational framework for studying task-relevant brain interactions.
Affiliation(s)
- Keyvan Mahjoory
- Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Andreas Bahmer
- RheinMain University of Applied Sciences Campus Ruesselsheim, Wiesbaden, Germany
- Molly J. Henry
- Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Department of Psychology, Toronto Metropolitan University, Toronto, Ontario, Canada
4
Tanveer MA, Skoglund MA, Bernhardsson B, Alickovic E. Deep learning-based auditory attention decoding in listeners with hearing impairment. J Neural Eng 2024; 21:036022. [PMID: 38729132] [DOI: 10.1088/1741-2552/ad49d7]
Abstract
Objective. This study develops a deep learning (DL) method for fast auditory attention decoding (AAD) using electroencephalography (EEG) from listeners with hearing impairment (HI). It addresses three classification tasks: differentiating noise from speech-in-noise, classifying the direction of attended speech (left vs. right), and identifying the activation status of hearing aid noise reduction algorithms (OFF vs. ON). These tasks contribute to our understanding of how hearing technology influences auditory processing in the hearing-impaired population. Approach. Deep convolutional neural network (DCNN) models were designed for each task. Two training strategies were employed to clarify the impact of data splitting on AAD tasks: inter-trial, where the testing set used classification windows from trials that the training set had not seen, and intra-trial, where the testing set used unseen classification windows from trials where other segments were seen during training. The models were evaluated on EEG data from 31 participants with HI, listening to competing talkers amidst background noise. Main results. Using 1 s classification windows, the DCNN models achieved accuracy (ACC) of 69.8%, 73.3% and 82.9% and area-under-curve (AUC) of 77.2%, 80.6% and 92.1% for the three tasks, respectively, with the inter-trial strategy. With the intra-trial strategy, they achieved ACC of 87.9%, 80.1% and 97.5%, along with AUC of 94.6%, 89.1%, and 99.8%. Our DCNN models show good performance on short 1 s EEG samples, making them suitable for real-world applications. Conclusion. Our DCNN models successfully addressed three tasks with short 1 s EEG windows from participants with HI, showcasing their potential. While the inter-trial strategy demonstrated promise for assessing AAD, the intra-trial approach yielded inflated results, underscoring the important role of proper data splitting in EEG-based AAD tasks. Significance. Our findings showcase the promising potential of EEG-based tools for assessing auditory attention in clinical contexts and advancing hearing technology, while also promoting further exploration of alternative DL architectures and their potential constraints.
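The difference between the two data-splitting strategies is easy to state in code. The sketch below is an illustration of the idea, not the study's pipeline: an inter-trial split holds out whole trials, while an intra-trial split holds out classification windows at random so that training and test windows can come from the same trial.

```python
import numpy as np

def inter_vs_intra_trial_split(window_trial_ids, held_out_trials, seed=0):
    """Return boolean test masks over classification windows for the two strategies.
    window_trial_ids: array giving, for each 1 s window, the trial it came from."""
    window_trial_ids = np.asarray(window_trial_ids)
    # inter-trial: every window of a held-out trial goes to the test set
    inter_test = np.isin(window_trial_ids, held_out_trials)
    # intra-trial: the same number of windows is held out at random,
    # so train and test windows may share a trial
    rng = np.random.default_rng(seed)
    idx = rng.permutation(window_trial_ids.size)
    intra_test = np.zeros(window_trial_ids.size, dtype=bool)
    intra_test[idx[: inter_test.sum()]] = True
    return inter_test, intra_test

# toy usage: 6 trials with 100 windows each, trials 4 and 5 held out
trial_ids = np.repeat(np.arange(6), 100)
inter_test, intra_test = inter_vs_intra_trial_split(trial_ids, held_out_trials=[4, 5])
```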
Affiliation(s)
- M Asjid Tanveer
- Department of Automatic Control, Lund University, Lund, Sweden
- Martin A Skoglund
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Bo Bernhardsson
- Department of Automatic Control, Lund University, Lund, Sweden
- Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Electrical Engineering, Linköping University, Linköping, Sweden
5
Park D, Park H, Kim S, Choo S, Lee S, Nam CS, Jung JY. Spatio-Temporal Explanation of 3D-EEGNet for Motor Imagery EEG Classification Using Permutation and Saliency. IEEE Trans Neural Syst Rehabil Eng 2023; 31:4504-4513. [PMID: 37934650] [DOI: 10.1109/tnsre.2023.3330922]
Abstract
Recently, convolutional neural network (CNN)-based classification models have shown good performance for motor imagery (MI) brain-computer interfaces (BCI) using electroencephalogram (EEG) in end-to-end learning. Although a few explainable artificial intelligence (XAI) techniques have been developed, it is still challenging to interpret CNN models for EEG-based BCI classification effectively. In this research, we propose 3D-EEGNet as a 3D CNN model to improve both the explainability and performance of MI EEG classification. The proposed approach exhibited better performance on two MI EEG datasets than the existing EEGNet, which uses a 2D input shape; the MI classification accuracies improved by around 1.8 and 6.1 percentage points on average on the two datasets, respectively. The permutation-based XAI method is first applied for a reliable explanation of the 3D-EEGNet. Next, because the permutation-based method has high time complexity, we design a novel technique based on the normalized discounted cumulative gain (NDCG) to select the best among a few faster saliency-based methods for spatio-temporal explanation. Among the saliency-based methods, DeepLIFT was selected because the NDCG scores indicated its results are the most similar to the permutation-based results. Finally, the fast spatio-temporal explanation using DeepLIFT provides a deeper understanding of the classification results of the 3D-EEGNet and the important properties in the MI EEG experiments.
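As a rough illustration of the NDCG-based selection step, the snippet below scores how well a saliency-based channel ranking agrees with permutation importance taken as the reference relevance; the data are synthetic and this is not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(0)
perm_importance = rng.random(22)                             # reference importance of 22 channels
saliency = perm_importance + 0.1 * rng.standard_normal(22)   # a saliency method's scores

# ndcg_score expects 2-D (n_queries, n_items) arrays; higher means the saliency
# ranking agrees better with the permutation-based ranking
agreement = ndcg_score(perm_importance[None, :], saliency[None, :])
print(f"NDCG agreement: {agreement:.3f}")
```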
6
Puffay C, Vanthornhout J, Gillis M, Accou B, Van Hamme H, Francart T. Robust neural tracking of linguistic speech representations using a convolutional neural network. J Neural Eng 2023; 20:046040. [PMID: 37595606] [DOI: 10.1088/1741-2552/acf1ce]
Abstract
Objective. When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalography (EEG) and the speech signal. Recent studies have shown a significant contribution of linguistic features over acoustic neural tracking using linear models. However, linear models cannot model the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features using phoneme or word onsets as a control and has the capacity to model non-linear relations. Approach. We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS) and word frequency (WF)) in our nonlinear CNN model and investigate if they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN. Main results. For the non-linear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the non-linear CNN outperformed the linear baselines. Significance. Measuring coding of linguistic features in the brain is important for auditory neuroscience research and applications that involve objectively measuring speech understanding. With linear models, this is measurable, but the effects are very small. The proposed non-linear CNN model yields larger differences between linguistic and lexical models and, therefore, could show effects that would otherwise be unmeasurable and may, in the future, lead to improved within-subject measures and shorter recordings.
Affiliation(s)
- Corentin Puffay
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Marlies Gillis
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Bernd Accou
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Hugo Van Hamme
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Tom Francart
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
7
Cai S, Li J, Yang H, Li H. RGCnet: An Efficient Recursive Gated Convolutional Network for EEG-based Auditory Attention Detection. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083536] [DOI: 10.1109/embc40787.2023.10340432]
Abstract
Humans are able to listen to one speaker and disregard others in a speaking crowd, referred to as the "cocktail party effect". EEG-based auditory attention detection (AAD) seeks to identify whom a listener is listening to by decoding one's EEG signals. Recent research has demonstrated that the self-attention mechanism is effective for AAD. In this paper, we present the Recursive Gated Convolutional network (RGCnet) for AAD, which implements long-range and high-order interactions as a self-attention mechanism, while maintaining a low computational cost. The RGCnet expands the 2nd order feature interactions to a higher order to model the complex interactions between EEG features. We evaluate RGCnet on two public datasets and compare it with other AAD models. Our results demonstrate that RGCnet outperforms other comparative models under various conditions, thus potentially improving the control of neuro-steered hearing devices.
8
De Clercq P, Vanthornhout J, Vandermosten M, Francart T. Beyond linear neural envelope tracking: a mutual information approach. J Neural Eng 2023; 20. [PMID: 36812597] [DOI: 10.1088/1741-2552/acbe1d]
Abstract
Objective. The human brain tracks the temporal envelope of speech, which contains essential cues for speech understanding. Linear models are the most common tool to study neural envelope tracking. However, information on how speech is processed can be lost since nonlinear relations are precluded. Analysis based on mutual information (MI), on the other hand, can detect both linear and nonlinear relations and is gradually becoming more popular in the field of neural envelope tracking. Yet, several different approaches to calculating MI are applied with no consensus on which approach to use. Furthermore, the added value of nonlinear techniques remains a subject of debate in the field. The present paper aims to resolve these open questions. Approach. We analyzed electroencephalography (EEG) data of participants listening to continuous speech and applied MI analyses and linear models. Main results. Comparing the different MI approaches, we conclude that results are most reliable and robust using the Gaussian copula approach, which first transforms the data to standard Gaussians. With this approach, the MI analysis is a valid technique for studying neural envelope tracking. Like linear models, it allows spatial and temporal interpretations of speech processing, peak latency analyses, and applications to multiple EEG channels combined. In a final analysis, we tested whether nonlinear components were present in the neural response to the envelope by first removing all linear components in the data. We robustly detected nonlinear components on the single-subject level using the MI analysis. Significance. We demonstrate that the human brain processes speech in a nonlinear way. Unlike linear models, the MI analysis detects such nonlinear relations, proving its added value to neural envelope tracking. In addition, the MI analysis retains spatial and temporal characteristics of speech processing, an advantage lost when using more complex (nonlinear) deep neural networks.
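For readers unfamiliar with the Gaussian copula approach mentioned above, here is a minimal sketch: each signal is rank-transformed and mapped through the inverse normal CDF, after which the mutual information of the resulting bivariate Gaussian follows from its correlation. This shows only the core transform on toy data, not the authors' analysis pipeline.

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Gaussian-copula transform: ranks mapped to (0, 1), then through norm.ppf."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gaussian_copula_mi(x, y):
    """Mutual information (bits) between two 1-D signals after copula-normalization,
    using the closed form for bivariate Gaussians: -0.5 * log2(1 - r**2)."""
    r = np.corrcoef(copnorm(x), copnorm(y))[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

rng = np.random.default_rng(0)
envelope = rng.standard_normal(5000)                       # toy speech envelope
eeg_channel = 0.4 * envelope + rng.standard_normal(5000)   # toy "tracking" EEG channel
print(f"MI estimate: {gaussian_copula_mi(eeg_channel, envelope):.3f} bits")
```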
Affiliation(s)
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
9
Accou B, Vanthornhout J, Hamme HV, Francart T. Decoding of the speech envelope from EEG using the VLAAI deep neural network. Sci Rep 2023; 13:812. [PMID: 36646740] [PMCID: PMC9842721] [DOI: 10.1038/s41598-022-27332-2]
Abstract
To investigate the processing of speech in the brain, simple linear models are commonly used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped to model a highly dynamic, complex non-linear system like the brain, and they often require a substantial amount of subject-specific training data. This work introduces a novel speech decoder architecture: the Very Large Augmented Auditory Inference (VLAAI) network. The VLAAI network outperformed state-of-the-art subject-independent models (median Pearson correlation of 0.19, p < 0.001), yielding a 52% increase over the well-established linear model. Using ablation techniques, we identified the relative importance of each part of the VLAAI network and found that the non-linear components and output context module influenced model performance the most (10% relative performance increase). Subsequently, the VLAAI network was evaluated on a holdout dataset of 26 subjects and a publicly available unseen dataset to test generalization for unseen subjects and stimuli. No significant difference was found between the default test set and the holdout subjects, or between the default test set and the public dataset. The VLAAI network also significantly outperformed all baseline models on the public dataset. We evaluated the effect of training set size by training the VLAAI network on data from 1 up to 80 subjects and evaluating on 26 holdout subjects, revealing a relationship following a hyperbolic tangent function between the number of subjects in the training set and the performance on unseen subjects. Finally, the subject-independent VLAAI network was finetuned for the 26 holdout subjects to obtain subject-specific VLAAI models. With 5 minutes of data or more, a significant performance improvement was found, up to 34% (from 0.18 to 0.25 median Pearson correlation) relative to the subject-independent VLAAI network.
Affiliation(s)
- Bernd Accou
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
- PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Hugo Van Hamme
- PSI, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Tom Francart
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium.
10
Decoding the cognitive states of attention and distraction in a real-life setting using EEG. Sci Rep 2022; 12:20649. [PMID: 36450871] [PMCID: PMC9712397] [DOI: 10.1038/s41598-022-24417-w]
Abstract
Lapses in attention can have serious consequences in situations such as driving a car, hence there is considerable interest in tracking attention using neural measures. However, as most of these studies have been done in highly controlled and artificial laboratory settings, we explored whether it is also possible to determine attention and distraction with machine/deep learning from electroencephalogram (EEG) data collected in a natural setting. 24 participants volunteered for the study. Data were collected from pairs of participants simultaneously while they engaged in Tibetan Monastic debate, a practice that is interesting because it is a real-life situation that generates substantial variability in attention states. We found that attention was on average associated with increased left frontal alpha, increased left parietal theta, and decreased central delta compared to distraction. In an attempt to predict attention and distraction, we found that a Long Short Term Memory model classified attention and distraction with maximum accuracies of 95.86% and 95.4%, corresponding to delta and theta waves respectively. This study demonstrates that EEG data collected in a real-life setting can be used to predict attention states in participants with good accuracy, opening doors for developing Brain-Computer Interfaces that track attention in real time using data extracted in daily life settings, rendering them much more usable.
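The band-specific findings above suggest band power as a natural input feature. The sketch below extracts per-channel delta, theta and alpha power with Welch's method; the band edges, sampling rate and segment length are illustrative assumptions, and the LSTM classifier itself is not shown.

```python
import numpy as np
from scipy.signal import welch

def band_powers(eeg, fs, bands=None):
    """Per-channel band power for an EEG segment of shape (n_channels, n_samples)."""
    bands = bands or {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0)}
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    return {
        name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=-1)  # (n_channels,)
        for name, (lo, hi) in bands.items()
    }

rng = np.random.default_rng(0)
segment = rng.standard_normal((32, 2 * 256))   # 2 s of 32-channel EEG at 256 Hz
features = band_powers(segment, fs=256)        # e.g. features["theta"].shape == (32,)
```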
11
Xu Z, Bai Y, Zhao R, Zheng Q, Ni G, Ming D. Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction. Hear Res 2022; 422:108552. [PMID: 35714555] [DOI: 10.1016/j.heares.2022.108552]
Abstract
In the cocktail party circumstance, the human auditory system extracts the information from a specific speaker of interest and ignores others. Many studies have focused on auditory attention decoding (AAD), but the stimulation materials were mainly non-tonal languages. We used a tonal language (Mandarin) as the speech stimulus and constructed a Long Short-Term Memory (LSTM) architecture for speech envelope reconstruction based on electroencephalogram (EEG) data. The correlation coefficient between the reconstructed and candidate envelopes was calculated to determine the subject's auditory attention. The proposed LSTM architecture outperformed the linear models. The average decoding accuracy in the intra-subject and inter-subject cases varies from 63.02 to 74.29%, with the highest accuracy rate of 89.1% in a decision window of 0.15 s. In addition, the beta-band rhythm was found to play an essential role in distinguishing the attention and non-attention states. These results provide a new AAD architecture to help develop neuro-steered hearing devices, especially for tonal languages.
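The decision rule described above, which correlates the EEG-reconstructed envelope with each candidate speech envelope and picks the larger correlation, can be sketched as follows; the reconstruction network itself is not shown and the toy signals are only placeholders.

```python
import numpy as np

def decide_attended(reconstructed, envelope_a, envelope_b):
    """Return the attended-speaker label and both correlation coefficients."""
    r_a = np.corrcoef(reconstructed, envelope_a)[0, 1]
    r_b = np.corrcoef(reconstructed, envelope_b)[0, 1]
    return ("speaker_A" if r_a > r_b else "speaker_B"), r_a, r_b

rng = np.random.default_rng(0)
envelope_a = rng.standard_normal(1024)                        # candidate envelopes
envelope_b = rng.standard_normal(1024)
reconstructed = 0.5 * envelope_a + rng.standard_normal(1024)  # toy decoder output
print(decide_attended(reconstructed, envelope_a, envelope_b))
```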
Affiliation(s)
- Zihao Xu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Ran Zhao
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
- Qi Zheng
- Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China.
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China.
12
Thornton M, Mandic D, Reichenbach T. Robust decoding of the speech envelope from EEG recordings through deep neural networks. J Neural Eng 2022; 19. [PMID: 35709698] [DOI: 10.1088/1741-2552/ac7976]
Abstract
OBJECTIVE Smart hearing aids which can decode the focus of a user's attention could considerably improve comprehension levels in noisy environments. Methods for decoding auditory attention from electroencephalography (EEG) have attracted considerable interest for this reason. Recent studies suggest that the integration of deep neural networks (DNNs) into existing auditory attention decoding algorithms is highly beneficial, although it remains unclear whether these enhanced algorithms can perform robustly in different real-world scenarios. To this end, we sought to characterise the performance of DNNs at reconstructing the envelope of an attended speech stream from EEG recordings in different listening conditions. In addition, given the relatively sparse availability of EEG data, we investigate the possibility of applying subject-independent algorithms to EEG recorded from unseen individuals. APPROACH Both linear models and nonlinear DNNs were employed to decode the envelope of clean speech from EEG recordings, with and without subject-specific information. The mean behaviour, as well as the variability of the reconstruction, was characterised for each model. We then trained subject-specific linear models and DNNs to reconstruct the envelope of speech in clean and noisy conditions, and investigated how well they performed in different listening scenarios. We also established that these models can be used to decode auditory attention in competing-speaker scenarios. MAIN RESULTS The DNNs offered a considerable advantage over their linear counterpart at reconstructing the envelope of clean speech. This advantage persisted even when subject-specific information was unavailable at the time of training. The same DNN architectures generalised to a distinct dataset, which contained EEG recorded under a variety of listening conditions. In competing-speaker and speech-in-noise conditions, the DNNs significantly outperformed the linear models. Finally, the DNNs offered a considerable improvement over the linear approach at decoding auditory attention in competing-speaker scenarios. SIGNIFICANCE We present the first detailed study into the extent to which DNNs can be employed for reconstructing the envelope of an attended speech stream. We conclusively demonstrate that DNNs have the ability to improve the reconstruction of the attended speech envelope. The variance of the reconstruction error is shown to be similar for both DNNs and the linear model. Overall, DNNs are demonstrated to show promise for real-world auditory attention decoding, since they perform well in multiple listening conditions and generalise to data recorded from unseen participants.
Affiliation(s)
- Mike Thornton
- Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom
- Danilo Mandic
- Department of Electrical and Electronic Engineering, Imperial College London, Exhibition Road, London SW7 2BT, United Kingdom
- Tobias Reichenbach
- Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Konrad-Zuse-Strasse 3, 91056 Erlangen, Germany
13
A neuroscience-inspired spiking neural network for EEG-based auditory spatial attention detection. Neural Netw 2022; 152:555-565. [DOI: 10.1016/j.neunet.2022.05.003]
14
Su E, Cai S, Xie L, Li H, Schultz T. STAnet: A Spatiotemporal Attention Network for Decoding Auditory Spatial Attention from EEG. IEEE Trans Biomed Eng 2022; 69:2233-2242. [PMID: 34982671] [DOI: 10.1109/tbme.2022.3140246]
Abstract
OBJECTIVE Humans are able to localize the source of a sound. This enables them to direct attention to a particular speaker in a cocktail party. Psycho-acoustic studies show that the sensory cortices of the human brain respond to the location of sound sources differently, and the auditory attention itself is a dynamic and temporally based brain activity. In this work, we seek to build a computational model which uses both spatial and temporal information manifested in EEG signals for auditory spatial attention detection (ASAD). METHODS We propose an end-to-end spatiotemporal attention network, denoted as STAnet, to detect auditory spatial attention from EEG. The STAnet is designed to assign differentiated weights dynamically to EEG channels through a spatial attention mechanism, and to temporal patterns in EEG signals through a temporal attention mechanism. RESULTS We report the ASAD experiments on two publicly available datasets. The STAnet outperforms other competitive models by a large margin under various experimental conditions. Its attention decision for a 1-second decision window outperforms that of the state-of-the-art techniques for a 10-second decision window. Experimental results also demonstrate that the STAnet achieves competitive performance on EEG signals ranging from 64 to as few as 16 channels. CONCLUSION This study provides evidence suggesting that efficient low-density EEG online decoding is within reach. SIGNIFICANCE This study also marks an important step towards the practical implementation of ASAD in real-life applications.
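A minimal sketch of the spatial-attention idea (data-dependent weights over EEG channels) is shown below in PyTorch; it is a simplified illustration rather than the STAnet architecture, and the channel and sample counts are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Assigns a softmax weight to each EEG channel based on its time course."""

    def __init__(self, n_samples: int):
        super().__init__()
        self.score = nn.Linear(n_samples, 1)   # one scalar score per channel

    def forward(self, x):
        # x: (batch, n_channels, n_samples)
        weights = F.softmax(self.score(x).squeeze(-1), dim=-1)  # (batch, n_channels)
        return x * weights.unsqueeze(-1), weights               # re-weighted EEG

attention = ChannelAttention(n_samples=128)            # e.g. a 1 s window at 128 Hz
weighted_eeg, channel_weights = attention(torch.randn(4, 64, 128))
```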
15
Reddy Katthi J, Ganapathy S. Deep Correlation Analysis for Audio-EEG Decoding. IEEE Trans Neural Syst Rehabil Eng 2021; 29:2742-2753. [PMID: 34874861] [DOI: 10.1109/tnsre.2021.3129790]
Abstract
Electroencephalography (EEG), one of the easiest modes of recording brain activations in a non-invasive manner, is often distorted by recording artifacts, which adversely impacts stimulus-response analysis. The most prominent techniques thus far attempt to improve the stimulus-response correlations using linear methods. In this paper, we propose a neural network based correlation analysis framework that significantly improves over the linear methods for auditory stimuli. A deep model is proposed for intra-subject audio-EEG analysis based on directly optimizing the correlation loss. Further, a neural network model with a shared encoder architecture is proposed for improving the inter-subject stimulus response correlations. These models attempt to suppress the EEG artifacts while preserving the components related to the stimulus. Several experiments are performed using EEG recordings from subjects listening to speech and music stimuli. In these experiments, we show that the deep models improve the Pearson correlation significantly over the linear methods (average absolute improvements of 7.4% in speech tasks and 29.3% in music tasks). We also analyze the impact of several model parameters on the stimulus-response correlation.
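The phrase "directly optimizing the correlation loss" can be illustrated with a negative-Pearson loss like the sketch below (PyTorch); it is a generic formulation under assumed tensor shapes, not the authors' model.

```python
import torch

def neg_pearson_loss(pred, target, eps=1e-8):
    """Negative Pearson correlation, averaged over a batch of 1-D signals.
    pred, target: tensors of shape (batch, n_samples)."""
    pred = pred - pred.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    num = (pred * target).sum(dim=-1)
    den = torch.sqrt((pred ** 2).sum(dim=-1) * (target ** 2).sum(dim=-1)) + eps
    return -(num / den).mean()

# toy usage: 8 predicted vs. reference stimulus representations of 640 samples each
loss = neg_pearson_loss(torch.randn(8, 640), torch.randn(8, 640))
print(float(loss))   # in training, this loss would be backpropagated through the encoder
```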
16
Decoding Object-Based Auditory Attention from Source-Reconstructed MEG Alpha Oscillations. J Neurosci 2021; 41:8603-8617. [PMID: 34429378] [DOI: 10.1523/jneurosci.0583-21.2021]
Abstract
How do we attend to relevant auditory information in complex naturalistic scenes? Much research has focused on detecting which information is attended, without regarding underlying top-down control mechanisms. Studies investigating attentional control generally manipulate and cue specific features in simple stimuli. However, in naturalistic scenes it is impossible to dissociate relevant from irrelevant information based on low-level features. Instead, the brain has to parse and select auditory objects of interest. The neural underpinnings of object-based auditory attention remain not well understood. Here we recorded MEG while 15 healthy human subjects (9 female) prepared for the repetition of an auditory object presented in one of two overlapping naturalistic auditory streams. The stream containing the repetition was prospectively cued with 70% validity. Crucially, this task could not be solved by attending low-level features, but only by processing the objects fully. We trained a linear classifier on the cortical distribution of source-reconstructed oscillatory activity to distinguish which auditory stream was attended. We could successfully classify the attended stream from alpha (8-14 Hz) activity in anticipation of repetition onset. Importantly, attention could only be classified from trials in which subjects subsequently detected the repetition, but not from miss trials. Behavioral relevance was further supported by a correlation between classification accuracy and detection performance. Decodability was not sustained throughout stimulus presentation, but peaked shortly before repetition onset, suggesting that attention acted transiently according to temporal expectations. We thus demonstrate anticipatory alpha oscillations to underlie top-down control of object-based auditory attention in complex naturalistic scenes.SIGNIFICANCE STATEMENT In everyday life, we often find ourselves bombarded with auditory information, from which we need to select what is relevant to our current goals. Previous research has highlighted how we attend to specific highly controlled aspects of the auditory input. Although invaluable, it is still unclear how this relates to attentional control in naturalistic auditory scenes. Here we used the high precision of magnetoencephalography in space and time to investigate the brain mechanisms underlying top-down control of object-based attention in ecologically valid sound scenes. We show that rhythmic activity in auditory association cortex at a frequency of ∼10 Hz (alpha waves) controls attention to currently relevant segments within the auditory scene and predicts whether these segments are subsequently detected.
17
Drgas S, Blaszak M, Przekoracka-Krawczyk A. The Combination of Neural Tracking and Alpha Power Lateralization for Auditory Attention Detection. J Speech Lang Hear Res 2021; 64:3603-3616. [PMID: 34403288] [DOI: 10.1044/2021_jslhr-20-00608]
Abstract
Purpose: The acoustic source that is attended to by the listener in a mixture can be identified with a certain accuracy on the basis of their neural response recorded during listening, and various phenomena may be used to detect attention. For example, neural tracking (NT) and alpha power lateralization (APL) may be utilized in order to obtain information concerning attention. However, these methods of auditory attention detection (AAD) are typically tested in different experimental setups, which makes it impossible to compare their accuracy. The aim of this study is to compare the accuracy of AAD based on NT, APL, and their combination for a dichotic natural speech listening task. Method: Thirteen adult listeners were presented with dichotic speech stimuli and instructed to attend to one of them. Electroencephalogram of the subjects was continuously recorded during the experiment using a set of 32 active electrodes. The accuracy of AAD was evaluated for trial lengths of 50, 25, and 12.5 s. AAD was tested for various parameters of NT- and APL-based modules. Results: The obtained results suggest that NT of natural running speech provides similar accuracy to APL. The statistically significant improvement of the accuracy of AAD using a combined method has been observed not only for the longest duration of test samples (50 s, p = .005) but also for shorter ones (25 s, p = .011). Conclusions: It seems that the combination of standard NT and APL significantly increases the effectiveness of accurate identification of the traced signal perceived by a listener under dichotic conditions. It has been demonstrated that, under certain conditions, the combination of NT and APL may provide a benefit for AAD in cocktail party scenarios.
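The APL cue can be summarized by a single lateralization index per trial, as in the sketch below; the electrode selections, alpha band edges and Welch parameters are illustrative assumptions and this is not the study's analysis code.

```python
import numpy as np
from scipy.signal import welch

def alpha_lateralization_index(eeg, fs, left_idx, right_idx, band=(8.0, 12.0)):
    """(right - left) / (right + left) alpha power for one trial.
    eeg: (n_channels, n_samples)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    channel_alpha = psd[:, in_band].mean(axis=-1)         # mean alpha power per channel
    left, right = channel_alpha[left_idx].mean(), channel_alpha[right_idx].mean()
    return (right - left) / (right + left)

rng = np.random.default_rng(0)
trial = rng.standard_normal((32, 128 * 50))               # 50 s of 32-channel EEG at 128 Hz
ali = alpha_lateralization_index(trial, fs=128, left_idx=[12, 13], right_idx=[18, 19])
```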
Affiliation(s)
- Szymon Drgas
- Institute of Automation and Robotics, Poznań University of Technology, Poland
- Magdalena Blaszak
- Department of Medical Physics and Radiospectroscopy, Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
- Vision and Neuroscience Laboratory, NanoBioMedical Centre, Adam Mickiewicz University, Poznań, Poland
- Anna Przekoracka-Krawczyk
- Vision and Neuroscience Laboratory, NanoBioMedical Centre, Adam Mickiewicz University, Poznań, Poland
- Laboratory of Vision Science and Optometry, Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
18
Lu Y, Wang M, Yao L, Shen H, Wu W, Zhang Q, Zhang L, Chen M, Liu H, Peng R, Liu M, Chen S. Auditory attention decoding from electroencephalography based on long short-term memory networks. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102966]
19
Kuruvila I, Muncke J, Fischer E, Hoppe U. Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model. Front Physiol 2021; 12:700655. [PMID: 34408661] [PMCID: PMC8365753] [DOI: 10.3389/fphys.2021.700655]
Abstract
The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped on to the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrogram of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
Affiliation(s)
- Ivine Kuruvila
- Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Jan Muncke
- Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
- Ulrich Hoppe
- Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
20
Geravanchizadeh M, Roushan H. Dynamic selective auditory attention detection using RNN and reinforcement learning. Sci Rep 2021; 11:15497. [PMID: 34326401] [PMCID: PMC8322190] [DOI: 10.1038/s41598-021-94876-0]
Abstract
The cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. The proposed dynamic SAAD is modeled as a sequential decision-making problem, which is solved by recurrent neural network (RNN) and reinforcement learning methods of Q-learning and deep Q-learning. Among different dynamic learning approaches, the evaluation results show that the deep Q-learning approach with RNN as agent provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous, in the sense that the detection of attention is performed dynamically for the sequential inputs. Also, the system has the potential to be used in scenarios, where the attention of the listener might be switched in time in the presence of various acoustic events.
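The Q-learning rule underlying the dynamic decision-making above is compact enough to show directly. The toy tabular sketch below illustrates only the update; the paper's agent is an RNN operating on EEG-derived states, and the state/action design here is an assumption for illustration.

```python
import numpy as np

n_states, n_actions = 10, 3          # e.g. discretized evidence levels x {wait, decide A, decide B}
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9              # learning rate and discount factor

def q_update(state, action, reward, next_state):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])

q_update(state=2, action=0, reward=0.0, next_state=3)   # e.g. keep listening, no reward yet
```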
Affiliation(s)
- Masoud Geravanchizadeh
- Faculty of Electrical & Computer Engineering, University of Tabriz, 51666-15813, Tabriz, Iran.
- Hossein Roushan
- Faculty of Electrical & Computer Engineering, University of Tabriz, 51666-15813, Tabriz, Iran
21
Cai S, Li P, Su E, Xie L. Auditory Attention Detection via Cross-Modal Attention. Front Neurosci 2021; 15:652058. [PMID: 34366770] [PMCID: PMC8333999] [DOI: 10.3389/fnins.2021.652058]
Abstract
Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activities. However, previous AAD approaches perform poorly on short signal segments, so more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, i.e., cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we hope to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracy values of 82.8, 86.4, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model, but also outperforms the state-of-the-art non-linear approaches. These results and data visualization suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features in order to improve the AAD performance.
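The cross-modal attention idea (EEG features attending over audio features) can be sketched with the standard multi-head attention layer, as below; the dimensions and single attention block are illustrative assumptions, not the CMAA architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """EEG queries attend over audio keys/values; the weights say which audio frames matter."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, eeg_feats, audio_feats):
        # eeg_feats: (batch, T_eeg, d_model); audio_feats: (batch, T_audio, d_model)
        fused, weights = self.attn(query=eeg_feats, key=audio_feats, value=audio_feats)
        return fused, weights

cross_attention = CrossModalAttention()
fused, attn_weights = cross_attention(torch.randn(2, 128, 64), torch.randn(2, 200, 64))
```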
Affiliation(s)
- Longhan Xie
- Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, China
22
EEG-Based Closed-Loop Neurofeedback for Attention Monitoring and Training in Young Adults. J Healthc Eng 2021; 2021:5535810. [PMID: 34234929] [PMCID: PMC8219410] [DOI: 10.1155/2021/5535810]
Abstract
Attention is an important mechanism for young adults, whose lives largely involve interacting with media and performing technology multitasking. Nevertheless, existing studies of attention suffer from low accuracy and coarse attention levels in attention monitoring, and from inefficiency in attention training. In this paper, we propose an improved random forest (IRF) algorithm-based attention monitoring and training method with closed-loop neurofeedback. For attention monitoring, an IRF classifier that uses grid search optimization and multiple cross-validation to improve monitoring accuracy and performance is utilized, and five attention levels are proposed. For attention training, we develop three training modes with neurofeedback corresponding to sustained attention, selective attention, and focused attention, and apply a self-control method with four indicators to validate the resulting training effect. An offline experiment based on the Personal EEG Concentration Tasks dataset and an online experiment involving 10 young adults are conducted. The results show that our proposed IRF-algorithm-based attention monitoring approach achieves an average accuracy of 79.34%, thereby outperforming the current state-of-the-art algorithms. Furthermore, when excluding familiarity with the game environment, statistically significant performance improvements (p < 0.05) are achieved by the 10 young adults after attention training, which demonstrates the effectiveness of the proposed serious games. Our proposed method of attention monitoring and training thus proves to be reliable and efficient.
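A grid-searched, cross-validated random forest of the kind described above can be set up in a few lines with scikit-learn; the synthetic features, the parameter grid and the five-class target below are illustrative assumptions standing in for EEG features and the five attention levels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for EEG-derived features and five attention levels
X, y = make_classification(n_samples=500, n_features=40, n_informative=10,
                           n_classes=5, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```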
23
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021; 18. [PMID: 33957606] [DOI: 10.1088/1741-2552/abfeba]
Abstract
Objective. Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments. Approach. Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals based on corresponding EEG signals recorded when subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions. Main results. Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly with classification accuracies that approximated or exceeded 80% based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder. Significance. This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggest that this proposed computational model may be an effective method for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.
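The -10 dB relative-RMS split can be sketched as follows: frame the speech, compute each frame's RMS relative to the utterance RMS, and label frames above or below the threshold. The frame length and framing scheme are illustrative assumptions, not the paper's exact segmentation.

```python
import numpy as np

def rms_level_mask(speech, fs, frame_ms=20.0, threshold_db=-10.0):
    """True for frames whose RMS is within threshold_db of the utterance RMS
    (higher-RMS-level segments); False for lower-RMS-level frames."""
    frame_len = int(fs * frame_ms / 1000.0)
    usable = (len(speech) // frame_len) * frame_len
    frames = speech[:usable].reshape(-1, frame_len)
    frame_rms = np.sqrt((frames ** 2).mean(axis=1) + 1e-12)
    utterance_rms = np.sqrt((speech[:usable] ** 2).mean() + 1e-12)
    relative_db = 20.0 * np.log10(frame_rms / utterance_rms)
    return relative_db >= threshold_db

rng = np.random.default_rng(0)
speech = rng.standard_normal(3 * 16000) * np.linspace(0.05, 1.0, 3 * 16000)  # toy 3 s signal
mask = rms_level_mask(speech, fs=16000)
print(f"{mask.mean():.0%} of frames are higher-RMS-level")
```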
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
- Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
24
Vandecappelle S, Deckers L, Das N, Ansari AH, Bertrand A, Francart T. EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 2021; 10:e56481. [PMID: 33929315] [PMCID: PMC8143791] [DOI: 10.7554/elife.56481]
Abstract
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
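A minimal CNN that maps a short EEG window directly to a left/right decision, in the spirit of the approach above, could look like the sketch below; the filter counts, kernel sizes and input geometry are illustrative assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class LocusCNN(nn.Module):
    """Tiny CNN mapping an EEG window (channels x samples) to left/right logits."""

    def __init__(self, n_channels: int = 64, kernel_len: int = 17, n_filters: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, n_filters, kernel_size=(n_channels, kernel_len)),  # spatio-temporal filters
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),   # average filter activations over time
            nn.Flatten(),
            nn.Linear(n_filters, 2),        # left vs. right locus of attention
        )

    def forward(self, x):
        # x: (batch, 1, n_channels, n_samples), e.g. a 1-2 s window
        return self.net(x)

model = LocusCNN()
logits = model(torch.randn(8, 1, 64, 128))   # -> (8, 2)
```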
Collapse
Affiliation(s)
- Servaas Vandecappelle
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Lucas Deckers
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Neetha Das
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Amir Hossein Ansari
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Alexander Bertrand
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Tom Francart
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
| |
Collapse
|
25
|
Vandecappelle S, Deckers L, Das N, Ansari AH, Bertrand A, Francart T. EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 2021; 10:56481. [PMID: 33929315 DOI: 10.1101/475673] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Accepted: 04/28/2021] [Indexed: 05/27/2023] Open
Abstract
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
Collapse
Affiliation(s)
- Servaas Vandecappelle
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Lucas Deckers
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Neetha Das
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Amir Hossein Ansari
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Alexander Bertrand
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Tom Francart
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
| |
Collapse
|
26
|
Kuruvila I, Can Demir K, Fischer E, Hoppe U. Inference of the Selective Auditory Attention Using Sequential LMMSE Estimation. IEEE Trans Biomed Eng 2021; 68:3501-3512. [PMID: 33891545 DOI: 10.1109/tbme.2021.3075337] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Attentive listening in a multispeaker environment such as a cocktail party requires suppression of the interfering speakers and the surrounding noise. People with normal hearing perform remarkably well in such situations. Analysis of cortical signals using electroencephalography (EEG) has revealed that the EEG signals track the envelope of the attended speech more strongly than that of the interfering speech. This has enabled the development of algorithms that can decode the selective attention of a listener in controlled experimental settings. However, these algorithms often require long trial durations and computationally expensive calibration to obtain a reliable inference of attention. In this paper, we present a novel framework to decode the attention of a listener within trial durations of the order of two seconds. It comprises three modules: 1) dynamic estimation of the temporal response functions (TRFs) in every trial using a sequential linear minimum mean squared error (LMMSE) estimator, 2) extraction of the N1-P2 peak of the estimated TRF, which serves as a marker related to the attentional state, and 3) computation of a probabilistic measure of the attentional state using a support vector machine followed by logistic regression. The efficacy of the proposed decoding framework was evaluated using EEG data collected from 27 subjects. The total number of electrodes required to infer attention was four: one for the signal estimation, one for the noise estimation, and the other two serving as the reference and ground electrodes. Our results make further progress towards the realization of neuro-steered hearing aids.
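The sequential estimator at the core of module 1 can be illustrated with a generic recursive least-squares update of TRF weights over lagged stimulus samples. This is a stand-in sketch under assumed lag counts and forgetting factor, not the authors' exact LMMSE formulation.

```python
import numpy as np

def rls_trf(stimulus, eeg, n_lags=64, lam=1.0, forget=0.999):
    """Sequentially estimate a temporal response function (forward model)
    with a recursive least-squares update, as a generic stand-in for a
    sequential LMMSE estimator."""
    w = np.zeros(n_lags)                    # TRF weights over stimulus lags
    P = np.eye(n_lags) / lam                # inverse covariance estimate
    x_buf = np.zeros(n_lags)
    for t in range(len(eeg)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = stimulus[t]              # lagged stimulus regressor
        k = P @ x_buf / (forget + x_buf @ P @ x_buf)
        err = eeg[t] - w @ x_buf
        w = w + k * err
        P = (P - np.outer(k, x_buf @ P)) / forget
    return w

# usage sketch with synthetic data (64 lags ~ 500 ms at 128 Hz)
rng = np.random.default_rng(0)
stim = rng.standard_normal(5000)
true_trf = np.exp(-np.arange(64) / 10.0) * np.sin(np.arange(64) / 4.0)
eeg = np.convolve(stim, true_trf)[:5000] + 0.5 * rng.standard_normal(5000)
print(np.corrcoef(rls_trf(stim, eeg), true_trf)[0, 1])
```

In the described framework, peak features of the estimated TRF would then feed a classifier rather than being inspected directly.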
Collapse
|
27
|
Mahmud MS, Yeasin M, Bidelman GM. Data-driven machine learning models for decoding speech categorization from evoked brain responses. J Neural Eng 2021; 18. [PMID: 33690177 DOI: 10.1101/2020.08.03.234997] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Accepted: 03/09/2021] [Indexed: 05/24/2023]
Abstract
Objective. Categorical perception (CP) of audio is critical to understanding how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e. differentiates phonetic prototypes from ambiguous speech sounds). Approach. We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials. Main results. We found that early (120 ms) whole-brain data decoded speech categories (i.e. prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%; F1-score 95.00%). Separate analyses of left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, and motor cortex) were necessary to describe later decision stages (300-800 ms) of categorization, and these areas were highly associated with the strength of listeners' categorical hearing (i.e. slope of behavioral identification functions). Significance. Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
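A simplified stand-in for this kind of analysis is time-resolved decoding with a linear SVM and cross-validation over short ERP windows. The window length, sampling rate, and synthetic data below are assumptions, and stability selection and source localization are omitted for brevity.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def decode_over_time(erps, labels, fs=500, win_ms=20):
    """Time-resolved decoding of stimulus category (e.g. prototypical vs.
    ambiguous vowels) from ERP data.
    erps: (trials, channels, samples); labels: (trials,)."""
    win = int(win_ms / 1000 * fs)
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    scores = []
    for start in range(0, erps.shape[2] - win + 1, win):
        X = erps[:, :, start:start + win].reshape(len(erps), -1)
        scores.append(cross_val_score(clf, X, labels, cv=5).mean())
    return np.array(scores)   # decoding accuracy per time window

# usage sketch: 100 trials, 64 channels, 0-400 ms at 500 Hz
rng = np.random.default_rng(1)
erps = rng.standard_normal((100, 64, 200))
labels = rng.integers(0, 2, 100)
erps[labels == 1, :, 60:80] += 0.5   # inject a "category" effect around 120-160 ms
print(decode_over_time(erps, labels).round(2))
```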
Collapse
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Mohammed Yeasin
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States of America
- University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, United States of America
| |
Collapse
|
28
|
Mahmud MS, Yeasin M, Bidelman GM. Data-driven machine learning models for decoding speech categorization from evoked brain responses. J Neural Eng 2021; 18:10.1088/1741-2552/abecf0. [PMID: 33690177 PMCID: PMC8738965 DOI: 10.1088/1741-2552/abecf0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Accepted: 03/09/2021] [Indexed: 11/12/2022]
Abstract
Objective. Categorical perception (CP) of audio is critical to understanding how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e. differentiates phonetic prototypes from ambiguous speech sounds). Approach. We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials. Main results. We found that early (120 ms) whole-brain data decoded speech categories (i.e. prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%; F1-score 95.00%). Separate analyses of left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, and motor cortex) were necessary to describe later decision stages (300-800 ms) of categorization, and these areas were highly associated with the strength of listeners' categorical hearing (i.e. slope of behavioral identification functions). Significance. Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
Collapse
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Mohammed Yeasin
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States of America
- University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, United States of America
| |
Collapse
|
29
|
Holtze B, Jaeger M, Debener S, Adiloğlu K, Mirkovic B. Are They Calling My Name? Attention Capture Is Reflected in the Neural Tracking of Attended and Ignored Speech. Front Neurosci 2021; 15:643705. [PMID: 33828451 PMCID: PMC8019946 DOI: 10.3389/fnins.2021.643705] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 02/19/2021] [Indexed: 11/15/2022] Open
Abstract
Difficulties in selectively attending to one among several speakers have mainly been associated with the distraction caused by ignored speech. Thus, in the current study, we investigated the neural processing of ignored speech in a two-competing-speaker paradigm. For this, we recorded the participant’s brain activity using electroencephalography (EEG) to track the neural representation of the attended and ignored speech envelope. To provoke distraction, we occasionally embedded the participant’s first name in the ignored speech stream. Retrospective reports as well as the presence of a P3 component in response to the name indicate that participants noticed the occurrence of their name. As predicted, the neural representation of the ignored speech envelope increased after the name was presented therein, suggesting that the name had attracted the participant’s attention. Interestingly, in contrast to our hypothesis, the neural tracking of the attended speech envelope also increased after the name occurrence. On this account, we conclude that the name might not primarily have distracted the participants, or did so at most for a brief duration, but rather alerted them to refocus on their actual task. These observations remained robust even when the sound intensity of the ignored speech stream, and thus the sound intensity of the name, was attenuated.
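A generic sketch of the ERP part of such an analysis, epoching a single channel around name onsets, baseline-correcting, and averaging to look for a P3-like deflection, might look as follows. The channel choice, timing, and simulated data are assumptions, not the authors' pipeline.

```python
import numpy as np

def epoch_and_average(eeg, fs, onsets_s, tmin=-0.2, tmax=0.8):
    """Epoch one EEG channel around event onsets (e.g. occurrences of the
    participant's own name in the ignored stream), baseline-correct, and
    average to obtain an ERP."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for t in onsets_s:
        i = int(t * fs)
        if i - pre < 0 or i + post > len(eeg):
            continue
        ep = eeg[i - pre:i + post].copy()
        ep -= ep[:pre].mean()              # baseline correction
        epochs.append(ep)
    return np.mean(epochs, axis=0), np.arange(-pre, post) / fs

# usage sketch: simulated Pz-like channel at 250 Hz with a response after each "name"
fs, dur = 250, 120.0
rng = np.random.default_rng(2)
pz = rng.standard_normal(int(fs * dur))
onsets = np.arange(10, 110, 10.0)          # hypothetical name onsets (s)
for t in onsets:                           # add a deflection ~300-500 ms later
    i = int(t * fs)
    pz[i + 75:i + 125] += 1.0
erp, times = epoch_and_average(pz, fs, onsets)
print(times[np.argmax(erp)])               # should fall in the P3 latency range
```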
Collapse
Affiliation(s)
- Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
| | - Kamil Adiloğlu
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany
| | - Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
30
|
Baek SC, Chung JH, Lim Y. Implementation of an Online Auditory Attention Detection Model with Electroencephalography in a Dichotomous Listening Experiment. SENSORS 2021; 21:s21020531. [PMID: 33451041 PMCID: PMC7828508 DOI: 10.3390/s21020531] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/18/2020] [Revised: 01/07/2021] [Accepted: 01/09/2021] [Indexed: 11/16/2022]
Abstract
Auditory attention detection (AAD) is the tracking of a sound source to which a listener is attending based on neural signals. Despite expectations for the applicability of AAD in real life, most AAD research has been conducted on pre-recorded electroencephalograms (EEGs), which is far from an online implementation. In the present study, we propose an online AAD model and implement it on streaming EEG. The proposed model was devised by introducing a sliding window into the linear decoder model and was simulated using two datasets obtained from separate experiments to evaluate its feasibility. After simulation, the online model was constructed and evaluated based on the streaming EEG of an individual, acquired during a dichotomous listening experiment. Our model was able to detect the transient direction of a participant's attention on the order of one second during the experiment and showed up to 70% average detection accuracy. We expect that the proposed online model could be applied to develop adaptive hearing aids or neurofeedback training for auditory attention and speech perception.
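A minimal sketch of the sliding-window idea, assuming a pre-trained linear backward decoder and two candidate speech envelopes, could look like this. The window and step sizes, the toy decoder, and the synthetic signals are assumptions.

```python
import numpy as np

def stream_decode(eeg, env_a, env_b, decoder, fs, win_s=1.0, step_s=0.25):
    """Apply a pre-trained linear backward decoder to streaming EEG in
    overlapping windows and decide, per window, which of two speech
    envelopes is attended."""
    win, step = int(win_s * fs), int(step_s * fs)
    decisions = []
    for start in range(0, eeg.shape[1] - win + 1, step):
        seg = eeg[:, start:start + win]
        rec = decoder @ seg                       # reconstructed envelope
        r_a = np.corrcoef(rec, env_a[start:start + win])[0, 1]
        r_b = np.corrcoef(rec, env_b[start:start + win])[0, 1]
        decisions.append(0 if r_a > r_b else 1)   # 0 = stream A attended
    return np.array(decisions)

# usage sketch: a spatial-filter decoder (channels,), 64-channel EEG at 64 Hz
rng = np.random.default_rng(3)
fs, n_ch, n_s = 64, 64, 64 * 60
env_a, env_b = np.abs(rng.standard_normal((2, n_s)))
eeg = rng.standard_normal((n_ch, n_s)) + 0.3 * np.outer(np.ones(n_ch), env_a)
decoder = np.ones(n_ch) / n_ch                   # toy decoder: channel average
print(stream_decode(eeg, env_a, env_b, decoder, fs).mean())  # fraction decided as B
```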
Collapse
Affiliation(s)
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea;
| | - Jae Ho Chung
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea;
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul 04763, Korea
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, Korea
- Correspondence: (J.H.C.); (Y.L.); Tel.: +82-2-31-560-2298 (J.H.C.); +82-2-958-6641 (Y.L.)
| | - Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea;
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, Korea
- Research Center for Diagnosis, Treatment and Care System of Dementia, Korea Institute of Science and Technology, Seoul 02792, Korea
- Correspondence: (J.H.C.); (Y.L.); Tel.: +82-2-31-560-2298 (J.H.C.); +82-2-958-6641 (Y.L.)
| |
Collapse
|
31
|
Livezey JA, Glaser JI. Deep learning approaches for neural decoding across architectures and recording modalities. Brief Bioinform 2020; 22:1577-1591. [PMID: 33372958 DOI: 10.1093/bib/bbaa355] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2020] [Revised: 10/31/2020] [Accepted: 11/04/2020] [Indexed: 12/19/2022] Open
Abstract
Decoding behavior, perception or cognitive state directly from neural signals is critical for brain-computer interface research and an important tool for systems neuroscience. In the last decade, deep learning has become the state-of-the-art method in many machine learning tasks ranging from speech recognition to image segmentation. The success of deep networks in other domains has led to a new wave of applications in neuroscience. In this article, we review deep learning approaches to neural decoding. We describe the architectures used for extracting useful features from neural recording modalities ranging from spikes to functional magnetic resonance imaging. Furthermore, we explore how deep learning has been leveraged to predict common outputs including movement, speech and vision, with a focus on how pretrained deep networks can be incorporated as priors for complex decoding targets like acoustic speech or images. Deep learning has been shown to be a useful tool for improving the accuracy and flexibility of neural decoding across a wide range of tasks, and we point out areas for future scientific development.
Collapse
Affiliation(s)
- Jesse A Livezey
- Neural Systems and Data Science Laboratory at the Lawrence Berkeley National Laboratory. He obtained his PhD in Physics from the University of California, Berkeley
| | - Joshua I Glaser
- Center for Theoretical Neuroscience and Department of Statistics at Columbia University. He obtained his PhD in Neuroscience from Northwestern University
| |
Collapse
|
32
|
Das N, Zegers J, Van hamme H, Francart T, Bertrand A. Linear versus deep learning methods for noisy speech separation for EEG-informed attention decoding. J Neural Eng 2020; 17:046039. [DOI: 10.1088/1741-2552/aba6f8] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
|
33
|
Jaeger M, Mirkovic B, Bleichner MG, Debener S. Decoding the Attended Speaker From EEG Using Adaptive Evaluation Intervals Captures Fluctuations in Attentional Listening. Front Neurosci 2020; 14:603. [PMID: 32612507 PMCID: PMC7308709 DOI: 10.3389/fnins.2020.00603] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2019] [Accepted: 05/15/2020] [Indexed: 11/13/2022] Open
Abstract
Listeners differ in their ability to attend to a speech stream in the presence of a competing sound. Differences in speech intelligibility in noise cannot be fully explained by hearing ability, which suggests the involvement of additional cognitive factors. A better understanding of the temporal fluctuations in the ability to pay selective auditory attention to a desired speech stream may help in explaining these variabilities. In order to better understand the temporal dynamics of selective auditory attention, we developed an online auditory attention decoding (AAD) processing pipeline based on speech envelope tracking in the electroencephalogram (EEG). Participants had to attend to one audiobook story while a second one had to be ignored. Online AAD was applied to track the attention toward the target speech signal. Individual temporal attention profiles were computed by combining an established AAD method with an adaptive staircase procedure. The individual decoding performance over time was analyzed and linked to behavioral performance as well as subjective ratings of listening effort, motivation, and fatigue. The grand-average attended-speaker decoding profile derived in the online experiment indicated performance above chance level. Parameters describing the individual AAD performance in each testing block showed that decoding performance differed significantly over time and was closely related to behavioral performance in the selective listening task. Further, an exploratory analysis indicated that subjects with poor decoding performance reported higher listening effort and fatigue compared to good performers. Taken together, our results show that online EEG-based AAD in a complex listening situation is feasible. Adaptive attended-speaker decoding profiles over time could be used as an objective measure of behavioral performance and listening effort. The developed online processing pipeline could also serve as a basis for future EEG-based near-real-time auditory neurofeedback systems.
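The adaptive staircase over the evaluation interval could be sketched as follows: the decoding window shrinks after a correct decision and grows after an error, and the sequence of window lengths forms an individual profile over time. The step size, bounds, and simulated listener are assumptions, not the published procedure.

```python
import numpy as np

def adaptive_window_profile(correct_fn, n_blocks=50, start_s=30.0,
                            step=1.2, floor_s=2.0, ceil_s=60.0):
    """Adaptive staircase over the AAD evaluation window: shrink the window
    after a correct decision, grow it after an error; returns the window
    length per block as a performance profile."""
    win = start_s
    profile = []
    for _ in range(n_blocks):
        ok = correct_fn(win)               # was attention decoded correctly?
        win = win / step if ok else win * step
        win = float(np.clip(win, floor_s, ceil_s))
        profile.append(win)
    return np.array(profile)

# usage sketch: a listener whose decoding succeeds more often with long windows
rng = np.random.default_rng(4)
def simulated_decoder(win_s):
    p_correct = 0.55 + 0.4 * min(win_s, 30.0) / 30.0
    return rng.random() < p_correct

profile = adaptive_window_profile(simulated_decoder)
print(profile.round(1))   # shorter plateaus indicate easier selective listening
```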
Collapse
Affiliation(s)
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
| | - Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
| | - Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Neurophysiology of Everyday Life Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
34
|
Ciccarelli G, Nolan M, Perricone J, Calamia PT, Haro S, O'Sullivan J, Mesgarani N, Quatieri TF, Smalt CJ. Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods. Sci Rep 2019; 9:11538. [PMID: 31395905 PMCID: PMC6687829 DOI: 10.1038/s41598-019-47795-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 07/24/2019] [Indexed: 12/30/2022] Open
Abstract
Auditory attention decoding (AAD) through a brain-computer interface has had a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticograph recordings. AAD has been pursued for its potential application to hearing-aid design in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet, despite the latter having one third as many EEG channels as the former. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
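The linear baseline stage of the traditional two-stage approach (reconstructing the attended envelope from time-lagged EEG with ridge regression) could be sketched as follows. The lag range, regularization strength, and synthetic data are assumptions, not the paper's exact configuration.

```python
import numpy as np

def train_backward_decoder(eeg, envelope, n_lags=16, alpha=1e2):
    """Train a linear stimulus-reconstruction (backward) decoder with ridge
    regression over time-lagged EEG, i.e. the first stage of a traditional
    two-stage AAD pipeline.
    eeg: (channels, samples); envelope: (samples,)."""
    n_ch, n_s = eeg.shape
    # design matrix: each row holds all channels at post-stimulus lags 0..n_lags-1
    X = np.zeros((n_s - n_lags + 1, n_ch * n_lags))
    for lag in range(n_lags):
        X[:, lag * n_ch:(lag + 1) * n_ch] = eeg[:, lag:n_s - n_lags + 1 + lag].T
    y = envelope[:n_s - n_lags + 1]
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    return w

# usage sketch: reconstruct an envelope from 32-channel EEG at 64 Hz
rng = np.random.default_rng(6)
env = np.abs(rng.standard_normal(64 * 30))
eeg = 0.5 * np.tile(env, (32, 1)) + rng.standard_normal((32, len(env)))
w = train_backward_decoder(eeg, env)
print(w.shape)   # (channels * lags,) decoder weights
```

The second stage then correlates the reconstruction with each candidate speech stream; the end-to-end network the paper proposes folds this similarity step into the model itself.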
Collapse
Affiliation(s)
- Gregory Ciccarelli
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
| | - Michael Nolan
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
| | - Joseph Perricone
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
| | - Paul T Calamia
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
| | - Stephanie Haro
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA; Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
| | - James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, NY, USA
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA
| | - Thomas F Quatieri
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA; Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
| | - Christopher J Smalt
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA.
| |
Collapse
|
35
|
Nogueira W, Dolhopiatenko H, Schierholz I, Büchner A, Mirkovic B, Bleichner MG, Debener S. Decoding Selective Attention in Normal Hearing Listeners and Bilateral Cochlear Implant Users With Concealed Ear EEG. Front Neurosci 2019; 13:720. [PMID: 31379479 PMCID: PMC6657402 DOI: 10.3389/fnins.2019.00720] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 06/26/2019] [Indexed: 11/29/2022] Open
Abstract
Electroencephalography (EEG) data can be used to decode an attended speech source in normal-hearing (NH) listeners using high-density EEG caps as well as around-the-ear EEG devices. The technology may find application in identifying the target speaker in a cocktail-party-like scenario and in steering speech enhancement algorithms in cochlear implants (CIs). However, the poorer spectral resolution and the electrical artifacts introduced by a CI may limit the applicability of this approach to CI users. The goal of this study was to investigate whether selective attention can be decoded in CI users using an around-the-ear EEG system (cEEGrid). The performance of high-density cap EEG recordings and cEEGrid EEG recordings was compared in a selective attention paradigm using an envelope tracking algorithm. Speech from two audiobooks was presented through insert earphones to NH listeners and via direct audio cable to the CI users. 10 NH listeners and 10 bilateral CI users participated in the study. Participants were instructed to attend to one of the two concurrent speech streams while data were recorded simultaneously with a 96-channel scalp EEG and an 18-channel cEEGrid setup. Reconstruction performance was evaluated by means of parametric correlations between the reconstructed speech and the envelopes of both the attended and the unattended speech stream. Results confirm the feasibility of decoding selective attention from single-trial EEG data in NH listeners and CI users using high-density EEG. All NH listeners and 9 out of 10 CI users achieved high decoding accuracies. The cEEGrid was successful in decoding selective attention in 5 out of 10 NH listeners. The same result was obtained for CI users.
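The trial-level evaluation by parametric correlations can be sketched as below: for each trial, the attended stream is taken to be the one whose envelope correlates more strongly with the reconstruction, and accuracy is compared against a binomial chance threshold. The synthetic data and the 95% threshold choice are assumptions.

```python
import numpy as np
from scipy.stats import binom

def trial_accuracy(rec, env_att, env_unatt):
    """Per-trial evaluation used in envelope-tracking AAD: the attended
    stream is the one whose envelope correlates more strongly with the
    reconstruction. rec/env_*: (trials, samples). Returns accuracy and a
    chance-level threshold (95th percentile of a binomial null)."""
    r_att = np.array([np.corrcoef(r, a)[0, 1] for r, a in zip(rec, env_att)])
    r_un = np.array([np.corrcoef(r, u)[0, 1] for r, u in zip(rec, env_unatt)])
    acc = np.mean(r_att > r_un)
    chance = binom.ppf(0.95, len(rec), 0.5) / len(rec)
    return acc, chance

# usage sketch: synthetic reconstructions biased toward the attended envelope
rng = np.random.default_rng(5)
env_att, env_unatt = np.abs(rng.standard_normal((2, 40, 2000)))
rec = 0.4 * env_att + rng.standard_normal((40, 2000))
acc, chance = trial_accuracy(rec, env_att, env_unatt)
print(f"accuracy {acc:.2f} vs chance threshold {chance:.2f}")
```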
Collapse
Affiliation(s)
- Waldo Nogueira
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
| | - Hanna Dolhopiatenko
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
| | - Irina Schierholz
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
| | - Andreas Büchner
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
| | - Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| | - Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
36
|
Alickovic E, Lunner T, Gustafsson F, Ljung L. A Tutorial on Auditory Attention Identification Methods. Front Neurosci 2019; 13:153. [PMID: 30941002 PMCID: PMC6434370 DOI: 10.3389/fnins.2019.00153] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Accepted: 02/11/2019] [Indexed: 01/14/2023] Open
Abstract
Auditory attention identification methods attempt to identify the sound source of a listener's interest by analyzing measurements of electrophysiological data. We present a tutorial on the numerous techniques that have been developed in recent decades and give an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on the use of linear relations between electrophysiological and audio data. The way in which these relations are computed differs. For example, canonical correlation analysis (CCA) finds a linear subset of electrophysiological data that best correlates to audio data and a similar subset of audio data that best correlates to electrophysiological data. Model-based (encoding and decoding) approaches focus on either of these two sets. We investigate the similarities and differences between these linear model philosophies. We focus on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. The specific focus is on sparsity-driven adaptive encoding models and on comparing the methodology of state-of-the-art models found in the auditory literature. Furthermore, we outline the main signal processing pipeline for identifying the attended sound source in a cocktail party environment from raw electrophysiological data, with all the necessary steps, complemented with the necessary MATLAB code and the relevant references for each step. Our main aim is to compare the methodology of the available methods and to provide numerical illustrations of some of them to give a feeling for their potential. A thorough performance comparison is outside the scope of this tutorial.
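As a small numerical illustration of the correlation-based (CCA) family discussed in the tutorial, the sketch below finds maximally correlated linear projections of time-lagged EEG and a time-lagged audio envelope. The tutorial's MATLAB code is not reproduced here; this uses Python/scikit-learn instead, and the lag choices and synthetic data are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def lag_matrix(x, n_lags):
    """Stack time-lagged copies of a (channels, samples) signal into a
    (samples - n_lags + 1, channels * n_lags) design matrix."""
    x = np.atleast_2d(x)
    n_ch, n_s = x.shape
    cols = [x[:, lag:n_s - n_lags + 1 + lag].T for lag in range(n_lags)]
    return np.hstack(cols)

# synthetic 60 s recording: 32-channel EEG weakly driven by the envelope
rng = np.random.default_rng(7)
fs, n_ch = 64, 32
env = np.abs(rng.standard_normal(fs * 60))
eeg = 0.4 * np.tile(env, (n_ch, 1)) + rng.standard_normal((n_ch, len(env)))

X = lag_matrix(eeg, 16)                 # lagged EEG features
Y = lag_matrix(env, 8)                  # lagged envelope features
n = min(len(X), len(Y))
cca = CCA(n_components=1).fit(X[:n], Y[:n])
u, v = cca.transform(X[:n], Y[:n])
print(np.corrcoef(u[:, 0], v[:, 0])[0, 1])   # first canonical correlation
```

In an attention-identification setting, such canonical correlations computed against each competing speech stream would be compared to decide which stream is attended.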
Collapse
Affiliation(s)
- Emina Alickovic
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
| | - Thomas Lunner
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Hearing Systems, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Swedish Institute for Disability Research, Linnaeus Centre HEAD, Linkoping University, Linkoping, Sweden
| | - Fredrik Gustafsson
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| | - Lennart Ljung
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| |
Collapse
|