51
Presacco A, Miran S, Babadi B, Simon JZ. Real-Time Tracking of Magnetoencephalographic Neuromarkers during a Dynamic Attention-Switching Task. Annu Int Conf IEEE Eng Med Biol Soc 2019:4148-4151. [PMID: 31946783] [DOI: 10.1109/embc.2019.8857953]
Abstract
In the last few years, a large number of experiments have focused on exploring the possibility of using non-invasive techniques, such as electroencephalography (EEG) and magnetoencephalography (MEG), to identify auditory-related neuromarkers which are modulated by attention. Results from several studies, in which participants listen to a story narrated by one speaker while trying to ignore a different story narrated by a competing speaker, suggest the feasibility of extracting neuromarkers that demonstrate enhanced phase locking to the attended speech stream. These promising findings have the potential to be used in clinical applications, such as EEG-driven hearing aids. One major challenge in achieving this goal is the need to devise an algorithm capable of tracking these neuromarkers in real-time when individuals are given the freedom to repeatedly switch attention among speakers at will. Here we present an algorithm pipeline designed to efficiently recognize changes of neural speech tracking during a dynamic attention-switching task and to use them as input for a near real-time state-space model that translates these neuromarkers into attentional state estimates with minimal delay. The pipeline was tested with MEG data collected from participants who had the freedom to change the focus of their attention between two speakers at will. Results suggest the feasibility of using this pipeline to track changes of attention in near real-time in a dynamic auditory scene.
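The translation of per-window neural-tracking markers into attentional state estimates can be illustrated with a minimal two-state recursive Bayesian filter. This is a simplified sketch, not the authors' actual state-space model: the marker encoding (positive values as evidence for speaker A), the ±1 state means, and the parameters `p_stay` and `sigma` are illustrative assumptions.

```python
import numpy as np

def attention_filter(markers, p_stay=0.95, sigma=1.0):
    """Recursive two-state Bayesian filter. markers[t] is the per-window
    difference between neural-tracking scores for speakers A and B
    (positive -> evidence for attending A). Returns P(attending A) per window."""
    p_a = 0.5
    out = []
    for m in markers:
        # predict: first-order Markov switching between the two attentional states
        p_a = p_stay * p_a + (1 - p_stay) * (1 - p_a)
        # update: Gaussian likelihood of the marker under each state (means +1 / -1)
        like_a = np.exp(-0.5 * ((m - 1.0) / sigma) ** 2)
        like_b = np.exp(-0.5 * ((m + 1.0) / sigma) ** 2)
        p_a = like_a * p_a / (like_a * p_a + like_b * (1 - p_a))
        out.append(p_a)
    return np.array(out)
```

Because the Markov prediction step keeps pulling the posterior toward 0.5, the filter can re-detect an attention switch within a few windows instead of locking onto one speaker permanently.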
52
Das N, Vanthornhout J, Francart T, Bertrand A. Stimulus-aware spatial filtering for single-trial neural response and temporal response function estimation in high-density EEG with applications in auditory research. Neuroimage 2020; 204:116211. [PMID: 31546052] [PMCID: PMC7355237] [DOI: 10.1016/j.neuroimage.2019.116211]
Abstract
A common problem in neural recordings is the low signal-to-noise ratio (SNR), particularly when using non-invasive techniques like magneto- or electroencephalography (M/EEG). To address this problem, experimental designs often include repeated trials, which are then averaged to improve the SNR or to infer statistics that can be used in the design of a denoising spatial filter. However, collecting enough repeated trials is often impractical and even impossible in some paradigms, while analyses on existing data sets may be hampered when these do not contain such repeated trials. Therefore, we present a data-driven method that takes advantage of the knowledge of the presented stimulus, to achieve a joint noise reduction and dimensionality reduction without the need for repeated trials. The method first estimates the stimulus-driven neural response using the given stimulus, which is then used to find a set of spatial filters that maximize the SNR based on a generalized eigenvalue decomposition. As the method is fully data-driven, the dimensionality reduction enables researchers to perform their analyses without having to rely on their knowledge of brain regions of interest, which increases accuracy and reduces the human factor in the results. In the context of neural tracking of a speech stimulus using EEG, our method resulted in more accurate short-term temporal response function (TRF) estimates, higher correlations between predicted and actual neural responses, and higher attention decoding accuracies compared to existing TRF-based decoding methods. We also provide an extensive discussion on the central role played by the generalized eigenvalue decomposition in various denoising methods in the literature, and address the conceptual similarities and differences with our proposed method.
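The GEVD step at the core of this method can be sketched as follows. This is a generic illustration, assuming a (channels × samples) EEG matrix `X` and an estimate `S` of its stimulus-driven component (in the paper, obtained from the stimulus via a forward model); it is not the authors' exact implementation.

```python
import numpy as np
from scipy.linalg import eigh

def gevd_spatial_filter(X, S, n_comp=2):
    """Denoising spatial filters via a generalized eigenvalue decomposition.

    X : (channels, samples) raw EEG
    S : (channels, samples) estimate of the stimulus-driven part of X
    Returns W : (channels, n_comp), the filters maximizing the ratio of
    stimulus-driven power to total power, w' Rs w / w' Rx w.
    """
    Rs = S @ S.T / S.shape[1]   # covariance of the stimulus-driven signal
    Rx = X @ X.T / X.shape[1]   # covariance of the raw EEG
    # Solve Rs w = lambda Rx w; eigh returns eigenvalues in ascending order,
    # so reverse to put the highest-SNR components first.
    vals, vecs = eigh(Rs, Rx)
    return vecs[:, ::-1][:, :n_comp]
```

Projecting the EEG onto the first few columns of `W` gives the joint noise reduction and dimensionality reduction described above, with no need for repeated trials.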
Affiliation(s)
- Neetha Das
- Dept. Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium; Dept. Neurosciences, ExpORL, KU Leuven, Herestraat 49 Bus 721, B-3000, Leuven, Belgium.
- Jonas Vanthornhout
- Dept. Neurosciences, ExpORL, KU Leuven, Herestraat 49 Bus 721, B-3000, Leuven, Belgium
- Tom Francart
- Dept. Neurosciences, ExpORL, KU Leuven, Herestraat 49 Bus 721, B-3000, Leuven, Belgium
- Alexander Bertrand
- Dept. Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium.
53
Fu Z, Wu X, Chen J. Congruent audiovisual speech enhances auditory attention decoding with EEG. J Neural Eng 2019; 16:066033. [PMID: 31505476] [DOI: 10.1088/1741-2552/ab4340]
Abstract
OBJECTIVE The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task by analyzing electroencephalography (EEG) measurements. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e., to identify the speech stream of the listener's interest so that hearing-aid algorithms can amplify the target speech and attenuate other distracting sounds, resulting in improved speech understanding and communication and reduced cognitive load. The present work investigated whether additional visual input (i.e., lipreading) would enhance AAD performance for normal-hearing listeners. APPROACH In a two-talker scenario, where auditory stimuli of audiobooks narrated by two speakers were presented, multi-channel EEG signals were recorded while participants selectively attended to one speaker and ignored the other. The speakers' mouth movements were recorded during narration to provide visual stimuli. Stimulus conditions included audio-only, visual input congruent with either (i.e., attended or unattended) speaker, and visual input incongruent with either speaker. The AAD approach was performed separately for each condition to evaluate the effect of additional visual input. MAIN RESULTS Relative to the audio-only condition, AAD performance improved with visual input only when it was congruent with the attended speech stream, with a gain of about 14 percentage points in decoding accuracy. Cortical envelope tracking in both auditory and visual cortex was stronger for the congruent audiovisual condition than for the other conditions. In addition, AAD was more robust in the congruent audiovisual condition, achieving higher accuracy than the audio-only condition with fewer channels and shorter trial durations.
SIGNIFICANCE The present work complements previous studies and further demonstrates the feasibility of AAD-guided design of hearing aids for daily face-to-face conversations. It also provides guidance for designing a low-density EEG setup for the AAD approach.
Affiliation(s)
- Zhen Fu
- Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, People's Republic of China
54
Lesenfants D, Vanthornhout J, Verschueren E, Francart T. Data-driven spatial filtering for improved measurement of cortical tracking of multiple representations of speech. J Neural Eng 2019; 16:066017. [PMID: 31426053] [DOI: 10.1088/1741-2552/ab3c92]
Abstract
OBJECTIVE Measurement of the cortical tracking of continuous speech from electroencephalography (EEG) recordings using a forward model is an important tool in auditory neuroscience. Usually the stimulus is represented by its temporal envelope; recently, a phonetic representation of speech was also successfully introduced for English. We aim to show that EEG prediction from phoneme-related speech features is possible in Dutch. The standard method, however, requires a manual channel selection based on visual inspection or prior knowledge to obtain a summary measure of cortical tracking. We evaluate a method to (1) remove non-stimulus-related activity from the EEG signals to be predicted, and (2) automatically select the channels of interest. APPROACH Eighteen participants listened to a Flemish story while their EEG was recorded. Subject-specific and grand-average temporal response functions were determined between the EEG activity in different frequency bands and several stimulus features: the envelope, spectrogram, phonemes, phonetic features, or a combination. The temporal response functions were used to predict EEG from the stimulus, and the predicted EEG was compared with the recorded EEG, yielding a measure of cortical tracking of stimulus features. A spatial filter was calculated based on the generalized eigenvalue decomposition (GEVD), and its effect on EEG prediction accuracy was determined. MAIN RESULTS A model including both low- and high-level speech representations predicted the brain responses to speech better than a model including only low-level features. The inclusion of a GEVD-based spatial filter in the model increased the prediction accuracy of cortical responses to each speech feature at both the single-subject (270% improvement) and group level (310%).
SIGNIFICANCE We showed that including both acoustic and phonetic speech information and adding a data-driven spatial filter allow improved modelling of the relationship between speech and its brain responses and offer automatic channel selection.
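A forward temporal response function of the kind used here reduces to a lagged ridge regression from stimulus features to each EEG channel, with the prediction correlation serving as the cortical-tracking measure. The sketch below is a minimal single-channel version with assumed lag count and regularization; it omits the cross-validation and the GEVD filtering of the actual study.

```python
import numpy as np

def estimate_trf(stim, eeg_ch, n_lags=32, reg=1.0):
    """Forward model: ridge-regress time-lagged stimulus features onto one
    EEG channel.

    stim   : (samples, features) stimulus representation (envelope, etc.)
    eeg_ch : (samples,) one EEG channel
    Returns (trf, r): the TRF of shape (n_lags, features) and the
    correlation between predicted and recorded EEG (the tracking measure).
    """
    n, n_feat = stim.shape
    X = np.zeros((n, n_lags * n_feat))
    for k in range(n_lags):                      # X[t, k] holds stim[t - k]
        X[k:, k * n_feat:(k + 1) * n_feat] = stim[:n - k]
    w = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ eeg_ch)
    r = np.corrcoef(X @ w, eeg_ch)[0, 1]
    return w.reshape(n_lags, n_feat), r
```

Stacking several feature sets (envelope, spectrogram bands, phoneme indicators) as extra columns of `stim` gives the combined low- plus high-level model described in the abstract.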
Affiliation(s)
- D Lesenfants
- Department of Neurosciences, Experimental Oto-Rhino-Laryngology, KU Leuven, Belgium
55
Bednar A, Lalor EC. Where is the cocktail party? Decoding locations of attended and unattended moving sound sources using EEG. Neuroimage 2019; 205:116283. [PMID: 31629828] [DOI: 10.1016/j.neuroimage.2019.116283]
Abstract
Recently, we showed that in a simple acoustic scene with one sound source, auditory cortex tracks the time-varying location of a continuously moving sound. Specifically, we found that both the delta phase and alpha power of the electroencephalogram (EEG) can be used to reconstruct the sound source azimuth. However, in natural settings we are often presented with a mixture of multiple competing sounds and must focus our attention on the relevant source in order to segregate it from the competing sources (the 'cocktail party effect'). While many studies have examined this phenomenon in the context of cortical tracking of the sound envelope, it is unclear how we process and utilize spatial information in complex acoustic scenes with multiple sound sources. To test this, we created an experiment in which subjects listened over headphones to two concurrent sound stimuli that were moving within the horizontal plane while we recorded their EEG. Participants were tasked with paying attention to one of the two presented stimuli. The data were analyzed by deriving linear mappings, temporal response functions (TRFs), between the EEG data and the attended as well as unattended sound source trajectories. Next, we used these TRFs to reconstruct both trajectories from previously unseen EEG data. In a first experiment we used noise stimuli and a task that involved spatially localizing embedded targets. In a second experiment, we employed speech stimuli and a non-spatial speech comprehension task. Results showed that the trajectory of an attended sound source can be reliably reconstructed from both the delta phase and alpha power of the EEG, even in the presence of distracting stimuli. Moreover, the reconstruction was robust to task and stimulus type. The cortical representation of the unattended source position was below detection level for the noise stimuli, but we observed weak tracking of the unattended source location for the speech stimuli in the delta phase of the EEG.
In addition, we demonstrated that the trajectory reconstruction method can in principle be used to decode selective attention on a single-trial basis; however, its performance was inferior to envelope-based decoders. These results suggest a possible dissociation of delta phase and alpha power of the EEG in the context of sound trajectory tracking. Moreover, the demonstrated ability to localize and determine the attended speaker in complex acoustic environments is particularly relevant for cognitively controlled hearing devices.
Affiliation(s)
- Adam Bednar
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Center for Bioengineering, Trinity College Dublin, Dublin, Ireland.
- Edmund C Lalor
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Center for Bioengineering, Trinity College Dublin, Dublin, Ireland; Department of Biomedical Engineering, Department of Neuroscience, University of Rochester, Rochester, NY, USA.
56
Etard O, Kegler M, Braiman C, Forte AE, Reichenbach T. Decoding of selective attention to continuous speech from the human auditory brainstem response. Neuroimage 2019; 200:1-11. [DOI: 10.1016/j.neuroimage.2019.06.029]
57
McCarthy-Jones S. The Autonomous Mind: The Right to Freedom of Thought in the Twenty-First Century. Front Artif Intell 2019; 2:19. [PMID: 33733108] [PMCID: PMC7861318] [DOI: 10.3389/frai.2019.00019]
Abstract
To lose freedom of thought (FoT) is to lose our dignity, our democracy and our very selves. Accordingly, the right to FoT receives absolute protection under international human rights law. However, this foundational right has been neither significantly developed nor often utilized. The contours of this right urgently need to be defined due to twenty-first century threats to FoT posed by new technologies. As such, this paper draws on law and psychology to consider what the right to FoT should be in the twenty-first century. After discussing contemporary threats to FoT, and recent developments in our understanding of thought that can inform the development of the right, this paper considers three elements of the right: the rights not to reveal one's thoughts, not to be penalized for one's thoughts, and not to have one's thoughts manipulated. The paper then considers, for each element, why it should exist, how the law currently treats it, and challenges that will shape it going forward. The paper concludes that the law should develop the right to FoT with the clear understanding that what this aims to secure is mental autonomy. This process should hence begin by establishing the core mental processes that enable mental autonomy, such as attentional and cognitive agency. The paper argues that the domain of the right to FoT should be extended to include external actions that are arguably constitutive of thought, including internet searches and diaries, hence shielding them with absolute protection. It is stressed that law must protect us from threats to FoT from both states and corporations, with governments needing to act under the positive aspect of the right to ensure societies are structured to facilitate mental autonomy. It is suggested that in order to support mental autonomy, information should be provided in autonomy-supportive contexts and friction introduced into decision-making processes to facilitate second-order thought. The need for public debate about how society wishes to balance risk and mental autonomy is highlighted, and the question is raised as to whether the importance attached to thought has changed in our culture. The urgency of defending FoT is reiterated.
58
Vanthornhout J, Decruy L, Francart T. Effect of Task and Attention on Neural Tracking of Speech. Front Neurosci 2019; 13:977. [PMID: 31607841] [PMCID: PMC6756133] [DOI: 10.3389/fnins.2019.00977]
Abstract
EEG-based measures of neural tracking of natural running speech are becoming increasingly popular for investigating neural processing of speech and have applications in audiology. When the stimulus is a single speaker, it is usually assumed that the listener actively attends to and understands the stimulus. However, as the level of attention of the listener is inherently variable, we investigated how this affects neural envelope tracking. Using a movie as a distractor, we varied the level of attention while estimating neural envelope tracking, and we varied the intelligibility level by adding stationary noise. We found a significant difference in neural envelope tracking between the condition with maximal attention and the movie condition. This difference was most pronounced in the right-frontal region of the brain. The degree of neural envelope tracking was highly correlated with the stimulus signal-to-noise ratio, even in the movie condition; this could be due to residual neural resources to passively attend to the stimulus. When envelope tracking is used to objectively measure speech understanding, this means the procedure can be made more enjoyable and feasible by letting participants watch a movie during stimulus presentation.
Affiliation(s)
- Lien Decruy
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Tom Francart
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
59
Narayanan AM, Bertrand A. The effect of miniaturization and galvanic separation of EEG sensor devices in an auditory attention detection task. Annu Int Conf IEEE Eng Med Biol Soc 2018:77-80. [PMID: 30440345] [DOI: 10.1109/embc.2018.8512212]
Abstract
Recent technological advances in the design of concealable miniature electroencephalography (mini-EEG) devices are paving the way towards 24/7 neuromonitoring applications in daily life. However, such mini-EEG devices only cover a small area and record EEG over much shorter inter-electrode distances than in traditional EEG headsets. These drawbacks can potentially be compensated for by deploying a multitude of such mini-EEG devices and then jointly processing their recorded EEG signals. In this study, we simulate and investigate the effect of using such multi-node EEG recordings in which the nodes are galvanically separated from each other, and only use their internal electrodes to make short-distance EEG recordings. We focus on a use-case in auditory attention detection (AAD), and we demonstrate that the AAD performance using galvanically separated short-distance EEG measurements is comparable to using an equal number of long-distance EEG measurements if in both cases the electrodes are optimally placed on the scalp. To this end, we use a channel selection method based on a modified version of the least absolute shrinkage and selection operator (LASSO) technique, viz. the group-LASSO, in order to find these optimal locations.
60
Ciccarelli G, Nolan M, Perricone J, Calamia PT, Haro S, O'Sullivan J, Mesgarani N, Quatieri TF, Smalt CJ. Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods. Sci Rep 2019; 9:11538. [PMID: 31395905] [PMCID: PMC6687829] [DOI: 10.1038/s41598-019-47795-0]
Abstract
Auditory attention decoding (AAD) through a brain-computer interface has had a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticographic recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet one, despite having roughly one third as many EEG channels. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
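The traditional two-stage approach used here as a baseline can be sketched as a lagged ridge decoder (stage 1: envelope reconstruction from EEG) followed by a correlation comparison (stage 2: pick the candidate stream the reconstruction matches best). The lag count, regularization value, and zero/positive-lag convention below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of each EEG channel: (samples, channels * n_lags)."""
    n_s, n_ch = eeg.shape
    L = np.zeros((n_s, n_ch * n_lags))
    for k in range(n_lags):
        L[k:, k * n_ch:(k + 1) * n_ch] = eeg[:n_s - k]
    return L

def train_decoder(eeg, envelope, n_lags=16, reg=1e2):
    """Stage 1: ridge-regression stimulus-reconstruction (backward) decoder."""
    X = lag_matrix(eeg, n_lags)
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(eeg, env_a, env_b, decoder, n_lags=16):
    """Stage 2: reconstruct the envelope, then pick the stream it correlates
    with more strongly. Returns 0 for stream A, 1 for stream B."""
    rec = lag_matrix(eeg, n_lags) @ decoder
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return 0 if r_a > r_b else 1
```

The end-to-end network in the paper replaces the explicit correlation in stage 2 with learned layers that output the attended-stream decision directly.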
Affiliation(s)
- Gregory Ciccarelli
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Michael Nolan
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Joseph Perricone
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Paul T Calamia
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA
- Stephanie Haro
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA; Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Thomas F Quatieri
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA; Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- Christopher J Smalt
- Bioengineering Systems and Technologies Group, MIT Lincoln Laboratory, Lexington, MA, USA.
61
Nogueira W, Dolhopiatenko H, Schierholz I, Büchner A, Mirkovic B, Bleichner MG, Debener S. Decoding Selective Attention in Normal Hearing Listeners and Bilateral Cochlear Implant Users With Concealed Ear EEG. Front Neurosci 2019; 13:720. [PMID: 31379479] [PMCID: PMC6657402] [DOI: 10.3389/fnins.2019.00720]
Abstract
Electroencephalography (EEG) data can be used to decode the attended speech source in normal-hearing (NH) listeners using high-density EEG caps as well as around-the-ear EEG devices. The technology may find application in identifying the target speaker in a cocktail-party-like scenario and in steering speech enhancement algorithms in cochlear implants (CIs). However, the poorer spectral resolution and the electrical artifacts introduced by a CI may limit the applicability of this approach to CI users. The goal of this study was to investigate whether selective attention can be decoded in CI users using an around-the-ear EEG system (cEEGrid). The performance of high-density cap EEG recordings and cEEGrid EEG recordings was compared in a selective attention paradigm using an envelope tracking algorithm. Speech from two audiobooks was presented through insert earphones to the NH listeners and via direct audio cable to the CI users. 10 NH listeners and 10 bilateral CI users participated in the study. Participants were instructed to attend to one of the two concurrent speech streams while data were recorded simultaneously by a 96-channel scalp EEG and an 18-channel cEEGrid setup. Reconstruction performance was evaluated by means of parametric correlations between the reconstructed speech and the envelopes of both the attended and the unattended speech streams. Results confirm the feasibility of decoding selective attention from single-trial EEG data in NH and CI users using high-density EEG. All NH listeners and 9 out of 10 CI users achieved high decoding accuracies. The cEEGrid was successful in decoding selective attention in 5 out of 10 NH listeners, and the same result was obtained for CI users.
Affiliation(s)
- Waldo Nogueira
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Hanna Dolhopiatenko
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Irina Schierholz
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Andreas Büchner
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
62
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. [PMID: 31013361] [DOI: 10.1111/ejn.14425]
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) congruent AV: facing the attended speaker while ignoring another speaker represented by an audio-only stream, and (b) incongruent AV (eavesdropping): attending the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we can successfully decode attention to congruent audiovisual speech, and can also decode attention when listeners are eavesdropping, i.e., looking at the face of the unattended talker. In addition, we found alpha power to be a reliable measure of attention to the visual speech: using parieto-occipital alpha power, we can distinguish whether subjects are attending to or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that the attended speech can be determined successfully with only six near-ear electrodes. This work extends the current framework for decoding attention to speech to more naturalistic scenarios and, in doing so, provides additional neural measures which may be incorporated to improve decoding accuracy.
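The parieto-occipital alpha-power marker is simply band power in the 8-12 Hz range, which can be estimated per channel with Welch's method. The band edges and window length below are common choices, not necessarily those of the study.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(eeg, fs, band=(8.0, 12.0)):
    """Mean alpha-band power per channel via Welch's method.

    eeg : (channels, samples) EEG segment
    fs  : sampling rate in Hz
    Returns one power value per channel, averaged over the alpha band.
    """
    # 2-second windows give 0.5 Hz frequency resolution
    f, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
    idx = (f >= band[0]) & (f <= band[1])
    return psd[:, idx].mean(axis=1)
```

Comparing this value over parieto-occipital channels between conditions (attending versus ignoring the face) would give the kind of attention marker described above.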
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland; Department of Biomedical Engineering, University of Rochester, Rochester, New York; Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
63
Han C, O’Sullivan J, Luo Y, Herrero J, Mehta AD, Mesgarani N. Speaker-independent auditory attention decoding without access to clean speech sources. Sci Adv 2019; 5:eaav6134. [PMID: 31106271] [PMCID: PMC6520028] [DOI: 10.1126/sciadv.aav6134]
Abstract
Speech perception in crowded environments is challenging for hearing-impaired listeners. Assistive hearing devices cannot lower interfering speakers without knowing which speaker the listener is focusing on. One possible solution is auditory attention decoding in which the brainwaves of listeners are compared with sound sources to determine the attended source, which can then be amplified to facilitate hearing. In realistic situations, however, only mixed audio is available. We utilize a novel speech separation algorithm to automatically separate speakers in mixed audio, with no need for the speakers to have prior training. Our results show that auditory attention decoding with automatically separated speakers is as accurate and fast as using clean speech sounds. The proposed method significantly improves the subjective and objective quality of the attended speaker. Our study addresses a major obstacle in actualization of auditory attention decoding that can assist hearing-impaired listeners and reduce listening effort for normal-hearing subjects.
Affiliation(s)
Cong Han
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
James O’Sullivan
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
Yi Luo
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
Jose Herrero
- Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, New York, NY, USA
Ashesh D. Mehta
- Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, New York, NY, USA
Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
64
Narayanan AM, Bertrand A. Analysis of Miniaturization Effects and Channel Selection Strategies for EEG Sensor Networks With Application to Auditory Attention Detection. IEEE Trans Biomed Eng 2019; 67:234-244. [PMID: 30998455 DOI: 10.1109/tbme.2019.2911728]
Abstract
OBJECTIVE Concealable, miniaturized electroencephalography (mini-EEG) recording devices are crucial enablers toward long-term ambulatory EEG monitoring. However, the resulting miniaturization limits the inter-electrode distance and the scalp area that can be covered by a single device. The concept of wireless EEG sensor networks (WESNs) attempts to overcome this limitation by placing a multitude of these mini-EEG devices at various scalp locations. We investigate whether optimizing the WESN topology can compensate for miniaturization effects in an auditory attention detection (AAD) paradigm. METHODS Starting from standard full-cap high-density EEG data, we emulate several candidate mini-EEG sensor nodes that locally collect EEG data with embedded electrodes separated by short distances. We propose a greedy group-utility based channel selection strategy to select a subset of these candidate nodes to form a WESN. We compare the AAD performance of this WESN with the performance obtained using long-distance EEG recordings. RESULTS The AAD performance using short-distance EEG measurements is comparable to using an equal number of long-distance EEG measurements if, in both cases, the optimal electrode positions are selected. A significant increase in performance was found when using nodes with three electrodes over nodes with two electrodes. CONCLUSION When the nodes are optimally placed, WESNs do not significantly suffer from EEG miniaturization effects in the case of AAD. SIGNIFICANCE WESN-like platforms allow us to achieve similar AAD performance as with long-distance EEG recordings while adhering to the stringent miniaturization constraints for ambulatory EEG. Their applicability in an AAD task is important for the design of neuro-steered auditory prostheses.
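The greedy channel selection used here can be illustrated with a simplified forward-selection sketch. Note the assumptions: the "utility" below is a plain least-squares fit to a target signal rather than the paper's AAD-specific group-utility measure, the data are synthetic, and all names are illustrative.

```python
import numpy as np

def greedy_select(X, target, k):
    """Forward selection: repeatedly add the channel whose least-squares
    combination with the channels chosen so far best fits the target."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def sse(cand):
            A = X[:, chosen + [cand]]
            beta, *_ = np.linalg.lstsq(A, target, rcond=None)
            return float(np.sum((target - A @ beta) ** 2))
        best = min(remaining, key=sse)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))   # five candidate channels
target = rng.standard_normal(200)   # signal to be reconstructed
X[:, 3] += 2.0 * target             # channel 3 carries the target
sel = greedy_select(X, target, 2)   # channel 3 is picked first
```

The paper's group-utility variant scores whole candidate nodes (groups of embedded electrodes) rather than single channels, but the greedy add-the-best-group loop has the same shape.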
65
Mirkovic B, Debener S, Schmidt J, Jaeger M, Neher T. Effects of directional sound processing and listener's motivation on EEG responses to continuous noisy speech: Do normal-hearing and aided hearing-impaired listeners differ? Hear Res 2019; 377:260-270. [PMID: 31003037 DOI: 10.1016/j.heares.2019.04.005]
Abstract
OBJECTIVE It has been suggested that the next major advancement in hearing aid (HA) technology needs to include cognitive feedback from the user to control HA functionality. In order to enable automatic brainwave-steered HA adjustments, attentional processes underlying speech-in-noise perception in aided hearing-impaired individuals need to be better understood. Here, we addressed the influence of two important factors for the listening performance of HA users - hearing aid processing and motivation - by analysing ongoing neural responses during long-term listening to continuous noisy speech. METHODS Sixteen normal-hearing (NH) and 15 linearly aided hearing-impaired (aHI) participants listened to an audiobook recording embedded in realistic speech babble noise at individually adjusted signal-to-noise ratios (SNRs). A HA simulator was used for simulating a directional microphone setting as well as for providing individual amplification. To assess listening performance behaviourally, participants answered questions about the contents of the audiobook. We manipulated (1) the participants' motivation by offering a monetary reward for good listening performance in one half of the measurements and (2) the SNR by engaging/disengaging the directional microphone setting. During the speech-in-noise task, electroencephalography (EEG) signals were recorded using wireless, mobile hardware. EEG correlates of listening performance were investigated using EEG impulse responses, as estimated using the cross-correlation between the recorded EEG signal and the temporal envelope of the audiobook at the output of the HA simulator. RESULTS At the behavioural level, we observed better performance for the NH listeners than for the aHI listeners. Furthermore, the directional microphone setting led to better performance for both participant groups, and when the directional microphone setting was disengaged, motivation also improved the performance of the aHI participants. Analysis of the EEG impulse responses showed faster N1P2 responses for both groups and larger N2 peak amplitudes for the aHI group when the directional microphone setting was activated, but no physiological correlates of motivation. SIGNIFICANCE The results of this study indicate that motivation plays an important role for speech understanding in noise. In terms of neuro-steered HAs, our results suggest that the latency of attentional processes is influenced by HA-induced stimulus changes, which can potentially be used for inferring benefit from noise suppression processing automatically. Further research is necessary to identify the neural correlates of motivation as an exclusive top-down process and to combine such features with HA-driven ones for online HA adjustments.
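The impulse-response estimate used here (cross-correlation between the recorded EEG and the speech envelope at the HA-simulator output) can be sketched as follows, with a synthetic lagged response standing in for real recordings.

```python
import numpy as np

def xcorr_impulse_response(eeg, envelope, max_lag):
    """Normalized cross-correlation between speech envelope and EEG at
    lags 0..max_lag samples; peaks mark response latencies."""
    eeg = (eeg - eeg.mean()) / eeg.std()
    env = (envelope - envelope.mean()) / envelope.std()
    n = len(env)
    return np.array([np.dot(env[:n - lag], eeg[lag:]) / (n - lag)
                     for lag in range(max_lag + 1)])

rng = np.random.default_rng(2)
env = rng.standard_normal(2000)
eeg = np.roll(env, 5) + rng.standard_normal(2000)  # envelope echoed at lag 5
ir = xcorr_impulse_response(eeg, env, 20)
peak_lag = int(np.argmax(ir))  # recovers the simulated 5-sample latency
```

With real data the lag axis is converted to milliseconds via the sampling rate, and component latencies (e.g., N1P2) are read off the peaks of this function.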
Affiliation(s)
Bojana Mirkovic
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
Stefan Debener
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129, Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Oldenburg, Germany
Julia Schmidt
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129, Oldenburg, Germany
Manuela Jaeger
- Department of Psychology, University of Oldenburg, Ammerländer Heerstraße 114, 26129, Oldenburg, Germany
Tobias Neher
- Institute of Clinical Research, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark
66
Hearing-impaired listeners show increased audiovisual benefit when listening to speech in noise. Neuroimage 2019; 196:261-268. [PMID: 30978494 DOI: 10.1016/j.neuroimage.2019.04.017]
Abstract
Recent studies provide evidence for changes in audiovisual perception as well as for adaptive cross-modal auditory cortex plasticity in older individuals with high-frequency hearing impairments (presbycusis). We here investigated whether these changes facilitate the use of visual information, leading to an increased audiovisual benefit of hearing-impaired individuals when listening to speech in noise. We used a naturalistic design in which older participants with a varying degree of high-frequency hearing loss attended to running auditory or audiovisual speech in noise and detected rare target words. Passages containing only visual speech served as a control condition. Simultaneously acquired scalp electroencephalography (EEG) data were used to study cortical speech tracking. Target word detection accuracy was significantly increased in the audiovisual as compared to the auditory listening condition. The degree of this audiovisual enhancement was positively related to individual high-frequency hearing loss and subjectively reported listening effort in challenging daily life situations, which served as a subjective marker of hearing problems. On the neural level, the early cortical tracking of the speech envelope was enhanced in the audiovisual condition. Similar to the behavioral findings, individual differences in the magnitude of the enhancement were positively associated with listening effort ratings. Our results therefore suggest that hearing-impaired older individuals make increased use of congruent visual information to compensate for the degraded auditory input.
67
Nogueira W, Cosatti G, Schierholz I, Egger M, Mirkovic B, Buchner A. Toward Decoding Selective Attention From Single-Trial EEG Data in Cochlear Implant Users. IEEE Trans Biomed Eng 2019; 67:38-49. [PMID: 30932825 DOI: 10.1109/tbme.2019.2907638]
Abstract
Previous results showed that it is possible to decode an attended speech source from EEG data via reconstruction of the speech envelope in normal-hearing (NH) listeners. However, it is so far unknown how the performance of such a decoder is affected by the reduced spectral resolution and the electrical artifacts introduced by a cochlear implant (CI) in users of these prostheses. NH listeners and bilateral CI users participated in the present study. Speech from two audiobooks, one narrated by a male voice and one by a female voice, was presented to NH listeners and CI users. Participants were instructed to attend to one of the two speech streams presented dichotically while 96-channel EEG was recorded. Speech envelope reconstruction from the EEG data was obtained by training decoders using a regularized least-squares estimation method. Decoding accuracy was defined as the percentage of accurately reconstructed trials for each subject. For NH listeners, the experiment was repeated using a vocoder to reduce spectral resolution and simulate speech perception with a CI. The results showed a decoding accuracy of 80.9% using the original sound files in NH listeners. The performance dropped to 73.2% in the vocoder condition and to 71.5% in the group of CI users. In sum, although accuracy drops as spectral resolution worsens, the results show the feasibility of decoding the attended sound source in NH listeners with a vocoder simulation, and even in CI users, although more training data are needed.
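The regularized least-squares (ridge) decoder training mentioned in the abstract can be sketched as follows. This is a bare-bones version under stated assumptions: time-lagged EEG features and cross-validation are omitted, and the data are synthetic.

```python
import numpy as np

def train_decoder(eeg, envelope, lam=1.0):
    """Ridge-regression decoder mapping multichannel EEG samples to the
    speech envelope (time-lagged features omitted for brevity)."""
    return np.linalg.solve(eeg.T @ eeg + lam * np.eye(eeg.shape[1]),
                           eeg.T @ envelope)

rng = np.random.default_rng(3)
env = rng.standard_normal(1000)
# Eight channels, each carrying the envelope with a random gain plus noise.
eeg = np.outer(env, rng.standard_normal(8)) + 0.5 * rng.standard_normal((1000, 8))
w = train_decoder(eeg, env)
r = np.corrcoef(eeg @ w, env)[0, 1]  # reconstruction correlation
```

In attention decoding, one such correlation is computed per candidate speaker and per trial, and the trial is classified as attended to the speaker with the higher correlation.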
68
Aroudi A, Mirkovic B, De Vos M, Doclo S. Impact of Different Acoustic Components on EEG-Based Auditory Attention Decoding in Noisy and Reverberant Conditions. IEEE Trans Neural Syst Rehabil Eng 2019; 27:652-663. [DOI: 10.1109/tnsre.2019.2903404]
69
Xie Z, Reetzke R, Chandrasekaran B. Machine Learning Approaches to Analyze Speech-Evoked Neurophysiological Responses. J Speech Lang Hear Res 2019; 62:587-601. [PMID: 30950746 PMCID: PMC6802895 DOI: 10.1044/2018_jslhr-s-astm-18-0244]
Abstract
Purpose Speech-evoked neurophysiological responses are often collected to answer clinically and theoretically driven questions concerning speech and language processing. Here, we highlight the practical application of machine learning (ML)-based approaches to analyzing speech-evoked neurophysiological responses. Method Two categories of ML-based approaches are introduced: decoding models, which generate a speech stimulus output using the features from the neurophysiological responses, and encoding models, which use speech stimulus features to predict neurophysiological responses. In this review, we focus on (a) a decoding model classification approach, wherein speech-evoked neurophysiological responses are classified as belonging to 1 of a finite set of possible speech events (e.g., phonological categories), and (b) an encoding model temporal response function approach, which quantifies the transformation of a speech stimulus feature to continuous neural activity. Results We illustrate the utility of the classification approach to analyze early electroencephalographic (EEG) responses to Mandarin lexical tone categories from a traditional experimental design, and to classify EEG responses to English phonemes evoked by natural continuous speech (i.e., an audiobook) into phonological categories (plosive, fricative, nasal, and vowel). We also demonstrate the utility of temporal response function to predict EEG responses to natural continuous speech from acoustic features. Neural metrics from the 3 examples all exhibit statistically significant effects at the individual level. Conclusion We propose that ML-based approaches can complement traditional analysis approaches to analyze neurophysiological responses to speech signals and provide a deeper understanding of natural speech and language processing using ecologically valid paradigms in both typical and clinical populations.
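The temporal response function (encoding) approach reviewed here boils down to a ridge regression from time-lagged stimulus features to the neural response. A toy sketch with a known simulated TRF (synthetic data, illustrative names):

```python
import numpy as np

def lagged(stim, n_lags):
    """Design matrix whose k-th column is the stimulus delayed by k samples."""
    X = np.zeros((len(stim), n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:len(stim) - k]
    return X

def fit_trf(stim, eeg, n_lags, lam=1e-2):
    """Temporal response function via ridge regression from lagged
    stimulus to neural response."""
    X = lagged(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)

rng = np.random.default_rng(4)
stim = rng.standard_normal(3000)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
eeg = np.convolve(stim, true_trf)[:3000] + 0.1 * rng.standard_normal(3000)
trf = fit_trf(stim, eeg, 5)  # recovers a peak at lag 2, as simulated
```

The decoding-model classification approach described in the same abstract runs in the opposite direction, using neural features to predict which stimulus class (e.g., phonological category) evoked the response.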
Affiliation(s)
Zilong Xie
- Department of Communication Sciences and Disorders, The University of Texas at Austin
Rachel Reetzke
- Department of Communication Sciences and Disorders, The University of Texas at Austin
Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh
70
Alickovic E, Lunner T, Gustafsson F, Ljung L. A Tutorial on Auditory Attention Identification Methods. Front Neurosci 2019; 13:153. [PMID: 30941002 PMCID: PMC6434370 DOI: 10.3389/fnins.2019.00153]
Abstract
Auditory attention identification methods attempt to identify the sound source of a listener's interest by analyzing measurements of electrophysiological data. We present a tutorial on the numerous techniques that have been developed in recent decades, and we present an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on the use of linear relations between electrophysiological and audio data. The way in which these relations are computed differs. For example, canonical correlation analysis (CCA) finds a linear subset of electrophysiological data that best correlates to audio data and a similar subset of audio data that best correlates to electrophysiological data. Model-based (encoding and decoding) approaches focus on either of these two sets. We investigate the similarities and differences between these linear model philosophies. We focus on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. The specific focus is on sparsity-driven adaptive encoding models and comparing the methodology in state-of-the-art models found in the auditory literature. Furthermore, we outline the main signal processing pipeline for how to identify the attended sound source in a cocktail party environment from the raw electrophysiological data with all the necessary steps, complemented with the necessary MATLAB code and the relevant references for each step. Our main aim is to compare the methodology of the available methods, and provide numerical illustrations to some of them to get a feeling for their potential. A thorough performance comparison is outside the scope of this tutorial.
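The CCA computation at the heart of approach (1) can be sketched with a standard SVD-based implementation (the tutorial itself provides MATLAB code; this Python version and the helper name `first_canonical_corr` are illustrative, and the data are synthetic).

```python
import numpy as np

def first_canonical_corr(X, Y):
    """First canonical correlation: orthonormalize each centred data set
    (SVD), then take the top singular value of the cross-product of bases."""
    Qx = np.linalg.svd(X - X.mean(0), full_matrices=False)[0]
    Qy = np.linalg.svd(Y - Y.mean(0), full_matrices=False)[0]
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])

rng = np.random.default_rng(5)
z = rng.standard_normal(500)  # latent source shared by both data sets
X = np.column_stack([z, rng.standard_normal(500)]) + 0.1 * rng.standard_normal((500, 2))
Y = np.column_stack([rng.standard_normal(500), z]) + 0.1 * rng.standard_normal((500, 2))
r = first_canonical_corr(X, Y)  # close to 1: CCA finds the shared source
```

In the auditory-attention setting, X would hold (lagged) EEG channels and Y (lagged) audio features, and the canonical correlation measures how much linearly shared structure the two carry.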
Affiliation(s)
Emina Alickovic
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
Thomas Lunner
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Hearing Systems, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Swedish Institute for Disability Research, Linnaeus Centre HEAD, Linkoping University, Linkoping, Sweden
Fredrik Gustafsson
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
Lennart Ljung
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
71
Müller JA, Wendt D, Kollmeier B, Debener S, Brand T. Effect of Speech Rate on Neural Tracking of Speech. Front Psychol 2019; 10:449. [PMID: 30906273 PMCID: PMC6418035 DOI: 10.3389/fpsyg.2019.00449]
Abstract
Speech comprehension requires effort in demanding listening situations. Selective attention may be required for focusing on a specific talker in a multi-talker environment, may enhance effort by requiring additional cognitive resources, and is known to enhance the neural representation of the attended talker in the listener's neural response. The aim of the study was to investigate the relation of listening effort, as quantified by subjective effort ratings and pupil dilation, and neural speech tracking during sentence recognition. Task demands were varied using sentences with varying levels of linguistic complexity and using two different speech rates in a picture-matching paradigm with 20 normal-hearing listeners. The participants' task was to match the acoustically presented sentence with a picture presented before the acoustic stimulus. Afterwards they rated their perceived effort on a categorical effort scale. During each trial, pupil dilation (as an indicator of listening effort) and electroencephalogram (as an indicator of neural speech tracking) were recorded. Neither measure was significantly affected by linguistic complexity. However, speech rate showed a strong influence on subjectively rated effort, pupil dilation, and neural tracking. The neural tracking analysis revealed a shorter latency for faster sentences, which may reflect a neural adaptation to the rate of the input. No relation was found between neural tracking and listening effort, even though both measures were clearly influenced by speech rate. This is probably due to factors that influence both measures differently. Consequently, the amount of listening effort is not clearly represented in the neural tracking.
Affiliation(s)
Jana Annina Müller
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Medizinische Physik, Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
Dorothea Wendt
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
- Eriksholm Research Centre, Snekkersten, Denmark
Birger Kollmeier
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Medizinische Physik, Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
Stefan Debener
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Neuropsychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
Thomas Brand
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Medizinische Physik, Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
72
Teoh ES, Lalor EC. EEG decoding of the target speaker in a cocktail party scenario: considerations regarding dynamic switching of talker location. J Neural Eng 2019; 16:036017. [PMID: 30836345 DOI: 10.1088/1741-2552/ab0cf1]
Abstract
OBJECTIVE It has been shown that attentional selection in a simple dichotic listening paradigm can be decoded offline by reconstructing the stimulus envelope from single-trial neural response data. Here, we test the efficacy of this approach in an environment with non-stationary talkers. We then look beyond the envelope reconstructions themselves and consider whether incorporating the decoder values-which reflect the weightings applied to the multichannel EEG data at different time lags and scalp locations when reconstructing the stimulus envelope-can improve decoding performance. APPROACH High-density EEG was recorded as subjects attended to one of two talkers. The two speech streams were filtered using HRTFs, and the talkers were alternated between the left and right locations at varying intervals to simulate a dynamic environment. We trained spatio-temporal decoders mapping from EEG data to the attended and unattended stimulus envelopes. We then decoded auditory attention by (1) using the attended decoder to reconstruct the envelope and (2) exploiting the fact that decoder weightings themselves contain signatures of attention, resulting in consistent patterns across subjects that can be classified. MAIN RESULTS The previously established decoding approach was found to be effective even with non-stationary talkers. Signatures of attentional selection and attended direction were found in the spatio-temporal structure of the decoders and were consistent across subjects. The inclusion of decoder weights into the decoding algorithm resulted in significantly improved decoding accuracies (from 61.07% to 65.31% for 4 s windows). An attempt was made to include alpha power lateralization as another feature to improve decoding, although this was unsuccessful at the single-trial level. SIGNIFICANCE This work suggests that the spatio-temporal decoder weights can be utilised to improve decoding. More generally, looking beyond envelope reconstruction and incorporating other signatures of attention is an avenue that should be explored to improve selective auditory attention decoding.
Affiliation(s)
Emily S Teoh
- School of Engineering, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
73
Hendrikse MME, Llorach G, Hohmann V, Grimm G. Movement and Gaze Behavior in Virtual Audiovisual Listening Environments Resembling Everyday Life. Trends Hear 2019; 23:2331216519872362. [PMID: 32516060 PMCID: PMC6732870 DOI: 10.1177/2331216519872362]
Abstract
Recent achievements in hearing aid development, such as visually guided hearing aids, make it increasingly important to study movement behavior in everyday situations in order to develop test methods and evaluate hearing aid performance. In this work, audiovisual virtual environments (VEs) were designed for communication conditions in a living room, a lecture hall, a cafeteria, a train station, and a street environment. Movement behavior (head movement, gaze direction, and torso rotation) and electroencephalography signals were measured in these VEs in the laboratory for 22 younger normal-hearing participants and 19 older normal-hearing participants. These data establish a reference for future studies that will investigate the movement behavior of hearing-impaired listeners and hearing aid users for comparison. Questionnaires were used to evaluate the subjective experience in the VEs. A test-retest comparison showed that the measured movement behavior is reproducible and that the measures of movement behavior used in this study are reliable. Moreover, evaluation of the questionnaires indicated that the VEs are sufficiently realistic. The participants rated the experienced acoustic realism of the VEs positively, and although the rating of the experienced visual realism was lower, the participants felt to some extent present and involved in the VEs. Analysis of the movement data showed that movement behavior depends on the VE and the age of the subject and is predictable in multitalker conversations and for moving distractors. The VEs and a database of the collected data are publicly available.
Affiliation(s)
Gerard Llorach
- Medizinische Physik and Cluster of Excellence ‘Hearing4all’, Universität Oldenburg, Germany
- Hörzentrum Oldenburg GmbH, Germany
Volker Hohmann
- Medizinische Physik and Cluster of Excellence ‘Hearing4all’, Universität Oldenburg, Germany
- Hörzentrum Oldenburg GmbH, Germany
Giso Grimm
- Medizinische Physik and Cluster of Excellence ‘Hearing4all’, Universität Oldenburg, Germany
74
de Cheveigné A, Di Liberto GM, Arzounian D, Wong DDE, Hjortkjær J, Fuglsang S, Parra LC. Multiway canonical correlation analysis of brain data. Neuroimage 2018; 186:728-740. [PMID: 30496819 DOI: 10.1016/j.neuroimage.2018.11.026]
Abstract
Brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and related techniques often have poor signal-to-noise ratios due to the presence of multiple competing sources and artifacts. A common remedy is to average responses over repeats of the same stimulus, but this is not applicable for temporally extended stimuli that are presented only once (speech, music, movies, natural sound). An alternative is to average responses over multiple subjects that were presented with identical stimuli, but differences in geometry of brain sources and sensors reduce the effectiveness of this solution. Multiway canonical correlation analysis (MCCA) brings a solution to this problem by allowing data from multiple subjects to be fused in such a way as to extract components common to all. This paper reviews the method, offers application examples that illustrate its effectiveness, and outlines the caveats and risks entailed by the method.
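The core MCCA recipe (whiten each subject's data, concatenate across subjects, extract the dominant common component) can be sketched as follows. This is a simplified reading of the method with synthetic multi-subject data; the helper name is illustrative.

```python
import numpy as np

def mcca_shared(datasets):
    """Simplest MCCA recipe: whiten each subject's centred data (keep the
    left singular vectors), concatenate across subjects, and take the first
    left singular vector of the concatenation as the common time course."""
    bases = [np.linalg.svd(X - X.mean(0), full_matrices=False)[0] for X in datasets]
    return np.linalg.svd(np.hstack(bases), full_matrices=False)[0][:, 0]

rng = np.random.default_rng(6)
shared = rng.standard_normal(400)  # source present in every "subject"
subjects = [np.outer(shared, rng.standard_normal(6))
            + 0.5 * rng.standard_normal((400, 6)) for _ in range(4)]
comp = mcca_shared(subjects)
r = abs(np.corrcoef(comp, shared)[0, 1])  # the shared source is recovered
```

Because each subject's data are whitened before concatenation, no single subject's idiosyncratic geometry dominates; the first component is the time course most shared across all subjects.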
Affiliation(s)
Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
- UCL Ear Institute, London, United Kingdom
Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
Dorothée Arzounian
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
Daniel D E Wong
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
Jens Hjortkjær
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Denmark
Søren Fuglsang
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
75
Miran S, Akram S, Sheikhattar A, Simon JZ, Zhang T, Babadi B. Real-Time Decoding of Auditory Attention from EEG via Bayesian Filtering. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:25-28. [PMID: 30440332 DOI: 10.1109/embc.2018.8512210]
Abstract
In a complex auditory scene comprising multiple sound sources, humans are able to target and track a single speaker. Recent studies have provided promising algorithms to decode the attentional state of a listener in a competing-speaker environment from non-invasive brain recordings such as electroencephalography (EEG). These algorithms require substantial training datasets and often exhibit poor performance at temporal resolutions suitable for real-time implementation, which hinders their utilization in emerging applications such as smart hearing aids. In this work, we propose a real-time attention decoding framework by integrating techniques from Bayesian filtering, ℓ1-regularization, state-space modeling, and Expectation Maximization, which is capable of producing robust and statistically interpretable measures of auditory attention at high temporal resolution. Application of our proposed algorithm to synthetic and real EEG data yields a performance close to the state-of-the-art offline methods, while operating in near real-time with a minimal amount of training data.
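A two-state toy version of the state-space idea can be reduced to a sticky Bayesian filter over noisy per-window attention markers. This is far simpler than the authors' EM/ℓ1 framework (fixed transition and observation parameters, no learning), and all names are illustrative.

```python
import numpy as np

def filter_attention(markers, p_stay=0.95):
    """Two-state Bayesian filter: each marker is noisy evidence for the
    attended speaker (positive -> speaker 1, negative -> speaker 2).
    Returns the posterior probability of attending speaker 1 over time."""
    p, out = 0.5, []
    for m in markers:
        p = p_stay * p + (1 - p_stay) * (1 - p)  # sticky transition prior
        l1 = np.exp(-0.5 * (m - 1.0) ** 2)       # likelihood under speaker 1
        l2 = np.exp(-0.5 * (m + 1.0) ** 2)       # likelihood under speaker 2
        p = p * l1 / (p * l1 + (1 - p) * l2)     # Bayes update
        out.append(p)
    return np.array(out)

rng = np.random.default_rng(7)
truth = np.r_[np.ones(50), -np.ones(50)]  # attention switches halfway
markers = truth + rng.standard_normal(100)
p1 = filter_attention(markers)  # tracks the switch with a short delay
```

The sticky prior is what trades responsiveness for robustness: larger `p_stay` smooths over noisy windows but delays detection of a genuine attention switch.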
76
Cortical tracking of multiple streams outside the focus of attention in naturalistic auditory scenes. Neuroimage 2018; 181:617-626. [DOI: 10.1016/j.neuroimage.2018.07.052]
77
Fiedler L, Wöstmann M, Herbst SK, Obleser J. Late cortical tracking of ignored speech facilitates neural selectivity in acoustically challenging conditions. Neuroimage 2018; 186:33-42. [PMID: 30367953 DOI: 10.1016/j.neuroimage.2018.10.057]
Abstract
Listening requires selective neural processing of the incoming sound mixture, which in humans is borne out by a surprisingly clean representation of attended-only speech in auditory cortex. How this neural selectivity is achieved even at negative signal-to-noise ratios (SNR) remains unclear. We show that, under such conditions, a late cortical representation (i.e., neural tracking) of the ignored acoustic signal is key to successful separation of attended and distracting talkers (i.e., neural selectivity). We recorded and modeled the electroencephalographic response of 18 participants who attended to one of two simultaneously presented stories, while the SNR between the two talkers varied dynamically between +6 and -6 dB. The neural tracking showed an increasing early-to-late attention-biased selectivity. Importantly, acoustically dominant (i.e., louder) ignored talkers were tracked neurally by late involvement of fronto-parietal regions, which contributed to enhanced neural selectivity. This neural selectivity, by way of representing the ignored talker, poses a mechanistic neural account of attention under real-life acoustic conditions.
Affiliation(s)
- Lorenz Fiedler
- Department of Psychology, University of Lübeck, Lübeck, Germany.
- Malte Wöstmann
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Sophie K Herbst
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany.
|
78
|
Gandras K, Grimm S, Bendixen A. Electrophysiological Correlates of Speaker Segregation and Foreground-Background Selection in Ambiguous Listening Situations. Neuroscience 2018; 389:19-29. [PMID: 28735101 DOI: 10.1016/j.neuroscience.2017.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2017] [Revised: 07/10/2017] [Accepted: 07/10/2017] [Indexed: 11/15/2022]
Abstract
In everyday listening environments, a main task for our auditory system is to follow one out of multiple speakers talking simultaneously. The present study was designed to find electrophysiological indicators of two central processes involved - segregating the speech mixture into distinct speech sequences corresponding to the two speakers, and then attending to one of the speech sequences. We generated multistable speech stimuli that were set up to create ambiguity as to whether only one or two speakers are talking. Thereby we were able to investigate three perceptual alternatives (no segregation, segregated - speaker A in the foreground, segregated - speaker B in the foreground) without any confounding stimulus changes. Participants listened to a continuously repeating sequence of syllables, which were uttered alternately by two human speakers, and indicated whether they perceived the sequence as an inseparable mixture or as originating from two separate speakers. In the latter case, they distinguished which speaker was in their attentional foreground. Our data show a long-lasting event-related potential (ERP) modulation starting at 130ms after stimulus onset, which can be explained by the perceptual organization of the two speech sequences into attended foreground and ignored background streams. Our paradigm extends previous work with pure-tone sequences toward speech stimuli and adds the possibility to obtain neural correlates of the difficulty to segregate a speech mixture into distinct streams.
Affiliation(s)
- Katharina Gandras
- Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany.
- Sabine Grimm
- Department of Physics, School of Natural Sciences, Chemnitz University of Technology, D-09126 Chemnitz, Germany.
- Alexandra Bendixen
- Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany; Department of Physics, School of Natural Sciences, Chemnitz University of Technology, D-09126 Chemnitz, Germany.
|
79
|
Das N, Bertrand A, Francart T. EEG-based auditory attention detection: boundary conditions for background noise and speaker positions. J Neural Eng 2018; 15:066017. [PMID: 30207293 DOI: 10.1088/1741-2552/aae0a6] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
OBJECTIVE A listener's neural responses can be decoded to identify the speaker the person is attending to in a cocktail party environment. Such auditory attention detection methods have the potential to provide noise suppression algorithms in hearing devices with information about the listener's attention. A challenge is the effect of noise and other acoustic conditions that can reduce the attention detection accuracy. Specifically, noise can impact the ability of the person to segregate the sound sources and perform selective attention, as well as the external signal processing necessary to decode the attention effectively. The aim of this work is to systematically analyze the effect of noise level and speaker position on attention decoding accuracy. APPROACH 28 subjects participated in the experiment. Auditory stimuli consisted of stories narrated by different speakers from two different locations, along with surrounding multi-talker background babble. EEG signals of the subjects were recorded while they focused on one story and ignored the other. The strength of the babble noise as well as the spatial separation between the two speakers were varied between presentations. Spatio-temporal decoders were trained for each subject, and applied to decode attention of the subjects from every 30 s segment of data. Behavioral speech recognition thresholds were obtained for the different speaker separations. MAIN RESULTS Both the background noise level and the angular separation between speakers affected attention decoding accuracy. Remarkably, attention decoding performance was seen to increase with the inclusion of moderate background noise (versus no noise), while across the different noise conditions performance dropped significantly with increasing noise level. We also observed that decoding accuracy improved with increasing speaker separation, exhibiting the advantage of spatial release from masking. 
Furthermore, the effect of speaker separation on the decoding accuracy became stronger when the background noise level increased. A significant correlation between speech intelligibility and attention decoding accuracy was found across conditions. SIGNIFICANCE This work shows how the background noise level and relative positions of competing talkers impact attention decoding accuracy. It indicates in which circumstances a neuro-steered noise suppression system may need to operate, in function of acoustic conditions. It also indicates the boundary conditions for the operation of EEG-based attention detection systems in neuro-steered hearing prostheses.
Affiliation(s)
- Neetha Das
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium; Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium
|
80
|
Wong DDE, Fuglsang SA, Hjortkjær J, Ceolini E, Slaney M, de Cheveigné A. A Comparison of Regularization Methods in Forward and Backward Models for Auditory Attention Decoding. Front Neurosci 2018; 12:531. [PMID: 30131670 PMCID: PMC6090837 DOI: 10.3389/fnins.2018.00531] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2018] [Accepted: 07/16/2018] [Indexed: 11/17/2022] Open
Abstract
The decoding of selective auditory attention from noninvasive electroencephalogram (EEG) data is of interest in brain computer interface and auditory perception research. The current state-of-the-art approaches for decoding the attentional selection of listeners are based on linear mappings between features of sound streams and EEG responses (forward model), or vice versa (backward model). It has been shown that when the envelope of attended speech and EEG responses are used to derive such mapping functions, the model estimates can be used to discriminate between attended and unattended talkers. However, the predictive/reconstructive performance of the models is dependent on how the model parameters are estimated. There exist a number of model estimation methods that have been published, along with a variety of datasets. It is currently unclear if any of these methods perform better than others, as they have not yet been compared side by side on a single standardized dataset in a controlled fashion. Here, we present a comparative study of the ability of different estimation methods to classify attended speakers from multi-channel EEG data. The performance of the model estimation methods is evaluated using different performance metrics on a set of labeled EEG data from 18 subjects listening to mixtures of two speech streams. We find that when forward models predict the EEG from the attended audio, regularized models do not improve regression or classification accuracies. When backward models decode the attended speech from the EEG, regularization provides higher regression and classification accuracies.
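As a toy illustration of the backward (stimulus-reconstruction) approach compared in this study, the sketch below fits a ridge-regularized (Tikhonov) decoder from time-lagged EEG to a speech envelope. The simulated data, lag count, and penalty weight are invented for the example and do not reproduce the paper's methods or datasets.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of every EEG channel into a design matrix."""
    n_t, n_ch = eeg.shape
    X = np.zeros((n_t, n_ch * n_lags))
    for l in range(n_lags):
        X[l:, l * n_ch:(l + 1) * n_ch] = eeg[:n_t - l]
    return X

def backward_model(eeg, envelope, n_lags=8, lam=1e2):
    """Ridge-regularized decoder mapping lagged EEG to the speech envelope."""
    X = lagged(eeg, n_lags)
    XtX = X.T @ X + lam * np.eye(X.shape[1])   # Tikhonov penalty on the weights
    w = np.linalg.solve(XtX, X.T @ envelope)
    return w, X

# toy data: the envelope leaks into EEG channel 0 (an assumed, simplified model)
rng = np.random.default_rng(1)
env = rng.standard_normal(500)
eeg = 0.1 * rng.standard_normal((500, 8))
eeg[:, 0] += 0.5 * env
w, X = backward_model(eeg, env)
r = np.corrcoef(X @ w, env)[0, 1]   # reconstruction accuracy
```

Attention classification then amounts to reconstructing the envelope of each competing talker with the trained decoder and choosing the one with the higher correlation.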
Affiliation(s)
- Daniel D. E. Wong
- Laboratoire des Systèmes Perceptifs, CNRS, UMR 8248, Paris, France
- Département d'Études Cognitives, École Normale Supérieure, PSL Research University, Paris, France
- Søren A. Fuglsang
- Department of Electrical Engineering, Danmarks Tekniske Universitet, Kongens Lyngby, Denmark
- Jens Hjortkjær
- Department of Electrical Engineering, Danmarks Tekniske Universitet, Kongens Lyngby, Denmark
- Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
- Enea Ceolini
- Institute of Neuroinformatics, University of Zürich, Zurich, Switzerland
- Malcolm Slaney
- AI Machine Perception, Google, Mountain View, CA, United States
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, CNRS, UMR 8248, Paris, France
- Département d'Études Cognitives, École Normale Supérieure, PSL Research University, Paris, France
- Ear Institute, University College London, London, United Kingdom
|
81
|
O'Sullivan J, Sheth SA, McKhann G, Mehta AD, Mesgarani N. Neural decoding of attentional selection in multi-speaker environments without access to separated sources. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2017:1644-1647. [PMID: 29060199 DOI: 10.1109/embc.2017.8037155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Modern hearing aids can suppress background noise; however, there is little that can be done to help a user attend to a single conversation without knowing which speaker is being attended to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. A number of challenges exist, including the lack of access to the clean sound sources in the environment with which to compare with the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. We present an end-to-end system that 1) receives a single audio channel containing a mixture of speakers that is heard by a listener along with the listener's neural signals, 2) automatically separates the individual speakers in the mixture, 3) determines the attended speaker, and 4) amplifies the attended speaker's voice to assist the listener. Using invasive electrophysiology recordings, our system is able to decode the attention of a subject and detect switches in attention using only the mixed audio. We also identified the regions of the auditory cortex that contribute to AAD. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research and moves us closer to the development of cognitively controlled hearing aids.
|
82
|
Lu Y, Wang M, Zhang Q, Han Y. Identification of Auditory Object-Specific Attention from Single-Trial Electroencephalogram Signals via Entropy Measures and Machine Learning. ENTROPY 2018; 20:e20050386. [PMID: 33265476 PMCID: PMC7512905 DOI: 10.3390/e20050386] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Revised: 05/16/2018] [Accepted: 05/16/2018] [Indexed: 01/04/2023]
Abstract
Existing research has revealed that auditory attention can be tracked from ongoing electroencephalography (EEG) signals. The aim of this novel study was to investigate the identification of people's attention to a specific auditory object from single-trial EEG signals via entropy measures and machine learning. Approximate entropy (ApEn), sample entropy (SampEn), composite multiscale entropy (CmpMSE) and fuzzy entropy (FuzzyEn) were used to extract the informative features of EEG signals under three kinds of auditory object-specific attention (Rest, Auditory Object1 Attention (AOA1) and Auditory Object2 Attention (AOA2)). Linear discriminant analysis and a support vector machine (SVM) were used to construct two auditory attention classifiers. The statistical results of entropy measures indicated that there were significant differences in the values of ApEn, SampEn, CmpMSE and FuzzyEn between Rest, AOA1 and AOA2. For the SVM-based auditory attention classifier, the auditory object-specific attention of Rest, AOA1 and AOA2 could be identified from EEG signals using ApEn, SampEn, CmpMSE and FuzzyEn as features and the identification rates were significantly different from chance level. The optimal identification was achieved by the SVM-based auditory attention classifier using CmpMSE with the scale factor τ = 10. This study demonstrated a novel solution to identify the auditory object-specific attention from single-trial EEG signals without the need to access the auditory stimulus.
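A minimal numpy sketch of one of the features named above, sample entropy (SampEn), may help make the abstract concrete. The tolerance and template length follow common defaults, and this brute-force version differs slightly from optimized implementations; the two test signals are invented for illustration.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: -ln(A/B), where B counts template pairs of length m
    within Chebyshev tolerance r, and A the same for length m+1. As is
    conventional, r is scaled by the signal's standard deviation."""
    x = np.asarray(x, float)
    tol = r * x.std()
    def pair_count(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])  # templates
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)      # Chebyshev
        return (np.sum(d <= tol) - len(t)) / 2                   # drop self-matches
    return -np.log(pair_count(m + 1) / pair_count(m))

rng = np.random.default_rng(2)
regular = np.sin(np.linspace(0, 8 * np.pi, 300))  # predictable signal
irregular = rng.standard_normal(300)              # unpredictable signal
se_reg, se_irr = sample_entropy(regular), sample_entropy(irregular)
```

A more regular signal yields a lower SampEn; in the study, such scalar entropy features (per trial, per condition) are what feed the LDA and SVM classifiers.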
|
83
|
Miran S, Akram S, Sheikhattar A, Simon JZ, Zhang T, Babadi B. Real-Time Tracking of Selective Auditory Attention From M/EEG: A Bayesian Filtering Approach. Front Neurosci 2018; 12:262. [PMID: 29765298 PMCID: PMC5938416 DOI: 10.3389/fnins.2018.00262] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2017] [Accepted: 04/05/2018] [Indexed: 11/13/2022] Open
Abstract
Humans are able to identify and track a target speaker amid a cacophony of acoustic interference, an ability which is often referred to as the cocktail party phenomenon. Results from several decades of studying this phenomenon have culminated in recent years in various promising attempts to decode the attentional state of a listener in a competing-speaker environment from non-invasive neuroimaging recordings such as magnetoencephalography (MEG) and electroencephalography (EEG). To this end, most existing approaches compute correlation-based measures by either regressing the features of each speech stream to the M/EEG channels (the decoding approach) or vice versa (the encoding approach). To produce robust results, these procedures require multiple trials for training purposes. Also, their decoding accuracy drops significantly when operating at high temporal resolutions. Thus, they are not well-suited for emerging real-time applications such as smart hearing aid devices or brain-computer interface systems, where training data might be limited and high temporal resolutions are desired. In this paper, we close this gap by developing an algorithmic pipeline for real-time decoding of the attentional state. Our proposed framework consists of three main modules: (1) Real-time and robust estimation of encoding or decoding coefficients, achieved by sparse adaptive filtering, (2) Extracting reliable markers of the attentional state, and thereby generalizing the widely-used correlation-based measures thereof, and (3) Devising a near real-time state-space estimator that translates the noisy and variable attention markers to robust and statistically interpretable estimates of the attentional state with minimal delay. Our proposed algorithms integrate various techniques including forgetting factor-based adaptive filtering, ℓ1-regularization, forward-backward splitting algorithms, fixed-lag smoothing, and Expectation Maximization. 
We validate the performance of our proposed framework using comprehensive simulations as well as application to experimentally acquired M/EEG data. Our results reveal that the proposed real-time algorithms perform nearly as accurately as the existing state-of-the-art offline techniques, while providing a significant degree of adaptivity, statistical robustness, and computational savings.
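Module (1) above, sparse adaptive estimation of decoding coefficients, can be sketched as forgetting-factor recursive correlation updates combined with a forward-backward (proximal gradient) splitting step. All hyper-parameters and the toy data below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the l1 norm (the 'backward' half-step)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def adaptive_decoder(X, y, forget=0.98, step=0.01, l1=0.005):
    """Forgetting-factor adaptive filtering with l1 regularization via
    forward-backward splitting. X: (T, p) lagged EEG features; y: (T,)
    attended-speech envelope. Returns the decoder trajectory over time."""
    p = X.shape[1]
    w = np.zeros(p)
    R, q = np.zeros((p, p)), np.zeros(p)      # exponentially weighted moments
    W = []
    for t in range(len(y)):
        R = forget * R + np.outer(X[t], X[t])
        q = forget * q + y[t] * X[t]
        grad = R @ w - q                      # forward (gradient) step ...
        w = soft_threshold(w - step * grad, step * l1)  # ... then backward
        W.append(w.copy())
    return np.array(W)

# toy check: recover a sparse decoder from noisy streaming data
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)
W = adaptive_decoder(X, y)
```

The forgetting factor lets the decoder track slow changes in the neural encoding, while the soft-thresholding step keeps the coefficient vector sparse, both of which matter when training data are scarce.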
Affiliation(s)
- Sina Miran
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States
- Alireza Sheikhattar
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States; Institute for Systems Research, University of Maryland College Park, MD, United States; Department of Biology, University of Maryland College Park, MD, United States
- Tao Zhang
- Starkey Hearing Technologies Eden Prairie, MN, United States
- Behtash Babadi
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States; Institute for Systems Research, University of Maryland College Park, MD, United States
|
84
|
de Cheveigné A, Wong DD, Di Liberto GM, Hjortkjær J, Slaney M, Lalor E. Decoding the auditory brain with canonical correlation analysis. Neuroimage 2018; 172:206-216. [DOI: 10.1016/j.neuroimage.2018.01.033] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2017] [Revised: 12/11/2017] [Accepted: 01/15/2018] [Indexed: 11/28/2022] Open
|
85
|
Jaeger M, Bleichner MG, Bauer AKR, Mirkovic B, Debener S. Did You Listen to the Beat? Auditory Steady-State Responses in the Human Electroencephalogram at 4 and 7 Hz Modulation Rates Reflect Selective Attention. Brain Topogr 2018; 31:811-826. [DOI: 10.1007/s10548-018-0637-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2017] [Accepted: 02/23/2018] [Indexed: 01/23/2023]
|
86
|
Abstract
There are many kinds of neural prostheses available or being researched today. In most cases they are intended to cure or improve the condition of patients affected by some cerebral deficiency. In other cases, their goal is to provide new means to maintain or improve an individual's normal performance. In all these circumstances, one of the possible risks is that of violating the privacy of brain contents (which partly coincide with mental contents) or of depriving individuals of full control over their thoughts (mental states), as the latter are at least partly detectable by new prosthetic technologies. Given the (ethical) premise that the absolute privacy and integrity of the most relevant part of one's brain data is (one of) the most valuable and inviolable human right(s), I argue that a (technical) principle should guide the design and regulation of new neural prostheses. The premise is justified by the fact that whatever the coercion, the threat or the violence undergone, the person can generally preserve a "private repository" of thought in which to defend her convictions and identity, her dignity, and autonomy. Without it, the person may end up in a state of complete subjection to other individuals. The following functional principle is that neural prostheses should be technically designed and built so as to prevent such outcomes. They should: (a) incorporate systems that can find and signal the unauthorized detection, alteration, and diffusion of brain data and brain functioning; (b) be able to stop any unauthorized detection, alteration, and diffusion of brain data. This should not only regard individual devices, but act as a general (technical) operating principle shared by all interconnected systems that deal with decoding brain activity and brain functioning.
Affiliation(s)
- Andrea Lavazza
- Neuroethics, Centro Universitario Internazionale, Arezzo, Italy
|
87
|
de Taillez T, Kollmeier B, Meyer BT. Machine learning for decoding listeners' attention from electroencephalography evoked by continuous speech. Eur J Neurosci 2018; 51:1234-1241. [PMID: 29205588 DOI: 10.1111/ejn.13790] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2017] [Revised: 11/23/2017] [Accepted: 11/27/2017] [Indexed: 11/27/2022]
Abstract
Previous research has shown that it is possible to predict which speaker is attended in a multispeaker scene by analyzing a listener's electroencephalography (EEG) activity. In this study, existing linear models that learn the mapping from neural activity to an attended speech envelope are replaced by a non-linear neural network (NN). The proposed architecture takes into account the temporal context of the estimated envelope and is evaluated using EEG data obtained from 20 normal-hearing listeners who focused on one speaker in a two-speaker setting. The network is optimized with respect to the frequency range and the temporal segmentation of the EEG input, as well as the cost function used to estimate the model parameters. To identify the salient cues involved in auditory attention, a relevance algorithm is applied that highlights the electrode signals most important for attention decoding. In contrast to linear approaches, the NN profits from a wider EEG frequency range (1-32 Hz) and achieves a performance seven times higher than the linear baseline. Relevant EEG activations following the speech stimulus after 170 ms at physiologically plausible locations were found. This was not observed when the model was trained on the unattended speaker. Our findings therefore indicate that non-linear NNs can provide insight into physiological processes by analyzing EEG activity.
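The nonlinear-mapping idea can be caricatured with a tiny one-hidden-layer network trained by plain gradient descent on synthetic data. The architecture, layer sizes, learning rate, and data below are arbitrary stand-ins, not the network evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy stand-in for the task: map 10 "EEG" features to an envelope sample
# through a non-linearity (data and hyper-parameters are assumptions)
X = rng.standard_normal((400, 10))
y = np.tanh(X @ rng.standard_normal(10))       # non-linear target

W1 = 0.1 * rng.standard_normal((10, 16)); b1 = np.zeros(16)
w2 = 0.1 * rng.standard_normal(16); b2 = 0.0
lr, losses = 0.2, []
for epoch in range(200):
    h = np.tanh(X @ W1 + b1)                   # hidden layer
    pred = h @ w2 + b2                         # estimated envelope sample
    err = pred - y
    losses.append(np.mean(err ** 2))
    g2 = h.T @ err / len(y)                    # backpropagation
    gh = np.outer(err, w2) * (1 - h ** 2)
    W1 -= lr * (X.T @ gh) / len(y); b1 -= lr * gh.mean(axis=0)
    w2 -= lr * g2; b2 -= lr * err.mean()
```

The training loss falls as the network learns the non-linear stimulus-response mapping, which is the capacity the study exploits relative to linear decoders.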
Affiliation(s)
- Tobias de Taillez
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, 26129, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, 26129, Germany
- Bernd T Meyer
- Medizinische Physik and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität, Oldenburg, 26129, Germany
|
88
|
Zhang M, Mary Ying YL, Ihlefeld A. Spatial Release From Informational Masking: Evidence From Functional Near Infrared Spectroscopy. Trends Hear 2018; 22:2331216518817464. [PMID: 30558491 PMCID: PMC6299332 DOI: 10.1177/2331216518817464] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Revised: 10/31/2018] [Accepted: 11/13/2018] [Indexed: 11/30/2022] Open
Abstract
Informational masking (IM) can greatly reduce speech intelligibility, but the neural mechanisms underlying IM are not understood. Binaural differences between target and masker can improve speech perception. In general, improvement in masked speech intelligibility due to provision of spatial cues is called spatial release from masking. Here, we focused on an aspect of spatial release from masking, specifically, the role of spatial attention. We hypothesized that in a situation with IM background sound (a) attention to speech recruits lateral frontal cortex (LFCx) and (b) LFCx activity varies with direction of spatial attention. Using functional near infrared spectroscopy, we assessed LFCx activity bilaterally in normal-hearing listeners. In Experiment 1, two talkers were simultaneously presented. Listeners either attended to the target talker (speech task) or they listened passively to an unintelligible, scrambled version of the acoustic mixture (control task). Target and masker differed in pitch and interaural time difference (ITD). Relative to the passive control, LFCx activity increased during attentive listening. Experiment 2 measured how LFCx activity varied with ITD, by testing listeners on the speech task in Experiment 1, except that talkers either were spatially separated by ITD or colocated. Results show that directing of auditory attention activates LFCx bilaterally. Moreover, right LFCx is recruited more strongly in the spatially separated as compared with colocated configurations. Findings hint that LFCx function contributes to spatial release from masking in situations with IM.
Affiliation(s)
- Min Zhang
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
- Graduate School of Biomedical Sciences, Rutgers University, Newark, NJ, USA
- Yu-Lan Mary Ying
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, NJ, USA
- Antje Ihlefeld
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
|
89
|
Ienca M, Andorno R. Towards new human rights in the age of neuroscience and neurotechnology. LIFE SCIENCES, SOCIETY AND POLICY 2017; 13:5. [PMID: 28444626 PMCID: PMC5447561 DOI: 10.1186/s40504-017-0050-1] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2016] [Accepted: 03/20/2017] [Indexed: 05/10/2023]
Abstract
Rapid advancements in human neuroscience and neurotechnology open unprecedented possibilities for accessing, collecting, sharing and manipulating information from the human brain. Such applications raise important challenges to human rights principles that need to be addressed to prevent unintended consequences. This paper assesses the implications of emerging neurotechnology applications in the context of the human rights framework and suggests that existing human rights may not be sufficient to respond to these emerging issues. After analysing the relationship between neuroscience and human rights, we identify four new rights that may become of great relevance in the coming decades: the right to cognitive liberty, the right to mental privacy, the right to mental integrity, and the right to psychological continuity.
Affiliation(s)
- Marcello Ienca
- Institute for Biomedical Ethics, University of Basel, Bernouillstrasse 28, 4056, Basel, Switzerland.
|
90
|
Bauer AKR, Bleichner MG, Jaeger M, Thorne JD, Debener S. Dynamic phase alignment of ongoing auditory cortex oscillations. Neuroimage 2017; 167:396-407. [PMID: 29170070 DOI: 10.1016/j.neuroimage.2017.11.037] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2017] [Revised: 11/13/2017] [Accepted: 11/18/2017] [Indexed: 11/19/2022] Open
Abstract
Neural oscillations can synchronize to external rhythmic stimuli, as for example in speech and music. While previous studies have mainly focused on elucidating the fundamental concept of neural entrainment, less is known about the time course of entrainment. In this human electroencephalography (EEG) study, we unravel the temporal evolution of neural entrainment by contrasting short and long periods of rhythmic stimulation. Listeners had to detect short silent gaps that were systematically distributed with respect to the phase of a 3 Hz frequency-modulated tone. We found that gap detection performance was modulated by the stimulus stream with a consistent stimulus phase across participants for short and long stimulation. Electrophysiological analysis confirmed neural entrainment effects at 3 Hz and the 6 Hz harmonic for both short and long stimulation lengths. 3 Hz source level analysis revealed that longer stimulation resulted in a phase shift of a participant's neural phase relative to the stimulus phase. Phase coupling increased over the first second of stimulation, but no effects for phase coupling strength were observed over time. The dynamic evolution of phase alignment suggests that the brain attunes to external rhythmic stimulation by adapting the brain's internal representation of incoming environmental stimuli.
Affiliation(s)
- Anna-Katharina R Bauer
- Neuropsychology Lab, Department of Psychology, European Medical School, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany.
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, European Medical School, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, European Medical School, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany; Research Centre Neurosensory Science, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany
- Jeremy D Thorne
- Neuropsychology Lab, Department of Psychology, European Medical School, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, European Medical School, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany; Research Centre Neurosensory Science, University of Oldenburg, Ammerlaender Heerstraße 114-118, 26129, Oldenburg, Germany
|
91
|
Haghighi M, Moghadamfalahi M, Akcakaya M, Shinn-Cunningham BG, Erdogmus D. A Graphical Model for Online Auditory Scene Modulation Using EEG Evidence for Attention. IEEE Trans Neural Syst Rehabil Eng 2017; 25:1970-1977. [PMID: 28600256 PMCID: PMC5681401 DOI: 10.1109/tnsre.2017.2712419] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Recent findings indicate that brain interfaces have the potential to enable attention-guided auditory scene analysis and manipulation in applications such as hearing aids and augmented/virtual environments. Specifically, noninvasively acquired electroencephalography (EEG) signals have been demonstrated to carry some evidence regarding which of multiple synchronous speech waveforms the subject attends to. In this paper, we demonstrate that: 1) using data- and model-driven cross-correlation features yields competitive binary auditory attention classification results with at most 20 s of EEG from 16 channels, or even a single well-positioned channel; 2) a model calibrated using equal-energy speech waveforms competing for attention can perform well on estimating attention in closed-loop unbalanced-energy speech waveform situations, where the speech amplitudes are modulated by the estimated attention posterior probability distribution; 3) such a model performs even better if it is corrected (linearly, in this instance) based on the dependence of the EEG evidence on the speech weights in the mixture; and 4) calibrating a model based on population EEG can result in acceptable performance for new individuals/users; therefore, EEG-based auditory attention classifiers may generalize across individuals, leading to reduced or eliminated calibration time and effort.
92
The Right Temporoparietal Junction Supports Speech Tracking During Selective Listening: Evidence from Concurrent EEG-fMRI. J Neurosci 2017; 37:11505-11516. [PMID: 29061698] [DOI: 10.1523/jneurosci.1007-17.2017] [Citation(s) in RCA: 25]
Abstract
Listening selectively to one out of several competing speakers in a "cocktail party" situation is a highly demanding task. It relies on a widespread cortical network, including not only auditory sensory areas but also frontal and parietal brain regions involved in controlling auditory attention. Previous work has shown that, during selective listening, ongoing neural activity in auditory sensory areas is dominated by the attended speech stream, whereas competing input is suppressed. The relationship between these attentional modulations in the sensory tracking of the attended speech stream and frontoparietal activity during selective listening is, however, not understood. We studied this question in young, healthy human participants (both sexes) using concurrent EEG-fMRI and a sustained selective listening task, in which one out of two competing speech streams had to be attended selectively. An EEG-based speech envelope reconstruction method was applied to assess the strength of the cortical tracking of the to-be-attended and the to-be-ignored stream during selective listening. Our results show that individual speech envelope reconstruction accuracies obtained for the to-be-attended speech stream were positively correlated with the amplitude of sustained BOLD responses in the right temporoparietal junction, a core region of the ventral attention network. This brain region further showed task-related functional connectivity to secondary auditory cortex and regions of the frontoparietal attention network, including the intraparietal sulcus and the inferior frontal gyrus. This suggests that the right temporoparietal junction is involved in controlling attention during selective listening, allowing for better cortical tracking of the attended speech stream. SIGNIFICANCE STATEMENT Listening selectively to one out of several simultaneously talking speakers in a "cocktail party" situation is a highly demanding task.
It activates a widespread network of auditory sensory and hierarchically higher frontoparietal brain regions. However, how these different processing levels interact during selective listening is not understood. Here, we investigated this question using fMRI and concurrently acquired scalp EEG. We found that activation levels in the right temporoparietal junction correlate with the sensory representation of a selectively attended speech stream. In addition, this region showed significant functional connectivity to both auditory sensory and other frontoparietal brain areas during selective listening. This suggests that the right temporoparietal junction contributes to controlling selective auditory attention in "cocktail party" situations.
93
O'Sullivan J, Chen Z, Herrero J, McKhann GM, Sheth SA, Mehta AD, Mesgarani N. Neural decoding of attentional selection in multi-speaker environments without access to clean sources. J Neural Eng 2017; 14:056001. [PMID: 28776506] [PMCID: PMC5805380] [DOI: 10.1088/1741-2552/aa7ab4] [Citation(s) in RCA: 51]
Abstract
OBJECTIVE People who suffer from hearing impairments can find it difficult to follow a conversation in a multi-speaker environment. Current hearing aids can suppress background noise; however, little can be done to help a user attend to a single conversation amongst many without knowing which speaker the user is attending to. Cognitively controlled hearing aids that use auditory attention decoding (AAD) methods are the next step in offering help. Translating the successes in AAD research to real-world applications poses a number of challenges, including the lack of access to the clean sound sources in the environment against which to compare the neural signals. We propose a novel framework that combines single-channel speech separation algorithms with AAD. APPROACH We present an end-to-end system that (1) receives a single audio channel containing a mixture of speakers heard by a listener, along with the listener's neural signals, (2) automatically separates the individual speakers in the mixture, (3) determines the attended speaker, and (4) amplifies the attended speaker's voice to assist the listener. MAIN RESULTS Using invasive electrophysiology recordings, we identified the regions of the auditory cortex that contribute to AAD. Given appropriate electrode locations, our system is able to decode the attention of subjects and amplify the attended speaker using only the mixed audio. Our quality assessment of the modified audio demonstrates a significant improvement in both subjective and objective speech quality measures. SIGNIFICANCE Our novel framework for AAD bridges the gap between the most recent advancements in speech processing technologies and speech prosthesis research, and moves us closer to the development of cognitively controlled hearable devices for the hearing impaired.
Affiliation(s)
- James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, NY, United States of America; Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States of America

94
Haghighi M, Moghadamfalahi M, Akcakaya M, Erdogmus D. EEG-assisted Modulation of Sound Sources in the Auditory Scene. Biomed Signal Process Control 2017; 39:263-270. [PMID: 31118975] [DOI: 10.1016/j.bspc.2017.08.008] [Citation(s) in RCA: 8]
Abstract
Noninvasive EEG (electroencephalography) based auditory attention detection could be useful for improved hearing aids in the future. This work is a novel attempt to investigate the feasibility of online modulation of sound sources through probabilistic detection of auditory attention, using a noninvasive EEG-based brain-computer interface. The proposed online system modulates the upcoming sound sources through gain adaptation, which employs probabilistic decisions (soft decisions) from a classifier trained on offline calibration data. In this work, calibration EEG data were collected in sessions where the participants listened to two sound sources (one attended and one unattended). Cross-correlation coefficients between the EEG measurements and the attended and unattended sound source envelope estimates are used to show differences in the sharpness and delays of neural responses for attended versus unattended sound sources. Salient features that distinguish attended sources from unattended ones in the correlation patterns were identified and later used to train an auditory attention classifier. Using this classifier, we show high offline detection performance with single-channel EEG measurements, compared to existing approaches in the literature that employ a large number of channels. In addition, using the classifier trained offline in the calibration session, we show the performance of the online sound source modulation system. We observe that the online sound source modulation system is able to keep the level of the attended sound source higher than that of the unattended source.
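The cross-correlation scheme this abstract describes can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the array names, the lag range, and the maximum-correlation decision rule are assumptions; the published system feeds correlation features into a trained probabilistic classifier rather than making this hard decision.

```python
import numpy as np

def xcorr_features(eeg, envelope, max_lag=50):
    """Normalized cross-correlation between one EEG channel and a speech
    envelope, at lags 0..max_lag samples (EEG lagging the stimulus)."""
    eeg = (eeg - eeg.mean()) / eeg.std()
    envelope = (envelope - envelope.mean()) / envelope.std()
    n = len(eeg)
    return np.array([envelope[:n - lag] @ eeg[lag:] / (n - lag)
                     for lag in range(max_lag + 1)])

def detect_attention(eeg, env_a, env_b, max_lag=50):
    """Hard-decision stand-in for the paper's classifier: label as attended
    the source whose envelope correlates more strongly with the EEG."""
    c_a = np.abs(xcorr_features(eeg, env_a, max_lag)).max()
    c_b = np.abs(xcorr_features(eeg, env_b, max_lag)).max()
    return "A" if c_a >= c_b else "B"
```

In the paper, the correlation pattern across lags serves as a feature vector for a classifier whose posterior probabilities (soft decisions) drive the gain adaptation; the sketch collapses this into a hard decision for brevity.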
Affiliation(s)
- Murat Akcakaya
- University of Pittsburgh, 4200 Fifth Ave, Pittsburgh, PA 15260
- Deniz Erdogmus
- Northeastern University, 360 Huntington Ave, Boston, MA 02115

95
Kappel SL, Looney D, Mandic DP, Kidmose P. Physiological artifacts in scalp EEG and ear-EEG. Biomed Eng Online 2017; 16:103. [PMID: 28800744] [PMCID: PMC5553928] [DOI: 10.1186/s12938-017-0391-2] [Citation(s) in RCA: 27]
Abstract
Background A problem inherent to recording EEG is the interference arising from noise and artifacts. While in a laboratory environment artifacts and interference can, to a large extent, be avoided or controlled, in real-life scenarios this is a challenge. Ear-EEG is a concept where EEG is acquired from electrodes in the ear. Methods We present a characterization of physiological artifacts generated in a controlled environment for nine subjects. The influence of the artifacts was quantified in terms of the signal-to-noise ratio (SNR) deterioration of the auditory steady-state response. Alpha band modulation was also studied in an open/closed-eyes paradigm. Results Artifacts related to jaw muscle contractions were present all over the scalp and in the ear, with the highest SNR deteriorations in the gamma band. The SNR deterioration for jaw artifacts was in general higher in the ear than on the scalp. Whereas eye-blinking did not influence the SNR in the ear, it was significant for all groups of scalp electrodes in the delta and theta bands. Eye movements resulted in statistically significant SNR deterioration in frontal, temporal, and ear electrodes. Recordings of alpha band modulation showed increased power and coherence of the EEG for ear and scalp electrodes during the closed-eyes periods. Conclusions Ear-EEG is a method developed for unobtrusive and discreet recording over long periods of time and in real-life environments. This study investigated the influence of the most important types of physiological artifacts and demonstrated that spontaneous activity, in terms of alpha band oscillations, could be recorded from the ear-EEG platform. In its present form, ear-EEG was more prone to jaw-related artifacts and less prone to eye-blinking artifacts compared with state-of-the-art scalp-based systems.
Affiliation(s)
- Simon L Kappel
- Department of Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
- David Looney
- Pindrop, 817 West Peachtree Street NW, Suite 770, 24105 Atlanta, GA, USA; Department of Electrical and Electronic Engineering, Imperial College, London, SW7 2BT, UK
- Danilo P Mandic
- Department of Electrical and Electronic Engineering, Imperial College, London, SW7 2BT, UK
- Preben Kidmose
- Department of Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark

96
Aroudi A, Doclo S. EEG-based auditory attention decoding using unprocessed binaural signals in reverberant and noisy conditions? Annu Int Conf IEEE Eng Med Biol Soc 2017; 2017:484-488. [PMID: 29059915] [DOI: 10.1109/embc.2017.8036867] [Citation(s) in RCA: 2]
Abstract
To decode auditory attention from single-trial EEG recordings in an acoustic scenario with two competing speakers, a least-squares method has recently been proposed. This method, however, requires the clean speech signals of both the attended and the unattended speaker to be available as reference signals. Since in practice only the binaural signals, consisting of a reverberant mixture of both speakers and background noise, are available, in this paper we explore the potential of using these (unprocessed) signals as reference signals for decoding auditory attention in different acoustic conditions (anechoic, reverberant, noisy, and reverberant-noisy). In addition, we investigate whether it is possible to use these signals instead of the clean attended speech signal for filter training. The experimental results show that using the unprocessed binaural signals for filter training and for decoding auditory attention is feasible with relatively high decoding performance, although for most acoustic conditions the decoding performance is significantly lower than when using the clean speech signals.
97
Das N, Van Eyndhoven S, Francart T, Bertrand A. Adaptive attention-driven speech enhancement for EEG-informed hearing prostheses. Annu Int Conf IEEE Eng Med Biol Soc 2017; 2016:77-80. [PMID: 28268285] [DOI: 10.1109/embc.2016.7590644] [Citation(s) in RCA: 14]
Abstract
State-of-the-art hearing prostheses are equipped with acoustic noise reduction algorithms to improve speech intelligibility. Currently, one of the major challenges is to perform acoustic noise reduction in so-called cocktail party scenarios with multiple speakers, in particular because it is difficult, if not impossible, for the algorithm to determine which are the target speaker(s) that should be enhanced and which speaker(s) should be treated as interfering sources. Recently, it has been shown that electroencephalography (EEG) can be used to perform auditory attention detection, i.e., to detect to which speaker a subject is attending based on recordings of neural activity. In this paper, we combine such an EEG-based auditory attention detection (AAD) paradigm with an acoustic noise reduction algorithm based on the multi-channel Wiener filter (MWF), leading to a neuro-steered MWF. In particular, we analyze how the AAD accuracy affects the noise suppression performance of an adaptive MWF in a sliding-window implementation, where the user switches their attention between two speakers.
98
Van Eyndhoven S, Francart T, Bertrand A. EEG-Informed Attended Speaker Extraction From Recorded Speech Mixtures With Application in Neuro-Steered Hearing Prostheses. IEEE Trans Biomed Eng 2017; 64:1045-1056. [DOI: 10.1109/tbme.2016.2587382] [Citation(s) in RCA: 89]
99
Biesmans W, Das N, Francart T, Bertrand A. Auditory-Inspired Speech Envelope Extraction Methods for Improved EEG-Based Auditory Attention Detection in a Cocktail Party Scenario. IEEE Trans Neural Syst Rehabil Eng 2017; 25:402-412. [DOI: 10.1109/tnsre.2016.2571900] [Citation(s) in RCA: 133]
100
Fuglsang SA, Dau T, Hjortkjær J. Noise-robust cortical tracking of attended speech in real-world acoustic scenes. Neuroimage 2017; 156:435-444. [PMID: 28412441] [DOI: 10.1016/j.neuroimage.2017.04.026] [Citation(s) in RCA: 97]
Abstract
Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and numbers of interfering talkers, listeners selectively attended to the speech stream of a particular talker. Across the different listening environments, we found that the attended talker could be accurately decoded from single-trial EEG data irrespective of the different distortions in the acoustic input. For highly reverberant environments, speech envelopes reconstructed from neural responses to the distorted stimuli resembled the original clean signal more than the distorted input. With reverberant speech, we observed a late cortical response to the attended speech stream that encoded temporal modulations in the speech signal without its reverberant distortion. Single-trial attention decoding accuracies based on 40-50 s long blocks of data from 64 scalp electrodes were equally high (80-90% correct) in all considered listening environments, and remained statistically significant using as few as 10 scalp electrodes and short (<30 s) unaveraged EEG segments. In contrast to the robust decoding of the attended talker, we found that decoding of the unattended talker deteriorated with the acoustic distortions.
These results suggest that cortical activity tracks an attended speech signal in a way that is invariant to acoustic distortions encountered in real-life sound environments. Noise-robust attention decoding additionally suggests a potential utility of stimulus reconstruction techniques in attention-controlled brain-computer interfaces.
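The envelope-reconstruction (backward-model) approach used in studies of this kind can be sketched as follows. This is a generic illustration, not the paper's implementation: the lag range, ridge parameter, and variable names are assumptions, and real pipelines additionally band-pass filter the EEG and cross-validate the decoder.

```python
import numpy as np

def lagged(eeg, lags):
    """Design matrix of time-lagged EEG (samples x channels). A backward
    model reconstructs the envelope at time t from EEG samples at t+lag,
    since the neural response follows the stimulus."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for j, lag in enumerate(lags):
        X[:n - lag, j * c:(j + 1) * c] = eeg[lag:]
    return X

def train_decoder(eeg, envelope, lags, ridge=1.0):
    """Ridge-regularized least-squares fit of a linear envelope decoder."""
    X = lagged(eeg, lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(eeg, w, env_a, env_b, lags):
    """Reconstruct the envelope from EEG and pick the candidate speech
    envelope it correlates with more strongly (Pearson r)."""
    rec = lagged(eeg, lags) @ w
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return "A" if r_a >= r_b else "B"
```

In the study above, decoders of this general form were trained on the attended speech, and attention was decoded from 40-50 s EEG blocks; decoding accuracy is then the fraction of blocks for which the attended talker wins the correlation comparison.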
Affiliation(s)
- Søren Asp Fuglsang
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark
- Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark
- Jens Hjortkjær
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark; Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Kettegaard Allé 30, 2650 Hvidovre, Denmark