1. Roebben A, Heintz N, Geirnaert S, Francart T, Bertrand A. 'Are you even listening?' - EEG-based decoding of absolute auditory attention to natural speech. J Neural Eng 2024; 21:036046. [PMID: 38834062 DOI: 10.1088/1741-2552/ad5403]
Abstract
Objective. In this study, we use electroencephalography (EEG) recordings to determine whether a subject is actively listening to a presented speech stimulus. More precisely, we aim to discriminate between an active listening condition and a distractor condition in which subjects focus on an unrelated distractor task while being exposed to a speech stimulus. We refer to this task as absolute auditory attention decoding. Approach. We re-use an existing EEG dataset in which the subjects watch a silent movie as a distractor condition, and introduce a new dataset with two distractor conditions (silently reading a text and performing arithmetic exercises). We focus on two EEG features, namely neural envelope tracking (NET) and spectral entropy (SE). Additionally, we investigate whether the detection of such an active listening condition can be combined with a selective auditory attention decoding (sAAD) task, in which the goal is to decide to which of multiple competing speakers the subject is attending. The latter is a key task in so-called neuro-steered hearing devices, which aim to suppress unattended audio while preserving the attended speaker. Main results. Contrary to a previous hypothesis that higher SE relates to active rather than passive listening (without any distractors), we find significantly lower SE in the active listening condition compared to the distractor conditions. The NET, by contrast, is consistently and significantly higher when actively listening. Similarly, we show that the accuracy of a sAAD task improves when it is evaluated only on the highest-NET segments, whereas the reverse is observed when it is evaluated only on the lowest-SE segments. Significance. We conclude that the NET is the more reliable feature for decoding absolute auditory attention, as it is consistently higher when actively listening, whereas the relation of the SE between active and passive listening appears to depend on the nature of the distractor.
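As a rough illustration of the two EEG features named above, the sketch below computes an envelope-tracking correlation and a normalized spectral entropy on synthetic data. The sampling rate, window length, and single-channel setup are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
fs = 64                                   # assumed EEG sampling rate (Hz)
envelope = rng.standard_normal(60 * fs)   # stand-in speech envelope (1 min)
eeg = 0.3 * envelope + rng.standard_normal(60 * fs)  # one synthetic EEG channel

# Neural envelope tracking: correlation between speech envelope and EEG
net = np.corrcoef(envelope, eeg)[0, 1]

# Spectral entropy: Shannon entropy of the normalized power spectral density
_, psd = welch(eeg, fs=fs, nperseg=4 * fs)
p = psd / psd.sum()
se = -np.sum(p * np.log2(p)) / np.log2(len(p))  # normalized to [0, 1]

print(f"NET correlation: {net:.3f}, spectral entropy: {se:.3f}")
```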
Affiliation(s)
- Arnout Roebben
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Nicolas Heintz
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- KU Leuven, Department of Neurosciences, Experimental Oto-Rhino-Laryngology (ExpORL), Leuven, Belgium
- Leuven.AI-KU Leuven institute for AI, Leuven, Belgium
- Simon Geirnaert
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- KU Leuven, Department of Neurosciences, Experimental Oto-Rhino-Laryngology (ExpORL), Leuven, Belgium
- Leuven.AI-KU Leuven institute for AI, Leuven, Belgium
- Tom Francart
- KU Leuven, Department of Neurosciences, Experimental Oto-Rhino-Laryngology (ExpORL), Leuven, Belgium
- Leuven.AI-KU Leuven institute for AI, Leuven, Belgium
- Alexander Bertrand
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Leuven.AI-KU Leuven institute for AI, Leuven, Belgium

2. Mizokuchi K, Tanaka T, Sato TG, Shiraki Y. Alpha band modulation caused by selective attention to music enables EEG classification. Cogn Neurodyn 2024; 18:1005-1020. [PMID: 38826648 PMCID: PMC11143110 DOI: 10.1007/s11571-023-09955-x]
Abstract
Humans are able to pay selective attention to music or speech in the presence of multiple sounds. In the speech domain, selective attention has been reported to enhance the cross-correlation between the speech envelope and the electroencephalogram (EEG) while also affecting the spatial modulation of the alpha band. However, when multiple pieces of music are performed at the same time, it is unclear how selective attention affects neural entrainment and spatial modulation. In this paper, we hypothesized that entrainment to the attended music differs from that to the unattended music and that spatial modulation in the alpha band occurs in conjunction with attention. We conducted experiments in which we presented musical excerpts to 15 participants, each listening to two excerpts simultaneously but paying attention to one of the two. The results showed that the cross-correlation function between the EEG signal and the envelope of the unattended melody had a more prominent peak than that of the attended melody, contrary to the findings for speech. In addition, spatial modulation in the alpha band was found with a data-driven approach called the common spatial pattern (CSP) method. Classification of the EEG signal with a support vector machine identified the attended melodies and achieved an accuracy of 100% for 11 of the 15 participants. These results suggest that selective attention to music suppresses entrainment to the melody and that spatial modulation of the alpha band occurs in conjunction with attention. To the best of our knowledge, this is the first report of detecting attended music consisting of several types of musical notes using only EEG.
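A minimal sketch of the kind of CSP-plus-SVM classification described above, using MNE-Python and scikit-learn on random stand-in epochs. Epoch counts, channel counts, and the number of CSP components are assumptions, not the study's parameters.

```python
import numpy as np
from mne.decoding import CSP
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 60, 32, 512
X = rng.standard_normal((n_epochs, n_channels, n_times))  # alpha-band EEG epochs
y = rng.integers(0, 2, n_epochs)                          # attended melody (0 or 1)

# CSP finds spatial filters that maximize the variance ratio between classes;
# the log band power of the filtered signals feeds a nonlinear SVM.
clf = make_pipeline(CSP(n_components=4, log=True), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```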
Affiliation(s)
- Kana Mizokuchi
- Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Toshihisa Tanaka
- Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Takashi G. Sato
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
- Yoshifumi Shiraki
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan

3. Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. [PMID: 38358954 PMCID: PMC10868781 DOI: 10.1371/journal.pbio.3002498]
Abstract
Speech recognition relies crucially on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, and that this tracking cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech were better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but disappeared when segregation cues, i.e., speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
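A toy illustration of why an echo can cancel specific temporal modulations: adding a delayed copy of a signal comb-filters its envelope, with a null near f = 1/(2·delay) for an equal-amplitude echo. The delay and modulation rates below are illustrative, not the stimulus parameters of the study, and summing envelopes is itself a simplification.

```python
import numpy as np

fs = 100                  # envelope sampling rate (Hz)
delay = 0.1               # echo delay (s) -> first modulation null near 5 Hz
t = np.arange(0, 10, 1 / fs)

for fm in (2.0, 5.0, 8.0):                           # modulation frequencies (Hz)
    direct = 1 + np.cos(2 * np.pi * fm * t)          # modulated speech envelope
    echo = 1 + np.cos(2 * np.pi * fm * (t - delay))  # envelope of the delayed copy
    mixed = direct + echo                            # envelope of speech + echo
    depth = (mixed.max() - mixed.min()) / 2          # residual modulation depth
    print(f"{fm:.0f} Hz modulation: depth {depth:.2f} after the echo")
```

At 5 Hz the two copies are in antiphase and the modulation vanishes, while neighboring rates survive, matching the idea that an echo removes some modulations entirely.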
Affiliation(s)
- Jiaxin Gao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
- The State key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China

4. Zhang X, Li J, Li Z, Hong B, Diao T, Ma X, Nolte G, Engel AK, Zhang D. Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension. Neuroimage 2023; 282:120404. [PMID: 37806465 DOI: 10.1016/j.neuroimage.2023.120404]
Abstract
Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-noise ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via natural language processing methods as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset at all three SNR levels, and the response latencies were increasingly delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms preceding speech fluctuation onset at all three SNR levels. These response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
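A compact sketch of a forward TRF estimated by ridge regression on time-lagged copies of a stimulus feature, which is the generic technique behind analyses like the one above. The lag range, sampling rate, and regularization constant are arbitrary choices for the toy data, not the paper's settings.

```python
import numpy as np

def lagged_design(x, lags):
    """Stack time-shifted copies of x into a (time, n_lags) design matrix."""
    X = np.zeros((len(x), len(lags)))
    for j, lag in enumerate(lags):          # non-negative lags only, for brevity
        X[lag:, j] = x[:len(x) - lag]
    return X

rng = np.random.default_rng(0)
fs = 64
feature = rng.standard_normal(120 * fs)               # e.g., the speech envelope
lags = np.arange(int(0.5 * fs))                       # 0-500 ms lags
true_trf = np.exp(-((lags / fs - 0.1) ** 2) / 0.001)  # response peaking near 100 ms
eeg = lagged_design(feature, lags) @ true_trf + rng.standard_normal(120 * fs)

X = lagged_design(feature, lags)
lam = 100.0                                           # ridge regularization
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
print("estimated TRF peak lag:", 1000 * lags[np.argmax(trf)] / fs, "ms")
```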
Affiliation(s)
- Xinmiao Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Jiawei Li
- Department of Education and Psychology, Freie Universität Berlin, Berlin 14195, Federal Republic of Germany
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Tongxiang Diao
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
- Xin Ma
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
- Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China.

5. Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. [PMID: 37750333 DOI: 10.1093/cercor/bhad325]
Abstract
Auditory attention decoding (AAD) can be used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remain unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalography (EEG) was recorded with an auditory selective attention paradigm in which HI listeners were instructed to attend to one of two simultaneous speech streams, with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e., audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR significantly enhanced the cortical tracking of the attended speech and the AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended-speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended-speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
Affiliation(s)
- Bo Wang
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Xiran Xu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yadong Niu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Chao Wu
- School of Nursing, Peking University, Beijing 100191, China
- Xihong Wu
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
- Jing Chen
- Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China

6. Jia Z, Xu C, Li J, Gao J, Ding N, Luo B, Zou J. Phase Property of Envelope-Tracking EEG Response Is Preserved in Patients with Disorders of Consciousness. eNeuro 2023; 10:ENEURO.0130-23.2023. [PMID: 37500493 PMCID: PMC10420405 DOI: 10.1523/eneuro.0130-23.2023]
Abstract
When listening to speech, the low-frequency cortical response below 10 Hz can track the speech envelope. Previous studies have demonstrated that the phase lag between the speech envelope and the cortical response can reflect the mechanism by which the envelope-tracking response is generated. Here, we analyze whether the mechanism generating the envelope-tracking response is modulated by the level of consciousness, by studying how the stimulus-response phase lag is modulated by disorders of consciousness (DoC). We observe that DoC patients in general show less reliable neural tracking of speech. Nevertheless, for DoC patients who do show reliable cortical tracking of speech, the stimulus-response phase lag changes linearly with frequency between 3.5 and 8 Hz, regardless of the consciousness state. The mean phase lag is also consistent across these DoC patients. These results suggest that the envelope-tracking response to speech can be generated by an automatic process that is barely modulated by the consciousness state.
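The linear phase-frequency relation reported above implies a fixed group delay, and the sketch below recovers a latency from a set of phase lags by fitting the slope. The 120-ms latency and the 0.5-Hz frequency grid are invented for illustration.

```python
import numpy as np

freqs = np.arange(3.5, 8.5, 0.5)              # Hz, the band analyzed above
latency_true = 0.120                          # assumed fixed response delay (s)
phase_lag = 2 * np.pi * freqs * latency_true  # linear phase-frequency relation

slope = np.polyfit(freqs, np.unwrap(phase_lag), 1)[0]  # radians per Hz
print(f"latency from phase slope: {1000 * slope / (2 * np.pi):.0f} ms")
```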
Affiliation(s)
- Ziting Jia
- The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250033, China
- Chuan Xu
- Department of Neurology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310019, China
- Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
- Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310003, China
- Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China

7. Deoisres S, Lu Y, Vanheusden FJ, Bell SL, Simpson DM. Continuous speech with pauses inserted between words increases cortical tracking of speech envelope. PLoS One 2023; 18:e0289288. [PMID: 37498891 PMCID: PMC10374040 DOI: 10.1371/journal.pone.0289288]
Abstract
The decoding multivariate temporal response function (decoder), or speech envelope reconstruction, approach is a well-known tool for assessing the cortical tracking of the speech envelope. It is used to analyse the correlation between the speech stimulus and the neural response. It is known that auditory late responses are enhanced with longer gaps between stimuli, but it is not clear whether this applies to the decoder, and whether adding gaps/pauses to continuous speech could be used to increase the envelope reconstruction accuracy. We investigated this in normal-hearing participants who listened to continuous speech with no added pauses (natural speech), and then with short (250 ms) or long (500 ms) silent pauses inserted between each word. The total durations of the continuous speech stimuli with no, short, and long pauses were approximately 10, 16, and 21 minutes, respectively. EEG and the speech envelope were simultaneously acquired and then filtered into the delta (1-4 Hz) and theta (4-8 Hz) frequency bands. In addition to analysing responses to the whole speech envelope, the envelope was also segmented to focus the response analysis on onset and non-onset regions of speech separately. Our results show that continuous speech with additional pauses inserted between words significantly increases the speech envelope reconstruction correlations compared to natural speech, in both the delta and theta frequency bands. These increases in speech envelope reconstruction also appear to be dominated by the onset regions of the speech envelope. Introducing pauses into speech stimuli has potential clinical benefit for increasing auditory evoked response detectability, though with the disadvantage of the speech sounding less natural. The strong effect of pauses and onsets on the decoder should be considered when comparing results from different speech corpora. Whether the increased cortical response, when longer pauses are introduced, reflects improved intelligibility requires further investigation.
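A small sketch of one way to split an envelope into onset and non-onset regions, in the spirit of the segmentation above: take the half-wave rectified envelope derivative and threshold it. The smoothing filter and the 90th-percentile threshold are assumptions, not the paper's rule.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

rng = np.random.default_rng(0)
fs = 64
sos = butter(2, 8, btype="low", fs=fs, output="sos")
envelope = sosfiltfilt(sos, np.abs(rng.standard_normal(120 * fs)))  # smooth stand-in

d_env = np.diff(envelope, prepend=envelope[0])
onset_rate = np.clip(d_env, 0, None)          # half-wave rectified derivative
onset_mask = onset_rate > np.percentile(onset_rate, 90)  # assumed threshold

print(f"onset samples: {onset_mask.sum()} of {len(envelope)}")
```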
Affiliation(s)
- Suwijak Deoisres
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
- Yuhan Lu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Frederique J Vanheusden
- Department of Engineering, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
- Steven L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
- David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom

8. Takai S, Kanno A, Kawase T, Shirakura M, Suzuki J, Nakasato N, Kawashima R, Katori Y. Possibility of additive effects by the presentation of visual information related to distractor sounds on the contra-sound effects of the N100m responses. Hear Res 2023; 434:108778. [PMID: 37105052 DOI: 10.1016/j.heares.2023.108778]
Abstract
Auditory-evoked responses can be affected by different types of contralateral sounds or by attention modulation. The present study examined, in 16 subjects (12 males and 4 females), the additive effects of presenting visual information related to contralateral distractor sounds during dichotic listening tasks on the contralateral effects on auditory cortical N100m responses. In magnetoencephalography, a tone-burst of 500 ms duration at a frequency of 1000 Hz was presented to the left ear at a level of 70 dB as a stimulus to elicit the N100m response, and a movie clip was used as a distractor stimulus under audio-only, visual-only, and audio-visual conditions. Subjects were instructed to pay attention to the left ear and press the response button each time they heard a tone-burst stimulus in their left ear. The results suggest that the presentation of visual information related to the contralateral sound, which acted as a distractor, significantly suppressed the amplitude of the N100m response compared with the contralateral-sound-only condition. In contrast, the presentation of visual information related to the contralateral sound did not affect the latency of the N100m response. These results suggest that the integration of contralateral sounds and related movies may have produced a more perceptually loaded stimulus and reduced the intensity of attention to the tone-bursts. Our findings suggest that selective attention and saliency mechanisms may have cross-modal effects on other modes of perception.
Affiliation(s)
- Shunsuke Takai
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan.
- Akitake Kanno
- Department of Advanced Spintronics Medical Engineering, Graduate School of Engineering, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan; Department of Epileptology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Tetsuaki Kawase
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan; Laboratory of Rehabilitative Auditory Science, Tohoku University Graduate School of Biomedical Engineering, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan; Department of Audiology, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Masayuki Shirakura
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Jun Suzuki
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Nobukatsu Nakasato
- Department of Advanced Spintronics Medical Engineering, Graduate School of Engineering, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan; Department of Epileptology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Ryuta Kawashima
- Institute of Development, Aging and Cancer, Tohoku University, 4-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Yukio Katori
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan

9. Slugocki C, Kuk F, Korhonen P. Left Lateralization of the Cortical Auditory-Evoked Potential Reflects Aided Processing and Speech-in-Noise Performance of Older Listeners With a Hearing Loss. Ear Hear 2023; 44:399-410. [PMID: 36331191 DOI: 10.1097/aud.0000000000001293]
Abstract
OBJECTIVES We analyzed the lateralization of the cortical auditory-evoked potential recorded previously from aided hearing-impaired listeners as part of a study on noise-mitigating hearing aid technologies. Specifically, we asked whether the degree of leftward lateralization in the magnitudes and latencies of these components was reduced by noise and, conversely, enhanced/restored by hearing aid technology. We further explored if individual differences in lateralization could predict speech-in-noise abilities in listeners when tested in the aided mode. DESIGN The study followed a double-blind within-subjects design. Nineteen older adults (8 females; mean age = 73.6 years, range = 56 to 86 years) with moderate to severe hearing loss participated. The cortical auditory-evoked potential was measured over 400 presentations of a synthetic /da/ stimulus which was delivered binaurally in a simulated aided mode using shielded ear-insert transducers. Sequences of the /da/ syllable were presented from the front at 75 dB SPL-C with continuous speech-shaped noise presented from the back at signal-to-noise ratios of 0, 5, and 10 dB. Four hearing aid conditions were tested: (1) omnidirectional microphone (OM) with noise reduction (NR) disabled, (2) OM with NR enabled, (3) directional microphone (DM) with NR disabled, and (4) DM with NR enabled. Lateralization of the P1 component and N1P2 complex was quantified across electrodes spanning the mid-coronal plane. Subsequently, listener speech-in-noise performance was assessed using the Repeat-Recall Test at the same signal-to-noise ratios and hearing aid conditions used to measure cortical activity. RESULTS As expected, both the P1 component and the N1P2 complex were of greater magnitude in electrodes over the left compared to the right hemisphere. In addition, N1 and P2 peaks tended to occur earlier over the left hemisphere, although the effect was mediated by an interaction of signal-to-noise ratio and hearing aid technology. At a group level, degrees of lateralization for the P1 component and the N1P2 complex were enhanced in the DM relative to the OM mode. Moreover, linear mixed-effects models suggested that the degree of leftward lateralization in the N1P2 complex, but not the P1 component, accounted for a significant portion of variability in speech-in-noise performance that was not related to age, hearing loss, hearing aid processing, or signal-to-noise ratio. CONCLUSIONS A robust leftward lateralization of cortical potentials was observed in older listeners when tested in the aided mode. Moreover, the degree of lateralization was enhanced by hearing aid technologies that improve the signal-to-noise ratio for speech. Accounting for the effects of signal-to-noise ratio, hearing aid technology, semantic context, and audiometric thresholds, individual differences in left-lateralized speech-evoked cortical activity were found to predict listeners' speech-in-noise abilities. Quantifying cortical auditory-evoked potential component lateralization may then be useful for profiling listeners' likelihood of communication success following clinical amplification.
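One common way to quantify the kind of left-right asymmetry reported above is a normalized lateralization index over homologous electrode groups. The sketch below shows the (L - R)/(L + R) form on made-up component magnitudes; the electrode grouping and the index itself are generic choices, not necessarily the paper's exact quantification.

```python
import numpy as np

# N1P2 magnitudes (arbitrary units) over homologous electrode groups
left = np.array([2.3, 2.1, 2.6])    # hypothetical left mid-coronal sites
right = np.array([1.8, 1.7, 2.0])   # hypothetical right mid-coronal sites

li = (left.mean() - right.mean()) / (left.mean() + right.mean())
print(f"lateralization index: {li:+.2f} (positive = leftward)")
```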
Affiliation(s)
- Christopher Slugocki
- Office of Research in Clinical Amplification (ORCA-USA), WS Audiology, Lisle, Illinois, USA

10. Aljarboa GS, Bell SL, Simpson DM. Detecting cortical responses to continuous running speech using EEG data from only one channel. Int J Audiol 2023; 62:199-208. [PMID: 35152811 DOI: 10.1080/14992027.2022.2035832]
Abstract
OBJECTIVE To explore the detection of cortical responses to continuous speech using a single EEG channel; in particular, to compare detection rates and times using a cross-correlation approach and parameters extracted from the temporal response function (TRF). DESIGN EEG from 32 channels was recorded whilst presenting 25 min of continuous English speech. The detection parameters were the cross-correlation between speech and EEG (XCOR), the peak value and power of the TRF filter (TRF-peak and TRF-power), and the correlation between the TRF-predicted and true EEG (TRF-COR). A bootstrap analysis was used to determine the statistical significance of responses. Different electrode configurations were compared: using single channels Cz or Fz, or selecting the channels with the highest correlation value. STUDY SAMPLE Seventeen native English-speaking subjects with mild-to-moderate hearing loss. RESULTS Significant cortical responses were detected from all subjects at channel Fz with XCOR and TRF-COR. Lower detection times were seen for XCOR (mean = 4.8 min) than for the TRF parameters (best TRF-COR, mean = 6.4 min), with significant time differences from XCOR to TRF-peak and TRF-power. Analysing multiple EEG channels and testing the channels with the highest correlation between envelope and EEG reduced detection sensitivity compared to Fz alone. CONCLUSIONS Cortical responses to continuous speech can be detected from a single channel with recording times that may be suitable for clinical application.
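A sketch of a shift-based null distribution for deciding whether a single-channel envelope-EEG correlation is significant, in the spirit of the bootstrap analysis above. The circular-shift scheme, lag range, and permutation count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 64
envelope = rng.standard_normal(300 * fs)                       # 5 min of "speech"
eeg = 0.05 * np.roll(envelope, int(0.1 * fs)) + rng.standard_normal(300 * fs)

def peak_xcorr(env, sig, max_lag):
    """Largest |Pearson r| between env and sig over lags 0..max_lag-1 samples."""
    vals = []
    for lag in range(max_lag):
        n = len(env) - lag
        vals.append(abs(np.corrcoef(env[:n], sig[lag:lag + n])[0, 1]))
    return max(vals)

max_lag = int(0.3 * fs)
observed = peak_xcorr(envelope, eeg, max_lag)
# Null distribution: circularly shift the envelope to break its alignment
null = [peak_xcorr(np.roll(envelope, int(rng.integers(fs, len(envelope) - fs))),
                   eeg, max_lag) for _ in range(200)]
p = (1 + sum(v >= observed for v in null)) / (1 + len(null))
print(f"observed peak r = {observed:.3f}, permutation p = {p:.3f}")
```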
Affiliation(s)
- Ghadah S Aljarboa
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK; Communication Sciences, Princess Nora bint Abdul Rahman University, Riyadh, Saudi Arabia
- Steve L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
- David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK

11. Rosenkranz M, Cetin T, Uslar VN, Bleichner MG. Investigating the attentional focus to workplace-related soundscapes in a complex audio-visual-motor task using EEG. Frontiers in Neuroergonomics 2023; 3:1062227. [PMID: 38235454 PMCID: PMC10790850 DOI: 10.3389/fnrgo.2022.1062227]
Abstract
Introduction In demanding work situations (e.g., during surgery), the processing of complex soundscapes varies over time and can be a burden for medical personnel. Here we study, using mobile electroencephalography (EEG), how humans process workplace-related soundscapes while performing a complex audio-visual-motor task (3D Tetris). Specifically, we wanted to know how the attentional focus changes the processing of the soundscape as a whole. Method Participants played a game of 3D Tetris in which they had to use both hands to control falling blocks. At the same time, participants listened to a complex soundscape, similar to what is found in an operating room (i.e., the sound of machinery, people talking in the background, alarm sounds, and instructions). In this within-subject design, participants had to react to instructions (e.g., "place the next block in the upper left corner") and, depending on the experimental condition, to sounds: either to a specific alarm sound originating from a fixed location or to a beep sound that originated from varying locations. Attention to the alarm reflected a narrow attentional focus, as it was easy to detect and most of the soundscape could be ignored. Attention to the beep reflected a wide attentional focus, as it required the participants to monitor multiple different sound streams. Results and discussion The results show the robustness of the N1 and P3 event-related potential responses during this dynamic task with a complex auditory soundscape. Furthermore, we used temporal response functions to study auditory processing of the whole soundscape. This work is a step toward studying workplace-related sound processing in the operating room using mobile EEG.
Affiliation(s)
- Marc Rosenkranz
- Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Timur Cetin
- Pius-Hospital Oldenburg, University Hospital for Visceral Surgery, University of Oldenburg, Oldenburg, Germany
- Verena N. Uslar
- Pius-Hospital Oldenburg, University Hospital for Visceral Surgery, University of Oldenburg, Oldenburg, Germany
- Martin G. Bleichner
- Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany

12. Mesik J, Wojtczak M. The effects of data quantity on performance of temporal response function analyses of natural speech processing. Front Neurosci 2023; 16:963629. [PMID: 36711133 PMCID: PMC9878558 DOI: 10.3389/fnins.2022.963629]
Abstract
In recent years, temporal response function (TRF) analyses of neural activity recordings evoked by continuous naturalistic stimuli have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit to either individual data (subject-specific analyses), or group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially based on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can exhibit high performance on small amounts of test data (2-8 min), if they are trained on a sufficiently large data set. As such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, demonstrations in this work should aid new users of TRF analyses, and in combination with other tools, such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.

13. Luo C, Gao Y, Fan J, Liu Y, Yu Y, Zhang X. Compromised word-level neural tracking in the high-gamma band for children with attention deficit hyperactivity disorder. Front Hum Neurosci 2023; 17:1174720. [PMID: 37213926 PMCID: PMC10196181 DOI: 10.3389/fnhum.2023.1174720]
Abstract
Children with attention deficit hyperactivity disorder (ADHD) exhibit pervasive difficulties in speech perception. Given that speech processing involves both acoustic and linguistic stages, it remains unclear which stage of speech processing is impaired in children with ADHD. To investigate this issue, we measured neural tracking of speech at the syllable and word levels using electroencephalography (EEG), and evaluated the relationship between neural responses and ADHD symptoms in 6- to 8-year-old children. Twenty-three children participated in the current study, and their ADHD symptoms were assessed with SNAP-IV questionnaires. In the experiment, the children listened to hierarchical speech sequences in which syllables and words were repeated at 2.5 and 1.25 Hz, respectively. Using frequency-domain analyses, reliable neural tracking of syllables and words was observed in both the low-frequency band (<4 Hz) and the high-gamma band (70-160 Hz). However, the neural tracking of words in the high-gamma band was anti-correlated with the ADHD symptom scores of the children. These results indicate that ADHD prominently impairs cortical encoding of linguistic information (e.g., words) in speech perception.
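A toy frequency-tagging analysis in the spirit of the paradigm above: with words at 1.25 Hz and syllables at 2.5 Hz, tracking appears as peaks at exactly those frequencies in the spectrum of the trial-averaged response. Trial length, trial count, and amplitudes are invented; the trial duration is chosen so both rates fall on exact frequency bins.

```python
import numpy as np

fs, dur, n_trials = 128, 8.0, 40          # 8-s trials -> 0.125 Hz resolution
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(0)
trials = (0.4 * np.sin(2 * np.pi * 1.25 * t)    # word-rate response
          + 0.6 * np.sin(2 * np.pi * 2.5 * t)   # syllable-rate response
          + rng.standard_normal((n_trials, len(t))))

spectrum = np.abs(np.fft.rfft(trials.mean(axis=0))) / len(t)
freqs = np.fft.rfftfreq(len(t), 1 / fs)
for f0 in (1.25, 2.5):
    amp = spectrum[np.argmin(np.abs(freqs - f0))]  # expect ~half the sine amplitude
    print(f"amplitude at {f0} Hz: {amp:.3f}")
```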
Affiliation(s)
- Cheng Luo
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, China
- Yayue Gao
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Jianing Fan
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Yang Liu
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Yonglin Yu
- Department of Rehabilitation, The Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
- Xin Zhang
- Department of Neurology, The Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China

14. Wang S, Zhang X, Zhang J, Zong C. A synchronized multimodal neuroimaging dataset for studying brain language processing. Sci Data 2022; 9:590. [PMID: 36180444 PMCID: PMC9525723 DOI: 10.1038/s41597-022-01708-5]
Abstract
We present a synchronized multimodal neuroimaging dataset for studying brain language processing (SMN4Lang) that contains functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) data from the same 12 healthy volunteers while they listened to 6 hours of naturalistic stories, as well as high-resolution structural (T1, T2), diffusion MRI and resting-state fMRI data for each participant. We also provide rich linguistic annotations for the stimuli, including word frequencies, syntactic tree structures, time-aligned characters and words, and various types of word and character embeddings. Quality assessment indicators verify that this is a high-quality neuroimaging dataset. The synchronized data were collected from the same group of participants, who listened to the story materials first during fMRI and then during MEG; such data are well suited to studying the dynamic processing of language comprehension, such as the time and location of different linguistic features encoded in the brain. In addition, this dataset, comprising a large vocabulary from stories on various topics, can serve as a brain benchmark to evaluate and improve computational language models.
Measurement(s): functional brain measurement; magnetoencephalography
Technology Type(s): functional magnetic resonance imaging; magnetoencephalography
Factor Type(s): naturalistic stimuli listening
Sample Characteristic - Organism: human beings
Affiliation(s)
- Shaonan Wang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Xiaohan Zhang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jiajun Zhang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Chengqing Zong
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

15. Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
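As a concrete companion to the envelope-tracking measures reviewed above, the sketch below extracts a low-frequency speech envelope (Hilbert magnitude, low-pass filtering, downsampling). The cutoffs and rates are common choices in this literature, not prescriptions from the review.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

fs_audio = 16000
rng = np.random.default_rng(0)
speech = rng.standard_normal(10 * fs_audio)        # stand-in for a speech recording

envelope = np.abs(hilbert(speech))                 # broadband amplitude envelope
sos = butter(4, 8, btype="low", fs=fs_audio, output="sos")
env_low = sosfiltfilt(sos, envelope)               # keep slow (<8 Hz) modulations
env_64 = env_low[:: fs_audio // 64]                # crude downsample to 64 Hz
print(env_64.shape)                                # (640,) for 10 s of audio
```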
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
- Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium

16. Na Y, Joo H, Trang LT, Quan LDA, Woo J. Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses. Front Neurosci 2022; 16:906616. [PMID: 36061597 PMCID: PMC9433707 DOI: 10.3389/fnins.2022.906616]
Abstract
Auditory prostheses provide an opportunity for the rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which an auditory prosthesis improves the user's speech comprehension. Although behavior-based speech intelligibility is the gold standard, precise evaluation is limited due to its subjectiveness. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, or 8-channel vocoder were used to create relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100%, with a step of 2.5%; three scores, namely 30.0, 37.5, and 40.0%, were not collected. Speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on datasets of event-related potentials (ERPs), or of correlation coefficients between the ERPs and the ENV, between the ERPs and the PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. While the informative electrodes of the ENV model were located in the occipital area, the informative electrodes of the phoneme models (PH and PHENV), identified from the occlusion sensitivity maps, were located in the language-processing area. Of the models tested, the PHENV model obtained the best speech intelligibility prediction accuracy. This model may enable more comfortable, objective clinical prediction of speech intelligibility.
Affiliation(s)
- Youngmin Na
- Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
- Hyosung Joo
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
- Le Thi Trang
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
- Luong Do Anh Quan
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea
- Jihwan Woo
- Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, South Korea

17. Muncke J, Kuruvila I, Hoppe U. Prediction of Speech Intelligibility by Means of EEG Responses to Sentences in Noise. Front Neurosci 2022; 16:876421. [PMID: 35720724 PMCID: PMC9198593 DOI: 10.3389/fnins.2022.876421]
Abstract
Objective Understanding speech in noisy conditions is challenging even for people with mild hearing loss, and intelligibility for an individual person is usually evaluated using several subjective test methods. In the last few years, a method has been developed to determine a temporal response function (TRF) between the speech envelope and simultaneous electroencephalographic (EEG) measurements. Using this TRF, it is possible to predict the EEG signal for any speech signal. Recent studies have suggested that the accuracy of this prediction varies with the level of noise added to the speech signal and can objectively predict individual speech intelligibility. Here we assess the variations of the TRF itself when it is calculated from measurements at different signal-to-noise ratios and apply these variations to predict speech intelligibility. Methods For 18 normal-hearing subjects, the individual threshold of 50% speech intelligibility was determined using a speech-in-noise test. Additionally, subjects listened passively to speech material from the speech-in-noise test at different signal-to-noise ratios close to the individual threshold of 50% speech intelligibility while EEG was recorded. Afterwards, the shapes of the TRFs for each signal-to-noise ratio and subject were compared with the derived intelligibility. Results The strongest effect of variations in stimulus signal-to-noise ratio on the TRF shape occurred close to 100 ms after stimulus presentation, and was located in the left central scalp region. The investigated variations in TRF morphology showed a strong correlation with speech intelligibility, and we were able to predict the individual threshold of 50% speech intelligibility with a mean deviation of less than 1.5 dB. Conclusion The intelligibility of speech in noise can be predicted by analyzing the shape of TRFs derived at different stimulus signal-to-noise ratios. Because TRFs are interpretable, in a manner similar to auditory evoked potentials, this method offers new options for clinical diagnostics.
Affiliation(s)
- Jan Muncke
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
- Ivine Kuruvila
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany; WS Audiology, Erlangen, Germany
- Ulrich Hoppe
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany

18. Nogueira W, Dolhopiatenko H. Predicting speech intelligibility from a selective attention decoding paradigm in cochlear implant users. J Neural Eng 2022; 19. [PMID: 35234663 DOI: 10.1088/1741-2552/ac599f]
Abstract
OBJECTIVES Electroencephalography (EEG) can be used to decode selective attention in cochlear implant (CI) users. This work investigates whether selective attention to an attended speech source in the presence of a concurrent speech source can predict speech understanding in CI users. APPROACH CI users were instructed to attend to one of two speech streams while EEG was recorded. Both speech streams were presented to the same ear and at different signal-to-interference ratios (SIRs). Speech envelope reconstruction of the to-be-attended speech from EEG was obtained by training decoders using regularized least squares. The correlation coefficient between the reconstructed stream and the attended (ρ_A(SIR)) or the unattended (ρ_U(SIR)) speech stream at each SIR was computed. Additionally, we computed the difference correlation coefficient at the same SIR (ρ_Diff = ρ_A(SIR) − ρ_U(SIR)) and at the opposite SIR (ρ_DiffOpp = ρ_A(SIR) − ρ_U(−SIR)). ρ_Diff compares the attended and unattended correlation coefficients for speech sources presented at different presentation levels depending on SIR. In contrast, ρ_DiffOpp compares the attended and unattended correlation coefficients for speech sources presented at the same presentation level irrespective of SIR. MAIN RESULTS Selective attention decoding in CI users is possible even if both speech streams are presented monaurally. A significant effect of SIR on ρ_A(SIR), ρ_Diff and ρ_DiffOpp, but not on ρ_U(SIR), was observed. Finally, the results show a significant correlation between speech understanding performance and ρ_A(SIR), as well as with ρ_U(SIR), across subjects. Moreover, ρ_DiffOpp, which is less affected by the CI artifact, also demonstrated a significant correlation with speech understanding. SIGNIFICANCE Selective attention decoding in CI users is possible; however, care needs to be taken with the CI artifact and the speech material used to train the decoders. These results are important for the future development of objective speech understanding measures for CI users.
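A minimal sketch of the correlation measures defined above, with synthetic vectors standing in for the decoder-reconstructed envelope and the two speech streams; the mixing weights are invented. ρ_DiffOpp would additionally pair the attended correlation at one SIR with the unattended correlation at the mirrored SIR, which needs data from two conditions and is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
attended = rng.standard_normal(n)      # envelope of the attended stream
unattended = rng.standard_normal(n)    # envelope of the unattended stream
reconstructed = 0.4 * attended + 0.1 * unattended + rng.standard_normal(n)

rho_a = np.corrcoef(reconstructed, attended)[0, 1]
rho_u = np.corrcoef(reconstructed, unattended)[0, 1]
rho_diff = rho_a - rho_u               # the same-SIR contrast from the abstract
print(f"rho_A = {rho_a:.2f}, rho_U = {rho_u:.2f}, rho_Diff = {rho_diff:.2f}")
```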
Affiliation(s)
- Waldo Nogueira
- Department of Otolaryngology and Cluster of Excellence "Hearing4all", Hannover Medical School, Karl-Wiechert Allee 3, Hannover, Niedersachsen, 30625, Germany
- Hanna Dolhopiatenko
- Department of Otolaryngology and Cluster of Excellence "Hearing4all", Hannover Medical School, Karl-Wiechert Allee 3, Hannover, Niedersachsen, 30625, Germany

19. Wang L, Wang Y, Liu Z, Wu EX, Chen F. A Speech-Level-Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes. Front Neurosci 2022; 15:760611. [PMID: 35221885 PMCID: PMC8866945 DOI: 10.3389/fnins.2021.760611]
Abstract
In competing-speaker environments, human listeners need to focus or switch their auditory attention according to their dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from neural signals. Moreover, previous studies revealed that root mean square (RMS)-level-based speech segmentation makes a great contribution to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level-based speech segmentation on auditory attention decoding (AAD) performance with both sustained and switched attention in competing-speaker auditory scenes. Objective biomarkers derived from cortical activities were also developed to index the dynamic auditory attention states. In the current study, subjects were asked to concentrate on, or switch their attention between, two competing speaker streams. The neural responses to the higher- and lower-RMS-level speech segments were analyzed via the linear temporal response function (TRF) before and after attention switched from one speaker stream to the other. Furthermore, the AAD performance decoded by a unified TRF decoding model was compared to that decoded by a speech-RMS-level-based segmented decoding model under dynamic changes of the auditory attention states. The results showed that the weight of the typical TRF component at an approximately 100-ms time lag was sensitive to the switching of auditory attention. Compared to the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention modulations over a wide range of signal-to-masker ratios (SMRs). In competing-speaker scenes, the TRF weight and AAD accuracy could be used as effective indicators to detect changes of auditory attention. In addition, over a wide range of SMRs (i.e., from 6 to -6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision window lengths, suggesting that this speech-RMS-level-based model has the potential to decode dynamic attention states in realistic auditory scenarios.
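A small sketch of RMS-level-based segmentation as described above: frame the signal, compute per-frame RMS, and split frames into higher- and lower-RMS classes. The 20-ms frame and mean-RMS split are assumptions; the study's exact segmentation rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(5 * fs) / fs
speech = rng.standard_normal(5 * fs) * (1 + np.sin(2 * np.pi * 0.5 * t))  # level-varying

frame = int(0.02 * fs)                                   # 20-ms frames
frames = speech[: len(speech) // frame * frame].reshape(-1, frame)
rms = np.sqrt((frames ** 2).mean(axis=1))                # per-frame RMS level
higher = rms > rms.mean()                                # higher- vs lower-RMS labels
print(f"higher-RMS frames: {higher.sum()} / {len(rms)}")
```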
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Yihan Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Zhixing Liu
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Ed X. Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Fei Chen (correspondence)
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China

20
Li J, Hong B, Nolte G, Engel AK, Zhang D. Preparatory delta phase response is correlated with naturalistic speech comprehension performance. Cogn Neurodyn 2021; 16:337-352. [PMID: 35401861 PMCID: PMC8934811 DOI: 10.1007/s11571-021-09711-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Revised: 07/09/2021] [Accepted: 08/12/2021] [Indexed: 01/07/2023] Open
Abstract
While human speech comprehension is thought to be an active process that involves top-down predictions, it remains unclear how predictive information is used to prepare for the processing of upcoming speech information. We aimed to identify the neural signatures of the preparatory processing of upcoming speech. Participants selectively attended to one of two competing naturalistic, narrative speech streams, and a temporal response function (TRF) method was applied to derive event-related-like neural responses from electroencephalographic data. The phase responses to the attended speech in the delta band (1-4 Hz) were correlated with the comprehension performance of individual participants, at a latency of −200 to 0 ms relative to the onset of speech amplitude envelope fluctuations over the fronto-central and left-lateralized parietal electrodes. The phase responses to the attended speech in the alpha band also correlated with comprehension performance, but at a latency of 650 to 980 ms post-onset over the fronto-central electrodes. Distinct neural signatures were found for attentional modulation, taking the form of TRF-based amplitude responses at a latency of 240 to 320 ms post-onset over the left-lateralized fronto-central and occipital electrodes. Our findings reveal how the brain prepares to process upcoming speech in a continuous, naturalistic speech context.
Affiliation(s)
- Jiawei Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
- Bo Hong
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Andreas K. Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China

21
Sokoliuk R, Degano G, Melloni L, Noppeney U, Cruse D. The Influence of Auditory Attention on Rhythmic Speech Tracking: Implications for Studies of Unresponsive Patients. Front Hum Neurosci 2021; 15:702768. [PMID: 34456697 PMCID: PMC8385206 DOI: 10.3389/fnhum.2021.702768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Accepted: 07/21/2021] [Indexed: 11/13/2022] Open
Abstract
Language comprehension relies on integrating words into progressively more complex structures, like phrases and sentences. This hierarchical structure-building is reflected in rhythmic neural activity across multiple timescales in E/MEG in healthy, awake participants. However, recent studies have shown evidence for this “cortical tracking” of higher-level linguistic structures also in a proportion of unresponsive patients. What does this tell us about these patients’ residual levels of cognition and consciousness? Must the listener direct their attention toward higher-level speech structures to exhibit cortical tracking, and would selective attention across levels of the hierarchy influence the expression of these rhythms? We investigated these questions in an EEG study of 72 healthy human volunteers listening to streams of monosyllabic isochronous English words that were either unrelated (scrambled condition) or composed of four-word sequences building meaningful sentences (sentential condition). Importantly, there were no physical cues between four-word sentences. Rather, boundaries were marked by syntactic structure and thematic role assignment. Participants were divided into three attention groups: from passive listening (passive group) to attending to individual words (word group) or sentences (sentence group). The passive and word groups were initially naïve to the sentential stimulus structure, while the sentence group was not. We found significant tracking at the word and sentence rates across all three groups, with sentence tracking linked to the left middle temporal gyrus and right superior temporal gyrus. Goal-directed attention to words did not enhance word-rate tracking, suggesting that word tracking here reflects largely automatic mechanisms, as was previously shown for tracking at the syllable rate. Importantly, goal-directed attention to sentences relative to words significantly increased sentence-rate tracking over the left inferior frontal gyrus. This attentional modulation of rhythmic EEG activity at the sentential rate highlights the role of attention in integrating individual words into complex linguistic structures. Nevertheless, given the presence of high-level cortical tracking under conditions of lower attentional effort, our findings underline the suitability of the paradigm for clinical application in patients after brain injury. The neural dissociation between passive tracking of sentences and directed attention to sentences provides a potential means to further characterise the cognitive state of each unresponsive patient.
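Tracking at fixed word and sentence rates is typically quantified as a spectral peak at the target frequency relative to neighboring bins. The sketch below assumes a 2 Hz word rate (and hence a 0.5 Hz rate for four-word sentences) purely for illustration; the study's actual presentation rates are not stated in this abstract.

```python
import numpy as np

def tagged_snr(eeg, fs, f_target, n_neighbors=4):
    """Spectral amplitude at f_target divided by the mean of neighboring bins."""
    spec = np.abs(np.fft.rfft(eeg, axis=-1)).mean(axis=0)  # average over trials
    freqs = np.fft.rfftfreq(eeg.shape[-1], 1 / fs)
    k = np.argmin(np.abs(freqs - f_target))
    nb = list(range(k - n_neighbors, k)) + list(range(k + 1, k + 1 + n_neighbors))
    return spec[k] / spec[nb].mean()

rng = np.random.default_rng(1)
fs, dur, n_trials = 128, 16, 30
t = np.arange(fs * dur) / fs
word_rate, sent_rate = 2.0, 0.5                  # assumed rates (4-word sentences)
eeg = rng.standard_normal((n_trials, len(t)))
eeg += 0.3 * np.sin(2 * np.pi * word_rate * t)   # simulated word-rate tracking
eeg += 0.15 * np.sin(2 * np.pi * sent_rate * t)  # weaker sentence-rate tracking
for f in (word_rate, sent_rate):
    print(f"{f} Hz tracking SNR: {tagged_snr(eeg, fs, f):.2f}")
```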
Affiliation(s)
- Rodika Sokoliuk
- School of Psychology, University of Birmingham, Birmingham, United Kingdom; Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Giulio Degano
- School of Psychology, University of Birmingham, Birmingham, United Kingdom; Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom; Brain and Language Lab, Department of Psychology, Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland
- Lucia Melloni
- Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany; Department of Neurology, New York University, New York City, NY, United States
- Uta Noppeney
- Donders Centre for Cognitive Neuroimaging, Nijmegen, Netherlands; Department of Biophysics, Radboud University, Nijmegen, Netherlands
- Damian Cruse
- School of Psychology, University of Birmingham, Birmingham, United Kingdom; Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom

22
Chang A, Bedoin N, Canette LH, Nozaradan S, Thompson D, Corneyllie A, Tillmann B, Trainor LJ. Atypical beta power fluctuation while listening to an isochronous sequence in dyslexia. Clin Neurophysiol 2021; 132:2384-2390. [PMID: 34454265 DOI: 10.1016/j.clinph.2021.05.037] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2020] [Revised: 04/22/2021] [Accepted: 05/31/2021] [Indexed: 11/29/2022]
Abstract
OBJECTIVE Developmental dyslexia is a reading disorder that features difficulties in perceiving and tracking rhythmic regularities in auditory streams, such as speech and music. Studies on typical healthy participants have shown that power fluctuations of neural oscillations in the beta band (15-25 Hz) reflect an essential mechanism for tracking rhythm or entrainment and relate to predictive timing and attentional processes. Here we investigated whether adults with dyslexia have atypical beta power fluctuation. METHODS The electroencephalographic activities of individuals with dyslexia (n = 13) and typical control participants (n = 13) were measured while they passively listened to an isochronous tone sequence (2 Hz presentation rate). The time-frequency neural activities generated from the auditory cortices were analyzed. RESULTS The phase of the beta power fluctuation at the 2 Hz stimulus presentation rate differed and appeared opposite between individuals with dyslexia and controls. CONCLUSIONS Atypical beta power fluctuation might reflect deficits in perceiving and tracking auditory rhythm in dyslexia. SIGNIFICANCE These findings extend our understanding of atypical neural activities for tracking rhythm in dyslexia and could inspire novel methods to objectively measure the benefits of training and to predict the potential benefit of auditory rhythmic rehabilitation programs on an individual basis.
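The dependent measure here, the phase of slow beta-power fluctuations at the 2 Hz stimulus rate, can be estimated by band-passing the EEG, taking the power envelope, and reading off the Fourier phase at 2 Hz. A self-contained sketch with simulated data follows; the filter order and band edges are illustrative choices, not the paper's exact analysis.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def beta_fluctuation_phase(eeg, fs, rate_hz=2.0, band=(15, 25)):
    """Phase of the beta power envelope at the stimulus presentation rate."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    beta_power = np.abs(hilbert(filtfilt(b, a, eeg))) ** 2
    spec = np.fft.rfft(beta_power - beta_power.mean())
    freqs = np.fft.rfftfreq(len(beta_power), 1 / fs)
    return np.angle(spec[np.argmin(np.abs(freqs - rate_hz))])

rng = np.random.default_rng(2)
fs, dur = 256, 20
t = np.arange(fs * dur) / fs
# Simulate a 20 Hz beta oscillation whose amplitude waxes and wanes at 2 Hz.
carrier = np.sin(2 * np.pi * 20 * t)
eeg = (1 + 0.5 * np.sin(2 * np.pi * 2 * t)) * carrier + rng.standard_normal(len(t))
print(f"beta power phase at 2 Hz: {beta_fluctuation_phase(eeg, fs):.2f} rad")
```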
Affiliation(s)
- Andrew Chang
- Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, ON L8S 4K1, Canada
- Nathalie Bedoin
- CNRS, UMR5292, INSERM, U1028, Lyon Neuroscience Research Center, IMPACT Team, Bron, France; University Lyon 1, Villeurbanne, France; University Lyon 2, Bron, France
- Laure-Helene Canette
- University Lyon 1, Villeurbanne, France; CNRS, UMR5292, INSERM, U1028, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Bron, France
- Sylvie Nozaradan
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith, NSW 2751, Australia; Institute of Neuroscience (IONS), Université catholique de Louvain (UCL), Avenue Mounier 53, Woluwe-Saint-Lambert, 1200, Belgium
- Dave Thompson
- Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, ON L8S 4K1, Canada; McMaster Institute for Music and the Mind, McMaster University, Hamilton, ON L8S 4K1, Canada; Rotman Research Institute, Baycrest Hospital, Toronto, ON M6A 2E1, Canada
- Alexandra Corneyllie
- University Lyon 1, Villeurbanne, France; CNRS, UMR5292, INSERM, U1028, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Bron, France
- Barbara Tillmann
- University Lyon 1, Villeurbanne, France; CNRS, UMR5292, INSERM, U1028, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Bron, France
- Laurel J Trainor
- Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, ON L8S 4K1, Canada; McMaster Institute for Music and the Mind, McMaster University, Hamilton, ON L8S 4K1, Canada; Rotman Research Institute, Baycrest Hospital, Toronto, ON M6A 2E1, Canada

23
Bonacci LM, Bressler S, Shinn-Cunningham BG. Nonspatial Features Reduce the Reliance on Sustained Spatial Auditory Attention. Ear Hear 2021; 41:1635-1647. [PMID: 33136638 PMCID: PMC9831360 DOI: 10.1097/aud.0000000000000879] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
OBJECTIVE Top-down spatial attention is effective at selecting a target sound from a mixture. However, nonspatial features often distinguish sources in addition to location. This study explores whether redundant nonspatial features are used to maintain selective auditory attention for a spatially defined target. DESIGN We recorded electroencephalography while subjects focused attention on one of three simultaneous melodies. In one experiment, subjects (n = 17) were given an auditory cue indicating both the location and pitch of the target melody. In a second experiment (n = 17 subjects), the cue only indicated target location, and we compared two conditions: one in which the pitch separation of competing melodies was large, and one in which this separation was small. RESULTS In both experiments, responses evoked by onsets of events in sound streams were modulated by attention, and we found no significant difference in this modulation between small and large pitch separation conditions. Therefore, the evoked response reflected that target stimuli were the focus of attention, and distractors were suppressed successfully for all experimental conditions. In all cases, parietal alpha was lateralized following the cue, but before melody onset, indicating that subjects initially focused attention in space. During the stimulus presentation, this lateralization disappeared when pitch cues were strong but remained significant when pitch cues were weak, suggesting that strong pitch cues reduced reliance on sustained spatial attention. CONCLUSIONS These results demonstrate that once a well-defined target stream at a known location is selected, top-down spatial attention plays a weak role in filtering out a segregated competing stream.
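Parietal alpha lateralization of the kind reported here is commonly summarized by a normalized left/right power index. The sketch below computes such an index from two simulated parietal channels; the band limits, Welch parameters, and sign convention are assumptions for illustration, not the paper's exact analysis.

```python
import numpy as np
from scipy.signal import welch

def alpha_lateralization(eeg_left, eeg_right, fs, band=(8, 14)):
    """Normalized right-minus-left alpha power; the sign indicates the
    lateralization direction (conventions vary across studies)."""
    def band_power(x):
        f, p = welch(x, fs, nperseg=fs * 2)
        return p[(f >= band[0]) & (f <= band[1])].mean()
    pl, pr = band_power(eeg_left), band_power(eeg_right)
    return (pr - pl) / (pr + pl)

rng = np.random.default_rng(3)
fs, dur = 256, 10
t = np.arange(fs * dur) / fs
# Simulated attend-left trial with stronger alpha over left parietal sites.
left = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(len(t))
right = 0.5 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(len(t))
print(f"lateralization index: {alpha_lateralization(left, right, fs):+.2f}")
```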
Affiliation(s)
- Lia M. Bonacci
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Scott Bressler
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Barbara G. Shinn-Cunningham
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

24
Verschueren E, Vanthornhout J, Francart T. The Effect of Stimulus Choice on an EEG-Based Objective Measure of Speech Intelligibility. Ear Hear 2021; 41:1586-1597. [PMID: 33136634 DOI: 10.1097/aud.0000000000000875] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
OBJECTIVES Recently, an objective measure of speech intelligibility (SI), based on brain responses derived from the electroencephalogram (EEG), has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of SI can also be used with natural speech as a stimulus, as this would be beneficial for clinical applications. DESIGN We recorded the EEG in 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of SI by adding speech-weighted noise. SI was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively, by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in the EEG channels covering different brain areas. RESULTS For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing SI. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing SI for both stimuli. Remarkably, although SI remained unchanged between the no-noise and +2.5 dB SNR conditions, neural speech processing was affected by the addition of this small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased in the presence of noise. TRF latency changes as a function of SI appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50 to 300 ms) increased with increasing SI for the Matrix sentences, but remained unchanged for the natural story. CONCLUSIONS These results show (1) the feasibility of natural speech as a stimulus for the objective measurement of SI; (2) that neural tracking of speech is enhanced using a natural story compared to Matrix sentences; and (3) that noise and the stimulus type can change the temporal characteristics of the brain responses. These results might reflect the integration of incoming acoustic features and top-down information, suggesting that the choice of stimulus has to be considered based on the intended purpose of the measurement.
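Envelope reconstruction measures like the one above presuppose an acoustic envelope. One common extraction recipe (among several; some studies use gammatone subbands or a power law instead) is the Hilbert magnitude, low-passed and resampled to the EEG rate, as sketched here with assumed cutoff and rates.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def speech_envelope(audio, fs_audio, fs_eeg=64, cutoff=8.0):
    """Hilbert-magnitude envelope, low-passed and resampled to the EEG rate.

    One common recipe; studies differ (e.g., gammatone subbands, power law).
    """
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff / (fs_audio / 2), btype="low")
    env = filtfilt(b, a, env)
    return resample_poly(env, fs_eeg, fs_audio)

fs_audio = 16000
t = np.arange(fs_audio * 2) / fs_audio
# Amplitude-modulated tone as a stand-in for recorded speech.
audio = np.sin(2 * np.pi * 220 * t) * (0.3 + np.abs(np.sin(2 * np.pi * 4 * t)))
env = speech_envelope(audio, fs_audio)
print(env.shape)  # 2 s at 64 Hz -> (128,)
```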
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology (ExpORL), Department of Neurosciences, KU Leuven-University of Leuven, Leuven, Belgium

25
Rosenkranz M, Holtze B, Jaeger M, Debener S. EEG-Based Intersubject Correlations Reflect Selective Attention in a Competing Speaker Scenario. Front Neurosci 2021; 15:685774. [PMID: 34194296 PMCID: PMC8236636 DOI: 10.3389/fnins.2021.685774] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Accepted: 05/18/2021] [Indexed: 11/13/2022] Open
Abstract
Several solutions have been proposed to study the relationship between ongoing brain activity and natural sensory stimuli, such as running speech. Computing the intersubject correlation (ISC) has been proposed as one possible approach. Previous evidence suggests that ISCs between the participants' electroencephalogram (EEG) may be modulated by attention. The current study addressed this question in a competing-speaker paradigm, where participants (N = 41) had to attend to one of two concurrently presented speech streams. ISCs between participants' EEG were higher for participants attending to the same story compared to participants attending to different stories. Furthermore, we found that ISCs between individual and group data predicted whether an individual attended to the left or right speech stream. Interestingly, the magnitude of the shared neural response with others attending to the same story was related to the individual neural representation of the attended and ignored speech envelope. Overall, our findings indicate that ISC differences reflect the magnitude of selective attentional engagement to speech.
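A simplified stand-in for the ISC computation is shown below: each subject's EEG is correlated with the average of all other subjects, channel by channel. The study itself uses a component-based method, so treat this leave-one-out channel-wise version as a conceptual sketch only.

```python
import numpy as np

def channelwise_isc(eeg):
    """Leave-one-out ISC per subject: correlate each subject's signal with
    the average of all other subjects, then average over channels.

    eeg: (n_subjects, n_channels, n_times). A simplified stand-in for the
    correlated-component approach used in the literature.
    """
    n_subj = eeg.shape[0]
    iscs = []
    for s in range(n_subj):
        others = eeg[np.arange(n_subj) != s].mean(axis=0)
        r = [np.corrcoef(eeg[s, c], others[c])[0, 1] for c in range(eeg.shape[1])]
        iscs.append(np.mean(r))
    return np.array(iscs)

rng = np.random.default_rng(4)
shared = rng.standard_normal((8, 1000))             # stimulus-driven component
eeg = 0.4 * shared[None] + rng.standard_normal((12, 8, 1000))
print(channelwise_isc(eeg).round(2))
```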
Affiliation(s)
- Marc Rosenkranz
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Division Hearing, Fraunhofer Institute for Digital Media Technology IDMT, Speech and Audio Technology, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany

26
Mesik J, Ray L, Wojtczak M. Effects of Age on Cortical Tracking of Word-Level Features of Continuous Competing Speech. Front Neurosci 2021; 15:635126. [PMID: 33867920 PMCID: PMC8047075 DOI: 10.3389/fnins.2021.635126] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2020] [Accepted: 03/12/2021] [Indexed: 01/17/2023] Open
Abstract
Speech-in-noise comprehension difficulties are common among the elderly population, yet traditional objective measures of speech perception are largely insensitive to this deficit, particularly in the absence of clinical hearing loss. In recent years, a growing body of research in young normal-hearing adults has demonstrated that high-level features related to speech semantics and lexical predictability elicit strong centro-parietal negativity in the EEG signal around 400 ms following the word onset. Here we investigate effects of age on cortical tracking of these word-level features within a two-talker speech mixture, and their relationship with self-reported difficulties with speech-in-noise understanding. While undergoing EEG recordings, younger and older adult participants listened to a continuous narrative story in the presence of a distractor story. We then utilized forward encoding models to estimate cortical tracking of four speech features: (1) word onsets, (2) "semantic" dissimilarity of each word relative to the preceding context, (3) lexical surprisal for each word, and (4) overall word audibility. Our results revealed robust tracking of all features for attended speech, with surprisal and word audibility showing significantly stronger contributions to neural activity than dissimilarity. Additionally, older adults exhibited significantly stronger tracking of word-level features than younger adults, especially over frontal electrode sites, potentially reflecting increased listening effort. Finally, neuro-behavioral analyses revealed trends of a negative relationship between subjective speech-in-noise perception difficulties and the model goodness-of-fit for attended speech, as well as a positive relationship between task performance and the goodness-of-fit, indicating behavioral relevance of these measures. Together, our results demonstrate the utility of modeling cortical responses to multi-talker speech using complex, word-level features and the potential for their use to study changes in speech processing due to aging and hearing loss.
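Forward encoding models of this kind regress EEG onto a time-lagged matrix of stimulus features. The sketch below builds such a design matrix for four hypothetical word-level regressors and fits a ridge-regularized temporal response function; the feature names, lag range, and regularization constant are illustrative assumptions.

```python
import numpy as np

def design_matrix(features, lags):
    """Time-lagged design matrix for a forward (encoding) model.

    features: (n_times, n_features); lags: iterable of sample lags >= 0.
    """
    n_t, n_f = features.shape
    X = np.zeros((n_t, n_f * len(lags)))
    for i, lag in enumerate(lags):
        X[lag:, i * n_f:(i + 1) * n_f] = features[:n_t - lag]
    return X

def fit_trf(features, eeg_chan, lags, lam=1.0):
    X = design_matrix(features, lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg_chan)
    return w.reshape(len(lags), -1)  # (n_lags, n_features)

rng = np.random.default_rng(5)
fs, n_t = 64, 64 * 120
# Four word-level regressors: onset, dissimilarity, surprisal, audibility.
feats = rng.standard_normal((n_t, 4)) * (rng.random((n_t, 1)) < 0.05)
eeg = np.convolve(feats[:, 2], np.hanning(32), mode="same")  # surprisal-driven
eeg += rng.standard_normal(n_t)
trf = fit_trf(feats, eeg, lags=range(48))
names = ["onset", "dissimilarity", "surprisal", "audibility"]
print("strongest regressor:", names[np.abs(trf).sum(axis=0).argmax()])
```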
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States

27
Holtze B, Jaeger M, Debener S, Adiloğlu K, Mirkovic B. Are They Calling My Name? Attention Capture Is Reflected in the Neural Tracking of Attended and Ignored Speech. Front Neurosci 2021; 15:643705. [PMID: 33828451 PMCID: PMC8019946 DOI: 10.3389/fnins.2021.643705] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 02/19/2021] [Indexed: 11/15/2022] Open
Abstract
Difficulties in selectively attending to one among several speakers have mainly been associated with the distraction caused by ignored speech. Thus, in the current study, we investigated the neural processing of ignored speech in a two-competing-speaker paradigm. For this, we recorded the participants’ brain activity using electroencephalography (EEG) to track the neural representation of the attended and ignored speech envelopes. To provoke distraction, we occasionally embedded the participant’s first name in the ignored speech stream. Retrospective reports as well as the presence of a P3 component in response to the name indicate that participants noticed the occurrence of their name. As predicted, the neural representation of the ignored speech envelope increased after the name was presented therein, suggesting that the name had attracted the participant’s attention. Interestingly, in contrast to our hypothesis, the neural tracking of the attended speech envelope also increased after the name occurrence. On this account, we conclude that the name might have distracted the participants only briefly, if at all, and instead alerted them to refocus on their actual task. These observations remained robust even when the sound intensity of the ignored speech stream, and thus of the name, was attenuated.
Affiliation(s)
- Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Kamil Adiloğlu
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany

28
Luo C, Ding N. Cortical encoding of acoustic and linguistic rhythms in spoken narratives. eLife 2020; 9:60433. [PMID: 33345775 PMCID: PMC7775109 DOI: 10.7554/elife.60433] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2020] [Accepted: 12/20/2020] [Indexed: 11/13/2022] Open
Abstract
Speech contains rich acoustic and linguistic information. Using highly controlled speech materials, previous studies have demonstrated that cortical activity is synchronous to the rhythms of perceived linguistic units, for example, words and phrases, on top of basic acoustic features, for example, the speech envelope. When listening to natural speech, it remains unclear, however, how cortical activity jointly encodes acoustic and linguistic information. Here we investigate the neural encoding of words using electroencephalography and observe neural activity synchronous to multi-syllabic words when participants naturally listen to narratives. An amplitude modulation (AM) cue for word rhythm enhances the word-level response, but the effect is only observed during passive listening. Furthermore, words and the AM cue are encoded by spatially separable neural responses that are differentially modulated by attention. These results suggest that bottom-up acoustic cues and top-down linguistic knowledge separately contribute to cortical encoding of linguistic units in spoken narratives.
Affiliation(s)
- Cheng Luo
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China; Research Center for Advanced Artificial Intelligence Theory, Zhejiang Lab, Hangzhou, China

29
Wang L, Wu EX, Chen F. Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions. Front Hum Neurosci 2020; 14:557534. [PMID: 33132874 PMCID: PMC7576187 DOI: 10.3389/fnhum.2020.557534] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2020] [Accepted: 09/09/2020] [Indexed: 11/25/2022] Open
Abstract
With auditory attentional modulation, the attended speech stream can be detected robustly from electroencephalographic (EEG) data, even in adverse auditory scenarios. Speech segmentation based on the relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions. High-RMS-level segments contain crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects listened to the attended speech stream within mixed speech narrated concurrently by two Mandarin speakers. The temporal response function was used to identify the attended speech from EEG responses tracking the temporal envelope of intact speech or of the high-RMS-level speech segments alone. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope of the high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for the identification and tracking of attended speech in the presence of background noise. This study also showed that, with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures across a wide range of SNRs.
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong SAR, China
- Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong SAR, China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China

30
Alickovic E, Lunner T, Wendt D, Fiedler L, Hietkamp R, Ng EHN, Graversen C. Neural Representation Enhanced for Speech and Reduced for Background Noise With a Hearing Aid Noise Reduction Scheme During a Selective Attention Task. Front Neurosci 2020; 14:846. [PMID: 33071722 PMCID: PMC7533612 DOI: 10.3389/fnins.2020.00846] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Accepted: 07/20/2020] [Indexed: 12/23/2022] Open
Abstract
Objectives Selectively attending to a target talker while ignoring multiple interferers (competing talkers and background noise) is more difficult for hearing-impaired (HI) individuals compared to normal-hearing (NH) listeners. Such tasks also become more difficult as background noise levels increase. To overcome these difficulties, hearing aids (HAs) offer noise reduction (NR) schemes. The objective of this study was to investigate the effect of NR processing (inactive, where the NR feature was switched off, vs. active, where the NR feature was switched on) on the neural representation of speech envelopes across two different background noise levels [+3 dB signal-to-noise ratio (SNR) and +8 dB SNR] by using a stimulus reconstruction (SR) method. Design To explore how NR processing supports the listeners’ selective auditory attention, we recruited 22 HI participants fitted with HAs. To investigate the interplay between NR schemes, background noise, and neural representation of the speech envelopes, we used electroencephalography (EEG). The participants were instructed to listen to a target talker in front while ignoring a competing talker in front in the presence of multi-talker background babble noise. Results The results show that the neural representation of the attended speech envelope was enhanced by the active NR scheme for both background noise levels. The neural representation of the attended speech envelope at the lower (+3 dB) SNR was shifted by approximately 5 dB toward that at the higher (+8 dB) SNR when the NR scheme was turned on. The neural representation of the ignored speech envelope was modulated by the NR scheme and was mostly enhanced in the conditions with more background noise. The neural representation of the background noise was modulated (i.e., reduced) by the NR scheme and was significantly reduced in the conditions with more background noise. The neural representation of the net sum of the ignored acoustic scene (ignored talker and background babble) was not modulated by the NR scheme but was significantly reduced in the conditions with a reduced level of background noise. Taken together, we showed that the active NR scheme enhanced the neural representation of both the attended and the ignored speakers and reduced the neural representation of background noise, while the net sum of the ignored acoustic scene was not enhanced. Conclusion Altogether, our results support the hypothesis that the NR schemes in HAs serve to enhance the neural representation of speech and reduce the neural representation of background noise during a selective attention task. We contend that these results provide a neural index that could be useful for assessing the effects of HAs on auditory and cognitive processing in HI populations.
Affiliation(s)
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; Department of Electrical Engineering, Linkoping University, Linköping, Sweden
- Thomas Lunner
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; Department of Electrical Engineering, Linkoping University, Linköping, Sweden; Department of Health Technology, Technical University of Denmark, Lyngby, Denmark; Department of Behavioral Sciences and Learning, Linkoping University, Linköping, Sweden
- Dorothea Wendt
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Lorenz Fiedler
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Elaine Hoi Ning Ng
- Department of Behavioral Sciences and Learning, Linkoping University, Linköping, Sweden; Oticon A/S, Smørum, Denmark

31
Jaeger M, Mirkovic B, Bleichner MG, Debener S. Decoding the Attended Speaker From EEG Using Adaptive Evaluation Intervals Captures Fluctuations in Attentional Listening. Front Neurosci 2020; 14:603. [PMID: 32612507 PMCID: PMC7308709 DOI: 10.3389/fnins.2020.00603] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2019] [Accepted: 05/15/2020] [Indexed: 11/13/2022] Open
Abstract
Listeners differ in their ability to attend to a speech stream in the presence of a competing sound. Differences in speech intelligibility in noise cannot be fully explained by hearing ability, which suggests the involvement of additional cognitive factors. A better understanding of the temporal fluctuations in the ability to pay selective auditory attention to a desired speech stream may help in explaining these variabilities. In order to better understand the temporal dynamics of selective auditory attention, we developed an online auditory attention decoding (AAD) processing pipeline based on speech envelope tracking in the electroencephalogram (EEG). Participants had to attend to one audiobook story while a second one had to be ignored. Online AAD was applied to track attention toward the target speech signal. Individual temporal attention profiles were computed by combining an established AAD method with an adaptive staircase procedure. The individual decoding performance over time was analyzed and linked to behavioral performance as well as to subjective ratings of listening effort, motivation, and fatigue. The grand average attended speaker decoding profile derived in the online experiment indicated performance above chance level. Parameters describing the individual AAD performance in each testing block showed that differences in decoding performance over time were closely related to behavioral performance in the selective listening task. Further, an exploratory analysis indicated that subjects with poor decoding performance reported higher listening effort and fatigue compared to good performers. Taken together, our results show that online EEG-based AAD in a complex listening situation is feasible. Adaptive attended speaker decoding profiles over time could be used as an objective measure of behavioral performance and listening effort. The developed online processing pipeline could also serve as a basis for future EEG-based near-real-time auditory neurofeedback systems.
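The core of an online AAD pipeline is a windowed decision rule: reconstruct the envelope from the most recent EEG, correlate it with both speakers' envelopes, and pick the larger. A minimal sketch follows, assuming a pre-trained linear decoder and fixed window lengths; the adaptive staircase over window length used in the study is omitted here.

```python
import numpy as np

def online_aad(eeg, env_a, env_b, decoder, fs, win_s=20, step_s=5):
    """Sliding-window attended-speaker decisions from a pre-trained decoder.

    Returns one A/B decision per window based on which envelope correlates
    better with the reconstruction. Window lengths are illustrative.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    decisions = []
    for start in range(0, eeg.shape[0] - win + 1, step):
        sl = slice(start, start + win)
        rec = eeg[sl] @ decoder
        r_a = np.corrcoef(rec, env_a[sl])[0, 1]
        r_b = np.corrcoef(rec, env_b[sl])[0, 1]
        decisions.append("A" if r_a > r_b else "B")
    return decisions

rng = np.random.default_rng(6)
fs, n_t, n_ch = 64, 64 * 120, 16
env_a, env_b = rng.standard_normal((2, n_t))
eeg = rng.standard_normal((n_t, n_ch))
eeg[:, 0] += 0.8 * env_a                       # subject attends speaker A
decoder = np.zeros(n_ch); decoder[0] = 1.0     # stand-in for a trained decoder
d = online_aad(eeg, env_a, env_b, decoder, fs)
print(f"{d.count('A')}/{len(d)} windows decoded as speaker A")
```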
Affiliation(s)
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Neurophysiology of Everyday Life Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany

32
Wang Y, Zhang J, Zou J, Luo H, Ding N. Prior Knowledge Guides Speech Segregation in Human Auditory Cortex. Cereb Cortex 2020; 29:1561-1571. [PMID: 29788144 DOI: 10.1093/cercor/bhy052] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Revised: 01/22/2018] [Accepted: 02/15/2018] [Indexed: 11/12/2022] Open
Abstract
Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance, while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate an effect in bilateral superior temporal gyrus and superior temporal sulcus that is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation mainly by suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
Affiliation(s)
- Yuanye Wang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Jianfeng Zhang
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jiajie Zou
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Huan Luo
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Nai Ding
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China; Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, Zhejiang, China; State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, Zhejiang, China; Interdisciplinary Center for Social Sciences, Zhejiang University, Hangzhou, Zhejiang, China

33
Paul BT, Uzelac M, Chan E, Dimitrijevic A. Poor early cortical differentiation of speech predicts perceptual difficulties of severely hearing-impaired listeners in multi-talker environments. Sci Rep 2020; 10:6141. [PMID: 32273536 PMCID: PMC7145807 DOI: 10.1038/s41598-020-63103-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2020] [Accepted: 03/24/2020] [Indexed: 11/23/2022] Open
Abstract
Hearing impairment disrupts processes of selective attention that help listeners attend to one sound source over competing sounds in the environment. Hearing prostheses (hearing aids and cochlear implants, CIs) do not fully remedy these issues. In normal hearing, mechanisms of selective attention arise through the facilitation and suppression of neural activity that represents sound sources. However, it is unclear how hearing impairment affects these neural processes, which is key to understanding why listening difficulty remains. Here, severely impaired listeners treated with a CI, and age-matched normal-hearing controls, attended to one of two identical but spatially separated talkers while multichannel EEG was recorded. Whereas neural representations of attended and ignored speech were differentiated at early (~150 ms) cortical processing stages in controls, differentiation of talker representations only occurred later (~250 ms) in CI users. CI users, but not controls, also showed evidence for spatial suppression of the ignored talker through lateralized alpha (7-14 Hz) oscillations. However, CI users' perceptual performance was only predicted by early-stage talker differentiation. We conclude that multi-talker listening difficulty remains for impaired listeners due to deficits in early-stage separation of cortical speech representations, despite neural evidence that they use spatial information to guide selective attention.
Affiliation(s)
- Brandon T Paul
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
- Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada
- Mila Uzelac
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
- Emmanuel Chan
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
- Andrew Dimitrijevic
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
- Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada
- Faculty of Medicine, Otolaryngology-Head and Neck Surgery, University of Toronto, Toronto, ON, M5S 1A1, Canada

34
Cortical auditory responses index the contributions of different RMS-level-dependent segments to speech intelligibility. Hear Res 2019; 383:107808. [DOI: 10.1016/j.heares.2019.107808] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Revised: 09/17/2019] [Accepted: 10/01/2019] [Indexed: 10/25/2022]

35
Vanthornhout J, Decruy L, Francart T. Effect of Task and Attention on Neural Tracking of Speech. Front Neurosci 2019; 13:977. [PMID: 31607841 PMCID: PMC6756133 DOI: 10.3389/fnins.2019.00977] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2019] [Accepted: 08/30/2019] [Indexed: 12/02/2022] Open
Abstract
EEG-based measures of neural tracking of natural running speech are becoming increasingly popular for investigating the neural processing of speech and have applications in audiology. When the stimulus is a single speaker, it is usually assumed that the listener actively attends to and understands the stimulus. However, as the level of attention of the listener is inherently variable, we investigated how this affects neural envelope tracking. Using a movie as a distractor, we varied the level of attention while we estimated neural envelope tracking. We varied the intelligibility level by adding stationary noise. We found a significant difference in neural envelope tracking between the condition with maximal attention and the movie condition. This difference was most pronounced in the right frontal region of the brain. The degree of neural envelope tracking was highly correlated with the stimulus signal-to-noise ratio, even in the movie condition. This could be due to residual neural resources that continue to passively process the stimulus. When envelope tracking is used to measure speech understanding objectively, this means that the procedure can be made more enjoyable and feasible by letting participants watch a movie during stimulus presentation.
Affiliation(s)
- Lien Decruy
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Tom Francart
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium

36
Lesenfants D, Vanthornhout J, Verschueren E, Decruy L, Francart T. Predicting individual speech intelligibility from the cortical tracking of acoustic- and phonetic-level speech representations. Hear Res 2019; 380:1-9. [DOI: 10.1016/j.heares.2019.05.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Revised: 05/20/2019] [Accepted: 05/21/2019] [Indexed: 10/26/2022]

37
Xie Z, Reetzke R, Chandrasekaran B. Machine Learning Approaches to Analyze Speech-Evoked Neurophysiological Responses. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:587-601. [PMID: 30950746 PMCID: PMC6802895 DOI: 10.1044/2018_jslhr-s-astm-18-0244] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Revised: 10/28/2018] [Accepted: 11/26/2018] [Indexed: 05/27/2023]
Abstract
Purpose Speech-evoked neurophysiological responses are often collected to answer clinically and theoretically driven questions concerning speech and language processing. Here, we highlight the practical application of machine learning (ML)-based approaches to analyzing speech-evoked neurophysiological responses. Method Two categories of ML-based approaches are introduced: decoding models, which generate a speech stimulus output using the features from the neurophysiological responses, and encoding models, which use speech stimulus features to predict neurophysiological responses. In this review, we focus on (a) a decoding model classification approach, wherein speech-evoked neurophysiological responses are classified as belonging to 1 of a finite set of possible speech events (e.g., phonological categories), and (b) an encoding model temporal response function approach, which quantifies the transformation of a speech stimulus feature to continuous neural activity. Results We illustrate the utility of the classification approach to analyze early electroencephalographic (EEG) responses to Mandarin lexical tone categories from a traditional experimental design, and to classify EEG responses to English phonemes evoked by natural continuous speech (i.e., an audiobook) into phonological categories (plosive, fricative, nasal, and vowel). We also demonstrate the utility of temporal response function to predict EEG responses to natural continuous speech from acoustic features. Neural metrics from the 3 examples all exhibit statistically significant effects at the individual level. Conclusion We propose that ML-based approaches can complement traditional analysis approaches to analyze neurophysiological responses to speech signals and provide a deeper understanding of natural speech and language processing using ecologically valid paradigms in both typical and clinical populations.
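The decoding-model classification approach described here reduces, in its simplest form, to training a classifier on EEG-derived feature vectors labeled by phonological category. The scikit-learn sketch below uses synthetic features for four hypothetical classes; it illustrates the cross-validated workflow, not the paper's actual features or classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_per_class, n_feat = 60, 40
classes = ["plosive", "fricative", "nasal", "vowel"]
X, y = [], []
for c in classes:
    centroid = rng.standard_normal(n_feat)          # class-specific EEG pattern
    X.append(centroid + 1.5 * rng.standard_normal((n_per_class, n_feat)))
    y += [c] * n_per_class
X = np.vstack(X)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, np.array(y), cv=5)
print(f"decoding accuracy: {scores.mean():.2f} (chance = {1 / len(classes):.2f})")
```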
Affiliation(s)
- Zilong Xie
- Department of Communication Sciences and Disorders, The University of Texas at Austin
- Rachel Reetzke
- Department of Communication Sciences and Disorders, The University of Texas at Austin
- Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh

38
Müller JA, Wendt D, Kollmeier B, Debener S, Brand T. Effect of Speech Rate on Neural Tracking of Speech. Front Psychol 2019; 10:449. [PMID: 30906273 PMCID: PMC6418035 DOI: 10.3389/fpsyg.2019.00449] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2018] [Accepted: 02/14/2019] [Indexed: 12/03/2022] Open
Abstract
Speech comprehension requires effort in demanding listening situations. Selective attention may be required for focusing on a specific talker in a multi-talker environment, may enhance effort by requiring additional cognitive resources, and is known to enhance the neural representation of the attended talker in the listener's neural response. The aim of the study was to investigate the relation of listening effort, as quantified by subjective effort ratings and pupil dilation, and neural speech tracking during sentence recognition. Task demands were varied using sentences with varying levels of linguistic complexity and using two different speech rates in a picture-matching paradigm with 20 normal-hearing listeners. The participants' task was to match the acoustically presented sentence with a picture presented before the acoustic stimulus. Afterwards they rated their perceived effort on a categorical effort scale. During each trial, pupil dilation (as an indicator of listening effort) and electroencephalogram (as an indicator of neural speech tracking) were recorded. Neither measure was significantly affected by linguistic complexity. However, speech rate showed a strong influence on subjectively rated effort, pupil dilation, and neural tracking. The neural tracking analysis revealed a shorter latency for faster sentences, which may reflect a neural adaptation to the rate of the input. No relation was found between neural tracking and listening effort, even though both measures were clearly influenced by speech rate. This is probably due to factors that influence both measures differently. Consequently, the amount of listening effort is not clearly represented in the neural tracking.
Affiliation(s)
- Jana Annina Müller
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Medizinische Physik, Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Dorothea Wendt
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
- Eriksholm Research Centre, Snekkersten, Denmark
- Birger Kollmeier
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Medizinische Physik, Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Stefan Debener
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Neuropsychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Thomas Brand
- Cluster of Excellence ‘Hearing4all’, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Medizinische Physik, Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany

39
Hambrook DA, Tata MS. The effects of distractor set-size on neural tracking of attended speech. BRAIN AND LANGUAGE 2019; 190:1-9. [PMID: 30616147 DOI: 10.1016/j.bandl.2018.12.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 11/19/2018] [Accepted: 12/19/2018] [Indexed: 06/09/2023]
Abstract
Attention is crucial to speech comprehension in real-world, noisy environments. Selective phase-tracking between low-frequency brain dynamics and the envelope of target speech is a proposed mechanism for rejecting competing distractors. Studies have supported this theory in the case of a single distractor, but have not considered how tracking is systematically affected by varying distractor set sizes. We recorded electroencephalography (EEG) during selective listening to both natural and vocoded speech as the distractor set size varied from two to six voices. Increasing set size reduced performance and attenuated EEG tracking of target speech. Further, we found that intrusions of distractor speech into perception were not accompanied by sustained tracking of the distractor stream. Our results support the theory that tracking of speech dynamics is a mechanism for selective attention, and that distraction does not arise from simple stimulus-driven capture of sustained auditory entrainment by the acoustics of distracting speech.
Affiliation(s)
- Dillon A Hambrook
- The University of Lethbridge, 4401 University Drive, Lethbridge, Alberta T1K 3M4, Canada
- Matthew S Tata
- The University of Lethbridge, 4401 University Drive, Lethbridge, Alberta T1K 3M4, Canada

40
Kumagai Y, Matsui R, Tanaka T. Music Familiarity Affects EEG Entrainment When Little Attention Is Paid. Front Hum Neurosci 2018; 12:444. [PMID: 30459583 PMCID: PMC6232314 DOI: 10.3389/fnhum.2018.00444] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2018] [Accepted: 10/16/2018] [Indexed: 11/21/2022] Open
Abstract
To investigate the brain's response to music, many researchers have examined cortical entrainment to periodic tunes, periodic beats, and music. Music familiarity is another factor that affects cortical entrainment, and electroencephalogram (EEG) studies have shown that stronger entrainment occurs while listening to unfamiliar music than while listening to familiar music. In the present study, we hypothesized that not only the level of familiarity but also the level of attention affects the level of entrainment. We simultaneously presented music and a silent movie to participants and recorded EEG while participants paid attention to either the music or the movie, in order to investigate whether cortical entrainment is related to attention and music familiarity. The average cross-correlation function across channels, trials, and participants exhibited a pronounced positive peak at time lags around 130 ms and a negative peak at time lags around 260 ms. The statistical analysis of the two peaks revealed that the level of attention did not affect the level of entrainment and, moreover, that in both the auditory-active and visual-active conditions the entrainment level was stronger when listening to unfamiliar music than when listening to familiar music. This may indicate that familiarity with music affects cortical activity even when attention is not fully devoted to listening to music.
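A cross-correlation analysis like the one described, with a positive peak near 130 ms, can be reproduced on synthetic data in a few lines: z-score the music envelope and the EEG, then correlate at increasing EEG lags. The lag range and sampling rate below are illustrative assumptions.

```python
import numpy as np

def envelope_eeg_xcorr(envelope, eeg, fs, max_lag_s=0.5):
    """Normalized cross-correlation; positive lags mean EEG trails the music."""
    env = (envelope - envelope.mean()) / envelope.std()
    sig = (eeg - eeg.mean()) / eeg.std()
    n = len(env)
    lags = np.arange(int(max_lag_s * fs) + 1)
    r = np.array([np.mean(env[:n - k] * sig[k:]) for k in lags])
    return lags / fs, r

rng = np.random.default_rng(8)
fs, n_t = 128, 128 * 60
env = np.convolve(rng.standard_normal(n_t), np.ones(16) / 16, mode="same")
eeg = np.roll(env, int(0.13 * fs)) + 2 * rng.standard_normal(n_t)  # ~130 ms delay
lag_s, r = envelope_eeg_xcorr(env, eeg, fs)
print(f"peak at {lag_s[np.argmax(r)] * 1000:.0f} ms, r = {r.max():.2f}")
```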
Affiliation(s)
- Yuiko Kumagai: Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Ryosuke Matsui: Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Toshihisa Tanaka: Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan; RIKEN Center for Brain Science, Saitama, Japan; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
41
Fiedler L, Wöstmann M, Herbst SK, Obleser J. Late cortical tracking of ignored speech facilitates neural selectivity in acoustically challenging conditions. Neuroimage 2018; 186:33-42. [PMID: 30367953 DOI: 10.1016/j.neuroimage.2018.10.057] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2018] [Revised: 09/12/2018] [Accepted: 10/21/2018] [Indexed: 11/25/2022] Open
Abstract
Listening requires selective neural processing of the incoming sound mixture, which in humans is borne out by a surprisingly clean representation of attended-only speech in auditory cortex. How this neural selectivity is achieved even at negative signal-to-noise ratios (SNR) remains unclear. We show that, under such conditions, a late cortical representation (i.e., neural tracking) of the ignored acoustic signal is key to successful separation of attended and distracting talkers (i.e., neural selectivity). We recorded and modeled the electroencephalographic response of 18 participants who attended to one of two simultaneously presented stories, while the SNR between the two talkers varied dynamically between +6 and -6 dB. The neural tracking showed an increasing early-to-late attention-biased selectivity. Importantly, acoustically dominant (i.e., louder) ignored talkers were tracked neurally by late involvement of fronto-parietal regions, which contributed to enhanced neural selectivity. This neural selectivity, by way of representing the ignored talker, poses a mechanistic neural account of attention under real-life acoustic conditions.
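Neural tracking of attended and ignored talkers in studies like this one is often estimated with a forward model, i.e. a temporal response function (TRF) regressed from time-lagged copies of each talker's envelope. A minimal ridge-regression sketch under that assumption (the function names, lag window, and regularization are ours, not the paper's):

```python
# Forward (encoding) model sketch: TRF via ridge regression on lagged envelopes.
import numpy as np

def lagged_design(envelope, fs, tmin_s=0.0, tmax_s=0.5):
    """Stack time-lagged copies of the envelope into a design matrix."""
    lags = np.arange(int(tmin_s * fs), int(tmax_s * fs) + 1)
    X = np.zeros((len(envelope), len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j] = envelope[: len(envelope) - lag]
    return X, lags / fs

def fit_trf(envelope, eeg, fs, lam=1.0):
    """Ridge solution w = (X'X + lam*I)^-1 X'y; w is the TRF over lags."""
    X, lag_s = lagged_design(envelope, fs)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
    return lag_s, w

fs = 64
rng = np.random.default_rng(1)
env = rng.standard_normal(fs * 60)
eeg = np.convolve(env, np.hanning(10), mode="same") + rng.standard_normal(len(env))
lag_s, w = fit_trf(env, eeg, fs)   # w approximates the toy smoothing kernel
```

Fitting one TRF per talker and per lag window is one way the early-to-late selectivity described above can be made explicit.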
Affiliation(s)
- Lorenz Fiedler: Department of Psychology, University of Lübeck, Lübeck, Germany
- Malte Wöstmann: Department of Psychology, University of Lübeck, Lübeck, Germany
- Sophie K Herbst: Department of Psychology, University of Lübeck, Lübeck, Germany
- Jonas Obleser: Department of Psychology, University of Lübeck, Lübeck, Germany
42
Gandras K, Grimm S, Bendixen A. Electrophysiological Correlates of Speaker Segregation and Foreground-Background Selection in Ambiguous Listening Situations. Neuroscience 2018; 389:19-29. [PMID: 28735101 DOI: 10.1016/j.neuroscience.2017.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2017] [Revised: 07/10/2017] [Accepted: 07/10/2017] [Indexed: 11/15/2022]
Abstract
In everyday listening environments, a main task for our auditory system is to follow one out of multiple speakers talking simultaneously. The present study was designed to find electrophysiological indicators of two central processes involved - segregating the speech mixture into distinct speech sequences corresponding to the two speakers, and then attending to one of the speech sequences. We generated multistable speech stimuli that were set up to create ambiguity as to whether only one or two speakers are talking. Thereby we were able to investigate three perceptual alternatives (no segregation, segregated - speaker A in the foreground, segregated - speaker B in the foreground) without any confounding stimulus changes. Participants listened to a continuously repeating sequence of syllables, which were uttered alternately by two human speakers, and indicated whether they perceived the sequence as an inseparable mixture or as originating from two separate speakers. In the latter case, they distinguished which speaker was in their attentional foreground. Our data show a long-lasting event-related potential (ERP) modulation starting at 130 ms after stimulus onset, which can be explained by the perceptual organization of the two speech sequences into attended foreground and ignored background streams. Our paradigm extends previous work with pure-tone sequences toward speech stimuli and adds the possibility to obtain neural correlates of the difficulty to segregate a speech mixture into distinct streams.
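The reported ERP modulation from roughly 130 ms onward implies epoching the continuous EEG around syllable onsets and averaging within each reported percept. A schematic sketch of that averaging step, with entirely hypothetical condition labels standing in for the three perceptual alternatives:

```python
# Schematic ERP averaging by reported percept; labels/names are hypothetical.
import numpy as np

def erp_by_condition(eeg, onsets, labels, fs, tmin=-0.1, tmax=0.5):
    """Mean epoch per condition label; eeg is 1-D, onsets in samples."""
    n0, n1 = int(tmin * fs), int(tmax * fs)
    erps = {}
    for cond in set(labels):
        epochs = [eeg[o + n0 : o + n1] for o, l in zip(onsets, labels)
                  if l == cond and o + n0 >= 0 and o + n1 <= len(eeg)]
        erps[cond] = np.mean(epochs, axis=0)
    return erps

fs = 250
eeg = np.random.randn(fs * 300)
onsets = np.arange(fs, len(eeg) - fs, int(0.4 * fs))   # syllables every 400 ms
labels = np.random.choice(["integrated", "A_foreground", "B_foreground"],
                          size=len(onsets))
erps = erp_by_condition(eeg, onsets, labels, fs)
```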
Affiliation(s)
- Katharina Gandras: Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany
- Sabine Grimm: Department of Physics, School of Natural Sciences, Chemnitz University of Technology, D-09126 Chemnitz, Germany
- Alexandra Bendixen: Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany; Department of Physics, School of Natural Sciences, Chemnitz University of Technology, D-09126 Chemnitz, Germany
43
Holt LL, Tierney AT, Guerra G, Laffere A, Dick F. Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing. Hear Res 2018; 366:50-64. [PMID: 30131109 PMCID: PMC6107307 DOI: 10.1016/j.heares.2018.06.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/18/2018] [Revised: 06/10/2018] [Accepted: 06/19/2018] [Indexed: 12/24/2022]
Abstract
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention, although, as yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts for testing mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial, dimension-based auditory selective attention.
Affiliation(s)
- Lori L Holt: Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Adam T Tierney: Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK
- Giada Guerra: Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK
- Aeron Laffere: Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK
- Frederic Dick: Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK; Department of Experimental Psychology, University College London, London WC1H 0AP, UK
44
Olguin A, Bekinschtein TA, Bozic M. Neural Encoding of Attended Continuous Speech under Different Types of Interference. J Cogn Neurosci 2018; 30:1606-1619. [PMID: 30004849 DOI: 10.1162/jocn_a_01303] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
We examined how attention modulates the neural encoding of continuous speech under different types of interference. In an EEG experiment, participants attended to a narrative in English while ignoring a competing stream in the other ear. Four different types of interference were presented to the unattended ear: a different English narrative, a narrative in a language unknown to the listener (Spanish), a well-matched nonlinguistic acoustic interference (Musical Rain), and no interference. Neural encoding of attended and unattended signals was assessed by calculating cross-correlations between their respective envelopes and the EEG recordings. Findings revealed more robust neural encoding for the attended envelopes compared with the ignored ones. Critically, however, the type of the interfering stream significantly modulated this process: the fully intelligible distractor (English) caused the strongest encoding of both attended and unattended streams and the latest dissociation between them, whereas nonintelligible distractors caused weaker encoding and an earlier dissociation between attended and unattended streams. The results were consistent over the time course of the spoken narrative. These findings suggest that attended and unattended information can be differentiated at different depths of processing analysis, with the locus of selective attention determined by the nature of the competing stream. They provide strong support to flexible accounts of auditory selective attention.
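The envelope cross-correlation measures in this entry and the ones above presuppose an envelope extracted from the audio. One common recipe (an assumption on our part; these authors may have used a different front end) is the magnitude of the analytic signal, low-pass filtered and resampled to the EEG rate:

```python
# Common speech-envelope extraction recipe; not necessarily the authors' one.
import numpy as np
from scipy.signal import butter, hilbert, resample, sosfiltfilt

def speech_envelope(audio, fs_audio, fs_eeg, cutoff=8.0):
    """Amplitude envelope: |analytic signal|, low-passed, resampled."""
    env = np.abs(hilbert(audio))                     # amplitude envelope
    sos = butter(4, cutoff, btype="low", fs=fs_audio, output="sos")
    env = sosfiltfilt(sos, env)                      # keep slow modulations
    return resample(env, int(len(env) * fs_eeg / fs_audio))

fs_audio, fs_eeg = 16000, 64
audio = np.random.randn(fs_audio * 10)               # stand-in speech waveform
env = speech_envelope(audio, fs_audio, fs_eeg)       # 640 samples at EEG rate
```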
45
Haghighi M, Moghadamfalahi M, Akcakaya M, Shinn-Cunningham BG, Erdogmus D. A Graphical Model for Online Auditory Scene Modulation Using EEG Evidence for Attention. IEEE Trans Neural Syst Rehabil Eng 2017; 25:1970-1977. [PMID: 28600256 PMCID: PMC5681401 DOI: 10.1109/tnsre.2017.2712419] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Recent findings indicate that brain interfaces have the potential to enable attention-guided auditory scene analysis and manipulation in applications such as hearing aids and augmented/virtual environments. Specifically, noninvasively acquired electroencephalography (EEG) signals have been demonstrated to carry some evidence regarding which of multiple synchronous speech waveforms the subject attends to. In this paper, we demonstrate that: 1) using data- and model-driven cross-correlation features yields competitive binary auditory attention classification results with at most 20 s of EEG from 16 channels or even a single well-positioned channel; 2) a model calibrated using equal-energy speech waveforms competing for attention could perform well on estimating attention in closed-loop unbalanced-energy speech waveform situations, where the speech amplitudes are modulated by the estimated attention posterior probability distribution; 3) such a model would perform even better if it is corrected (linearly, in this instance) based on EEG evidence dependence on speech weights in the mixture; and 4) calibrating a model based on population EEG could result in acceptable performance for new individuals/users; therefore, EEG-based auditory attention classifiers may generalize across individuals, leading to reduced or eliminated calibration time and effort.
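Finding 2) hinges on turning per-segment classifier outputs into an attention posterior that then drives the speech amplitudes. A toy sketch of such a recursive posterior update for two talkers (our simplified two-state model, not the paper's full graphical model):

```python
# Toy two-state attention posterior update; a simplification, not the paper's model.
import numpy as np

def update_posterior(prior, p_seg, transition=0.05):
    """Mix in a small attention-switching probability, multiply by the
    segment likelihoods, and renormalize.

    prior : shape (2,), current P(attending talker 0 / talker 1).
    p_seg : shape (2,), classifier likelihoods for the latest EEG segment.
    """
    T = np.array([[1 - transition, transition],
                  [transition, 1 - transition]])
    post = (T @ prior) * p_seg
    return post / post.sum()

posterior = np.array([0.5, 0.5])
for p_seg in [np.array([0.7, 0.3]), np.array([0.6, 0.4]), np.array([0.8, 0.2])]:
    posterior = update_posterior(posterior, p_seg)
print(posterior)   # accumulates evidence toward talker 0
```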
46
Haghighi M, Moghadamfalahi M, Akcakaya M, Erdogmus D. EEG-assisted Modulation of Sound Sources in the Auditory Scene. Biomed Signal Process Control 2017; 39:263-270. [PMID: 31118975 DOI: 10.1016/j.bspc.2017.08.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Noninvasive EEG (electroencephalography)-based auditory attention detection could be useful for improved hearing aids in the future. This work is a novel attempt to investigate the feasibility of online modulation of sound sources by probabilistic detection of auditory attention, using a noninvasive EEG-based brain-computer interface. The proposed online system modulates the upcoming sound sources through gain adaptation, which employs probabilistic decisions (soft decisions) from a classifier trained on offline calibration data. In this work, calibration EEG data were collected in sessions where the participants listened to two sound sources (one attended and one unattended). Cross-correlation coefficients between the EEG measurements and the attended and unattended sound-source envelope estimates are used to show differences in sharpness and delays of neural responses to attended versus unattended sound sources. Salient features that distinguish attended sources from unattended ones in the correlation patterns were identified and later used to train an auditory attention classifier. Using this classifier, we show high offline detection performance with single-channel EEG measurements, compared to existing approaches in the literature that employ large numbers of channels. In addition, using the classifier trained offline in the calibration session, we demonstrate the performance of the online sound-source modulation system. We observe that the online system is able to keep the level of the attended sound source higher than that of the unattended source.
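The gain-adaptation step, in which soft classifier decisions modulate the sound sources, can be illustrated with a smoothed mapping from posterior to per-source gain. This is a hypothetical rule in the spirit of the description above, not the authors' controller:

```python
# Hypothetical soft gain-adaptation rule; illustrative only.
import numpy as np

def adapt_gains(gains, posterior, floor=0.2, alpha=0.1):
    """Move gains toward the posterior, keeping a minimum audibility floor
    so the unattended source is attenuated but never fully muted."""
    target = floor + (1.0 - floor) * posterior
    return (1 - alpha) * gains + alpha * target

gains = np.array([0.6, 0.6])
posterior = np.array([0.9, 0.1])      # classifier strongly favors source 0
for _ in range(20):                   # repeated segments of EEG evidence
    gains = adapt_gains(gains, posterior)
print(gains)                          # source 0 louder, source 1 near floor
```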
Affiliation(s)
- Murat Akcakaya: University of Pittsburgh, 4200 Fifth Ave, Pittsburgh, PA 15260
- Deniz Erdogmus: Northeastern University, 360 Huntington Ave, Boston, MA 02115
47
Fiedler L, Obleser J, Lunner T, Graversen C. Ear-EEG allows extraction of neural responses in challenging listening scenarios - A future technology for hearing aids? Annu Int Conf IEEE Eng Med Biol Soc 2017; 2016:5697-5700. [PMID: 28269548 DOI: 10.1109/embc.2016.7592020] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Advances in brain-computer interface research have recently empowered the development of wearable sensors to record mobile electroencephalography (EEG) as an unobtrusive and easy-to-use alternative to conventional scalp EEG. One such mobile solution is to record EEG from the ear canal, which has been validated for auditory steady-state responses and discrete event-related potentials (ERPs). However, it is still under discussion where to place recording and reference electrodes to best capture responses to auditory stimuli. Furthermore, the technology has not yet been tested and validated for ecologically relevant auditory stimuli such as speech. In this study, Ear-EEG and conventional scalp EEG were recorded simultaneously in a discrete-tone as well as a continuous-speech design. The discrete stimuli were applied in a dichotic oddball paradigm, while continuous stimuli were presented diotically as two simultaneous talkers. Cross-correlation of stimulus envelope and Ear-EEG was assessed as a measure of ongoing neural tracking. The ERPs extracted from Ear-EEG revealed typical auditory components, yet depended critically on the chosen reference electrode. Reliable neural-tracking responses were extracted from the Ear-EEG for both paradigms, albeit weaker in amplitude than from scalp EEG. In conclusion, this study shows the feasibility of extracting relevant neural features from ear-canal-recorded "Ear-EEG", which might augment future hearing technology.
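Because the extracted ERPs depended critically on the chosen reference electrode, any Ear-EEG analysis starts with a re-referencing step. A minimal sketch of that step (the channel layout and names are hypothetical):

```python
# Minimal re-referencing sketch; channel layout is hypothetical.
import numpy as np

def rereference(eeg, ref_idx):
    """Subtract one channel (or the mean of several) from all channels.

    eeg     : array (n_channels, n_samples)
    ref_idx : int or list of ints selecting the reference channel(s)
    """
    ref = np.atleast_2d(eeg[ref_idx]).mean(axis=0)
    return eeg - ref

eeg = np.random.randn(4, 1000)      # e.g. 3 ear-canal + 1 scalp electrode
ear_vs_scalp = rereference(eeg, 3)  # ear channels referenced to the scalp site
```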
48
Kumagai Y, Arvaneh M, Okawa H, Wada T, Tanaka T. Classification of familiarity based on cross-correlation features between EEG and music. Annu Int Conf IEEE Eng Med Biol Soc 2017; 2017:2879-2882. [PMID: 29060499 DOI: 10.1109/embc.2017.8037458] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
An approach to recognizing a listener's familiarity with music, using both the electroencephalogram (EEG) signals and the music signal, is proposed in this paper. Eight participants listened to melodies produced by piano sounds as simple natural stimuli. We classified the familiarity of each participant using cross-correlation values between the EEG and the envelope of the music signal as features for a support vector machine (SVM) or a neural network. Here, we report that the maximum classification accuracy, obtained with the SVM, was 100%. These results suggest that the familiarity of music can be classified from cross-correlation values. The proposed approach could be used to recognize high-level brain states such as familiarity, preference, and emotion.
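The classification step, cross-correlation values fed to an SVM, can be sketched with a standard scikit-learn pipeline. The features below are random placeholders standing in for per-lag cross-correlation values, so the code illustrates only the plumbing, not the reported 100% accuracy:

```python
# SVM on cross-correlation features; random placeholder data, plumbing only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 16))          # trials x cross-correlation lags
y = rng.integers(0, 2, size=80)            # 0 = unfamiliar, 1 = familiar
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)  # chance-level on random features
print(scores.mean())
```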
49
Fuglsang SA, Dau T, Hjortkjær J. Noise-robust cortical tracking of attended speech in real-world acoustic scenes. Neuroimage 2017; 156:435-444. [PMID: 28412441 DOI: 10.1016/j.neuroimage.2017.04.026] [Citation(s) in RCA: 97] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2016] [Revised: 04/07/2017] [Accepted: 04/10/2017] [Indexed: 11/30/2022] Open
Abstract
Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream of a particular talker. Across the different listening environments, we found that the attended talker could be accurately decoded from single-trial EEG data irrespective of the different distortions in the acoustic input. For highly reverberant environments, speech envelopes reconstructed from neural responses to the distorted stimuli resembled the original clean signal more than the distorted input. With reverberant speech, we observed a late cortical response to the attended speech stream that encoded temporal modulations in the speech signal without its reverberant distortion. Single-trial attention decoding accuracies based on 40-50 s long blocks of data from 64 scalp electrodes were equally high (80-90% correct) in all considered listening environments and remained statistically significant using down to 10 scalp electrodes and short (<30 s) unaveraged EEG segments. In contrast to the robust decoding of the attended talker, we found that decoding of the unattended talker deteriorated with the acoustic distortions. These results suggest that cortical activity tracks an attended speech signal in a way that is invariant to acoustic distortions encountered in real-life sound environments. Noise-robust attention decoding additionally suggests a potential utility of stimulus reconstruction techniques in attention-controlled brain-computer interfaces.
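Stimulus reconstruction of the kind used here maps time-lagged multichannel EEG back to an envelope estimate and labels the talker whose envelope matches the reconstruction best. A compact backward-model sketch under those assumptions (illustrative names, lags, and regularization; real pipelines train and evaluate on separate data):

```python
# Backward (stimulus-reconstruction) attention-decoding sketch; illustrative.
import numpy as np

def lagged_eeg(eeg, max_lag):
    """Stack 0..max_lag sample delays of every channel into one matrix."""
    n_ch, n_t = eeg.shape
    X = np.zeros((n_t, n_ch * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * n_ch : (lag + 1) * n_ch] = eeg[:, : n_t - lag].T
    return X

def train_decoder(eeg, attended_env, max_lag=16, lam=1e2):
    X = lagged_eeg(eeg, max_lag)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ attended_env)

def decode(eeg, env_a, env_b, w, max_lag=16):
    rec = lagged_eeg(eeg, max_lag) @ w       # reconstructed envelope
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return "A" if r_a > r_b else "B"

rng = np.random.default_rng(3)
fs = 64
env_a = rng.standard_normal(fs * 60)
env_b = rng.standard_normal(fs * 60)
eeg = np.vstack([np.roll(env_a, 8), rng.standard_normal(fs * 60)])  # ch0 tracks A
w = train_decoder(eeg, env_a)
print(decode(eeg, env_a, env_b, w))          # "A" on this toy example
```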
Affiliation(s)
- Søren Asp Fuglsang: Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark
- Torsten Dau: Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark
- Jens Hjortkjær: Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark; Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Kettegaard Allé 30, 2650 Hvidovre, Denmark
50
Fiedler L, Wöstmann M, Graversen C, Brandmeyer A, Lunner T, Obleser J. Single-channel in-ear-EEG detects the focus of auditory attention to concurrent tone streams and mixed speech. J Neural Eng 2017; 14:036020. [PMID: 28384124 DOI: 10.1088/1741-2552/aa66dd] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Conventional, multi-channel scalp electroencephalography (EEG) allows the identification of the attended speaker in concurrent-listening ('cocktail party') scenarios. This implies that EEG might provide valuable information to complement hearing aids and to enable a level of neuro-feedback. APPROACH To investigate whether a listener's attentional focus can be detected from single-channel, hearing-aid-compatible EEG configurations, we recorded EEG from three electrodes inside the ear canal ('in-Ear-EEG') and additionally from 64 electrodes on the scalp. In two different concurrent listening tasks, participants (n = 7) were fitted with individualized in-Ear-EEG pieces and were asked to attend either to one of two dichotically presented, concurrent tone streams or to one of two diotically presented, concurrent audiobooks. A forward encoding model was trained to predict the EEG response at single EEG channels. MAIN RESULTS Each individual participant's attentional focus could be detected from the single-channel EEG response recorded from short-distance configurations consisting only of a single in-Ear-EEG electrode and an adjacent scalp-EEG electrode. The differences in neural responses to attended and ignored stimuli were consistent in morphology (i.e. polarity and latency of components) across subjects. SIGNIFICANCE In sum, our findings show that the EEG response from a single-channel, hearing-aid-compatible configuration provides valuable information to identify a listener's focus of attention.
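A forward encoding model for single-channel attention detection can be sketched as two envelope-to-EEG regressions, one per stream, with the better-predicting stream taken as attended. This is our minimal reading of the approach, not the authors' implementation; a real analysis would fit and evaluate on separate segments:

```python
# Forward-model attention detection on one channel; a minimal sketch.
import numpy as np

def lagged(env, n_lags):
    """Design matrix of 0..n_lags-1 sample delays of the envelope."""
    X = np.zeros((len(env), n_lags))
    for j in range(n_lags):
        X[j:, j] = env[: len(env) - j]
    return X

def fit_forward(env, eeg_ch, n_lags=32, lam=1.0):
    """Ridge-regularized forward model from envelope to one EEG channel."""
    X = lagged(env, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg_ch)

def attended_stream(eeg_ch, env_a, env_b, n_lags=32):
    """Label the stream whose forward model predicts the channel best.
    (A real analysis would fit and test on separate data segments.)"""
    w_a = fit_forward(env_a, eeg_ch, n_lags)
    w_b = fit_forward(env_b, eeg_ch, n_lags)
    r_a = np.corrcoef(lagged(env_a, n_lags) @ w_a, eeg_ch)[0, 1]
    r_b = np.corrcoef(lagged(env_b, n_lags) @ w_b, eeg_ch)[0, 1]
    return "A" if r_a > r_b else "B"

rng = np.random.default_rng(4)
env_a, env_b = rng.standard_normal(4000), rng.standard_normal(4000)
eeg_ch = np.roll(env_a, 5) + rng.standard_normal(4000)   # channel tracks A
print(attended_stream(eeg_ch, env_a, env_b))             # "A"
```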
Affiliation(s)
- Lorenz Fiedler: Department of Psychology, University of Lübeck, Lübeck, Germany; Max Planck Research Group 'Auditory Cognition', Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany