101
Nogueira W, Dolhopiatenko H. Predicting speech intelligibility from a selective attention decoding paradigm in cochlear implant users. J Neural Eng 2022; 19. PMID: 35234663. DOI: 10.1088/1741-2552/ac599f.
Abstract
OBJECTIVES Electroencephalography (EEG) can be used to decode selective attention in cochlear implant (CI) users. This work investigates whether selective attention to an attended speech source in the presence of a concurrent speech source can predict speech understanding in CI users. APPROACH CI users were instructed to attend to one of two speech streams while EEG was recorded. Both speech streams were presented to the same ear at different signal-to-interference ratios (SIRs). The envelope of the to-be-attended speech was reconstructed from the EEG by training decoders using regularized least squares. The correlation coefficient between the reconstructed envelope and the attended (ρ_A(SIR)) or the unattended (ρ_U(SIR)) speech stream was computed at each SIR. Additionally, we computed the difference correlation coefficient at the same SIR (ρ_Diff = ρ_A(SIR) - ρ_U(SIR)) and at the opposite SIR (ρ_DiffOpp = ρ_A(SIR) - ρ_U(-SIR)). ρ_Diff compares the attended and unattended correlation coefficients for speech sources presented at different presentation levels, depending on SIR. In contrast, ρ_DiffOpp compares the attended and unattended correlation coefficients for speech sources presented at the same presentation level, irrespective of SIR. MAIN RESULTS Selective attention decoding in CI users is possible even when both speech streams are presented monaurally. A significant effect of SIR on ρ_A(SIR), ρ_Diff and ρ_DiffOpp, but not on ρ_U(SIR), was observed. Finally, the results show a significant correlation between speech understanding performance and ρ_A(SIR), as well as with ρ_U(SIR), across subjects. Moreover, ρ_DiffOpp, which is less affected by the CI artifact, also correlated significantly with speech understanding. SIGNIFICANCE Selective attention decoding in CI users is possible; however, care must be taken with the CI artifact and with the speech material used to train the decoders. These results are important for the future development of objective speech understanding measures for CI users.
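A minimal sketch of the envelope-reconstruction step described above: a ridge-regression (regularized least squares) decoder is trained on lagged EEG and scored by its correlation with the attended envelope. All dimensions, the regularization value and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_channels, n_lags = 5000, 32, 16          # toy dimensions

envelope = rng.standard_normal(n_samples)             # attended-speech envelope (toy)
mixing = rng.standard_normal(n_channels)
eeg = np.outer(envelope, mixing) + rng.standard_normal((n_samples, n_channels))

# Lagged design matrix: EEG at several time lags (wrap-around ignored for brevity).
X = np.hstack([np.roll(eeg, lag, axis=0) for lag in range(n_lags)])

# Regularized least squares (ridge): w = (X'X + lam*I)^(-1) X'y
lam = 1e2
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

reconstructed = X @ w
rho_attended = np.corrcoef(reconstructed, envelope)[0, 1]   # cf. rho_A(SIR)
print(f"reconstruction correlation: {rho_attended:.3f}")
```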
Affiliation(s)
- Waldo Nogueira
- Department of Otolaryngology and Cluster of Excellence "Hearing4all", Hannover Medical School, Karl-Wiechert-Allee 3, Hannover, Niedersachsen, 30625, Germany
- Hanna Dolhopiatenko
- Department of Otolaryngology and Cluster of Excellence "Hearing4all", Hannover Medical School, Karl-Wiechert-Allee 3, Hannover, Niedersachsen, 30625, Germany
102
Marimon M, Höhle B, Langus A. Pupillary entrainment reveals individual differences in cue weighting in 9-month-old German-learning infants. Cognition 2022; 224:105054. PMID: 35217262. DOI: 10.1016/j.cognition.2022.105054.
Abstract
Young infants can segment continuous speech using statistical as well as prosodic cues. Understanding how these cues interact can be informative about how infants solve the segmentation problem. Here we investigate how German-speaking adults and 9-month-old German-learning infants weigh statistical and prosodic cues when segmenting continuous speech. We measured participants' pupil size while they were familiarized with a continuous speech stream in which prosodic cues were pitted against transitional probabilities. Adult participants' changes in pupil size synchronized with the occurrence of prosodic words during the familiarization, and the temporal alignment of these pupillary changes was predictive of their performance at test. In contrast, 9-month-olds as a group failed to consistently segment the familiarization stream with prosodic or statistical cues. However, the variability in the temporal alignment of pupillary changes at word frequency showed that prosodic and statistical cues compete for dominance when segmenting continuous speech. A follow-up language development questionnaire at 40 months of age suggested that infants who had entrained to prosodic words performed better on a vocabulary task, whereas infants who had relied more on statistical cues performed better on grammatical tasks. Together these results suggest that statistics and prosody may serve different roles in speech segmentation in infancy.
Affiliation(s)
- Mireia Marimon
- University of Potsdam, Cognitive Sciences, Department of Linguistics, Karl-Liebknecht-Str. 24-25, D-14476 Potsdam, Germany
- Barbara Höhle
- University of Potsdam, Cognitive Sciences, Department of Linguistics, Karl-Liebknecht-Str. 24-25, D-14476 Potsdam, Germany
- Alan Langus
- University of Potsdam, Cognitive Sciences, Department of Linguistics, Karl-Liebknecht-Str. 24-25, D-14476 Potsdam, Germany.
103
Wang L, Wang Y, Liu Z, Wu EX, Chen F. A Speech-Level-Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes. Front Neurosci 2022; 15:760611. PMID: 35221885. PMCID: PMC8866945. DOI: 10.3389/fnins.2021.760611.
Abstract
In competing-speaker environments, human listeners need to focus or switch their auditory attention according to their dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from neural signals. Moreover, previous studies revealed that root mean square (RMS)-level-based speech segmentation contributes substantially to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level-based speech segmentation on auditory attention decoding (AAD) performance with both sustained and switched attention in competing-speaker auditory scenes. Objective biomarkers derived from cortical activities were also developed to index the dynamic auditory attention states. In the current study, subjects were asked to concentrate on, or switch their attention between, two competing speaker streams. The neural responses to the higher- and lower-RMS-level speech segments were analyzed via the linear temporal response function (TRF) before and after attention switched from one speaker stream to the other. Furthermore, the AAD performance of a unified TRF decoding model was compared with that of a speech-RMS-level-based segmented decoding model under dynamically changing auditory attention states. The results showed that the weight of the typical TRF component at an approximately 100-ms time lag was sensitive to switches of auditory attention. Compared with the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention in a wide range of signal-to-masker ratios (SMRs). In competing-speaker scenes, the TRF weight and AAD accuracy can thus be used as effective indicators to detect changes of auditory attention. In addition, over a wide range of SMRs (i.e., from 6 to -6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision window lengths, suggesting that this speech-RMS-level-based model has the potential to decode dynamic attention states in realistic auditory scenarios.
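For readers unfamiliar with TRF estimation, the sketch below fits a toy forward TRF by ridge-regularized lagged regression and reads off the lag of the largest weight (cf. the ~100-ms component discussed above). The sampling rate, lag range and synthetic data are assumed placeholders, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n_samples = 64, 4000                    # assumed sampling rate (Hz), toy length
envelope = rng.standard_normal(n_samples)   # stimulus envelope (toy)

kernel = np.zeros(25)
kernel[6] = 1.0                             # toy cortical response peaking ~94 ms
eeg_channel = np.convolve(envelope, kernel, mode="full")[:n_samples]
eeg_channel += rng.standard_normal(n_samples)

lags = np.arange(int(0.4 * fs))             # lags spanning roughly 0-400 ms
S = np.stack([np.roll(envelope, lag) for lag in lags], axis=1)

lam = 1.0
trf = np.linalg.solve(S.T @ S + lam * np.eye(len(lags)), S.T @ eeg_channel)

peak_ms = 1000 * lags[np.argmax(np.abs(trf))] / fs
print(f"largest TRF weight at ~{peak_ms:.0f} ms lag")
```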
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Yihan Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Zhixing Liu
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Ed X. Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Fei Chen
104
Abstract
The human brain exhibits the remarkable ability to categorize speech sounds into distinct, meaningful percepts, even in challenging tasks like learning non-native speech categories in adulthood and hearing speech in noisy listening conditions. In these scenarios, there is substantial variability in perception and behavior, both across individual listeners and across individual trials. While there has been extensive work characterizing the stimulus-related and contextual factors that contribute to this variability, recent advances in neuroscience are beginning to shed light on another potential source of variability that has not been explored in speech processing. Specifically, there are task-independent, moment-to-moment variations in neural activity in broadly distributed cortical and subcortical networks that affect how a stimulus is perceived on a trial-by-trial basis. In this review, we discuss factors that affect speech sound learning and moment-to-moment variability in perception, particularly arousal states: neurotransmitter-dependent modulations of cortical activity. We propose that a more complete model of speech perception and learning should incorporate subcortically mediated arousal states that alter behavior in ways that are distinct from, yet complementary to, top-down cognitive modulations. Finally, we discuss a novel neuromodulation technique, transcutaneous auricular vagus nerve stimulation (taVNS), which is particularly well-suited to investigating causal relationships between arousal mechanisms and performance in a variety of perceptual tasks. Together, these approaches provide novel testable hypotheses for explaining variability in classically challenging tasks, including non-native speech sound learning.
105
Wagner M, Ortiz-Mantilla S, Rusiniak M, Benasich AA, Shafer VL, Steinschneider M. Acoustic-level and language-specific processing of native and non-native phonological sequence onsets in the low gamma and theta-frequency bands. Sci Rep 2022; 12:314. PMID: 35013345. PMCID: PMC8748887. DOI: 10.1038/s41598-021-03611-2.
Abstract
Acoustic structures associated with native-language phonological sequences are enhanced within auditory pathways for perception, although the underlying mechanisms are not well understood. To elucidate the processes that facilitate perception, time-frequency (T-F) analyses of EEGs obtained from native speakers of English and Polish were conducted. Participants listened to same and different nonword pairs within counterbalanced attend and passive conditions. Nonwords contained the onsets /pt/, /pət/, /st/, and /sət/, which occur in both Polish and English, with the exception of /pt/, which never occurs in word-onset position in English. Measures of spectral power and inter-trial phase locking (ITPL) in the low gamma (LG) and theta-frequency bands were analyzed from two bilateral, auditory source-level channels created through source localization modeling. Results revealed significantly larger spectral power in LG for the English listeners to the unfamiliar /pt/ onsets from the right hemisphere at early cortical stages during the passive condition. Further, ITPL values revealed distinctive responses in high and low theta to acoustic characteristics of the onsets, which were modulated by language exposure. These findings, language-specific processing in LG and acoustic-level and language-specific processing in theta, support the view that multi-scale temporal processing in the LG and theta-frequency bands facilitates speech perception.
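A minimal sketch of the inter-trial phase locking (ITPL) measure used here, computed from the analytic phase of band-passed trials; the toy data and parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import hilbert

def itpl(trials: np.ndarray) -> np.ndarray:
    """Inter-trial phase locking. trials: (n_trials, n_samples), band-passed."""
    phases = np.angle(hilbert(trials, axis=1))            # per-trial instantaneous phase
    return np.abs(np.mean(np.exp(1j * phases), axis=0))   # in [0, 1] per time point

# Toy demo: trials share a phase-locked 5 Hz component plus noise.
fs = 256
t = np.arange(fs) / fs
trials = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(2).standard_normal((40, fs))
print(itpl(trials).mean())   # high values indicate consistent phase across trials
```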
Affiliation(s)
- Monica Wagner
- St. John's University, St. John's Hall, Room 344 e1, 8000 Utopia Parkway, Queens, NY, 11439, USA.
- Valerie L Shafer
- The Graduate Center of the City University of New York, New York, NY, 10016, USA
106
Su E, Cai S, Xie L, Li H, Schultz T. STAnet: A Spatiotemporal Attention Network for Decoding Auditory Spatial Attention from EEG. IEEE Trans Biomed Eng 2022; 69:2233-2242. PMID: 34982671. DOI: 10.1109/tbme.2022.3140246.
Abstract
OBJECTIVE Humans are able to localize the source of a sound. This enables them to direct attention to a particular speaker at a cocktail party. Psychoacoustic studies show that the sensory cortices of the human brain respond differently to the location of sound sources, and that auditory attention itself is a dynamic, temporally evolving brain activity. In this work, we seek to build a computational model that uses both the spatial and the temporal information manifested in EEG signals for auditory spatial attention detection (ASAD). METHODS We propose an end-to-end spatiotemporal attention network, denoted STAnet, to detect auditory spatial attention from EEG. The STAnet is designed to assign differentiated weights dynamically to EEG channels through a spatial attention mechanism, and to temporal patterns in EEG signals through a temporal attention mechanism. RESULTS We report ASAD experiments on two publicly available datasets. The STAnet outperforms other competitive models by a large margin under various experimental conditions. Its attention decision for a 1-second decision window outperforms that of state-of-the-art techniques operating on a 10-second decision window. Experimental results also demonstrate that the STAnet achieves competitive performance on EEG signals ranging from 64 down to as few as 16 channels. CONCLUSION This study provides evidence suggesting that efficient low-density EEG online decoding is within reach. SIGNIFICANCE This study also marks an important step toward the practical implementation of ASAD in real-life applications.
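As a rough illustration of the channel-weighting idea behind a spatial attention mechanism, a minimal PyTorch sketch follows; it is not the authors' STAnet implementation, and the layer sizes and names are invented for the example.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Toy spatial-attention layer: learn per-channel weights from the input."""
    def __init__(self, n_channels: int):
        super().__init__()
        self.score = nn.Linear(n_channels, n_channels)  # per-channel scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        weights = torch.softmax(self.score(x.mean(dim=1)), dim=-1)  # (batch, channels)
        return x * weights.unsqueeze(1)  # dynamically reweight EEG channels

x = torch.randn(8, 128, 64)             # batch of 1-s EEG segments, 64 channels
print(ChannelAttention(64)(x).shape)    # torch.Size([8, 128, 64])
```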
107
Petersen EB. Hearing-Aid Directionality Improves Neural Speech Tracking in Older Hearing-Impaired Listeners. Trends Hear 2022; 26:23312165221099894. PMID: 35730193. PMCID: PMC9228639. DOI: 10.1177/23312165221099894.
Abstract
In recent years, a growing body of literature has explored the effect of hearing impairment on the neural processing of speech, particularly the neural tracking of speech envelopes. However, only limited work has focused on the potential use of the method for evaluating the effect of hearing aids, which are designed to amplify and process the auditory input provided to hearing-impaired listeners. The current study investigates how directional sound processing in hearing aids, denoted directionality, affects the neural tracking and encoding of speech in EEG recorded from 11 older hearing-impaired listeners. Behaviorally, task performance improved when directionality was applied, while subjective ratings of listening effort were not affected. The reconstruction of the to-be-attended speech envelopes improved significantly when directionality was applied, as well as when the background noise was removed altogether. When inspecting the modelled response of the neural encoding of speech, a faster transition was observed between the early bottom-up response and the later top-down, attention-driven responses when directionality was applied. In summary, hearing-aid directionality affects both the neural tracking and the neural encoding of to-be-attended speech. This result shows that hearing-aid signal processing impacts the neural processing of sounds and that neural speech tracking is indicative of the benefits associated with applying hearing-aid processing algorithms.
108
Kern P, Heilbron M, de Lange FP, Spaak E. Cortical activity during naturalistic music listening reflects short-range predictions based on long-term experience. eLife 2022; 11:e80935. PMID: 36562532. PMCID: PMC9836393. DOI: 10.7554/elife.80935.
Abstract
Expectations shape our experience of music. However, the internal model upon which listeners form melodic expectations is still debated. Do expectations stem from Gestalt-like principles or statistical learning? If the latter, does long-term experience play an important role, or are short-term regularities sufficient? And finally, what length of context informs contextual expectations? To answer these questions, we presented human listeners with diverse naturalistic compositions from Western classical music, while recording neural activity using MEG. We quantified note-level melodic surprise and uncertainty using various computational models of music, including a state-of-the-art transformer neural network. A time-resolved regression analysis revealed that neural activity over fronto-temporal sensors tracked melodic surprise, particularly around 200 ms and 300-500 ms after note onset. This neural surprise response was dissociated from sensory-acoustic and adaptation effects. Neural surprise was best predicted by computational models that incorporated long-term statistical learning, rather than by simple, Gestalt-like principles. Yet, intriguingly, the surprise reflected primarily short-range musical contexts of less than ten notes. We present a full replication of our novel MEG results in an openly available EEG dataset. Together, these results elucidate the internal model that shapes melodic predictions during naturalistic music listening.
Affiliation(s)
- Pius Kern
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Micha Heilbron
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Floris P de Lange
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Eelke Spaak
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
109
Abstract
The auditory cortex of people with sensorineural hearing loss can be re-afferented using a cochlear implant (CI): a neural prosthesis that bypasses the damaged cells in the cochlea to directly stimulate the auditory nerve. Although CIs are the most successful neural prosthesis to date, some CI users still do not achieve satisfactory outcomes using these devices. To explain variability in outcomes, clinicians and researchers have increasingly focused their attention on neuroscientific investigations that examined how the auditory cortices respond to the electric signals that originate from the CI. This chapter provides an overview of the literature that examined how the auditory cortex changes its functional properties in response to inputs from the CI, in animal models and in humans. We focus first on the basic responses to sounds delivered through electrical hearing and, next, we examine the integrity of two fundamental aspects of the auditory system: tonotopy and processing of binaural cues. When addressing the effects of CIs in humans, we also consider speech-evoked responses. We conclude by discussing to what extent this neuroscientific literature can contribute to clinical practices and help to overcome variability in outcomes.
Affiliation(s)
- Francesco Pavani
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy.
110
Etard O, Messaoud RB, Gaugain G, Reichenbach T. No Evidence of Attentional Modulation of the Neural Response to the Temporal Fine Structure of Continuous Musical Pieces. J Cogn Neurosci 2021; 34:411-424. PMID: 35015867. DOI: 10.1162/jocn_a_01811.
Abstract
Speech and music are spectrotemporally complex acoustic signals that are highly relevant for humans. Both contain a temporal fine structure that is encoded in the neural responses of subcortical and cortical processing centers. The subcortical response to the temporal fine structure of speech has recently been shown to be modulated by selective attention to one of two competing voices. Music similarly often consists of several simultaneous melodic lines, and a listener can selectively attend to a particular one at a time. However, the neural mechanisms that enable such selective attention remain largely enigmatic, not least since most investigations to date have focused on short and simplified musical stimuli. Here, we studied the neural encoding of classical musical pieces in human volunteers, using scalp EEG recordings. We presented volunteers with continuous musical pieces composed of one or two instruments. In the latter case, the participants were asked to selectively attend to one of the two competing instruments and to perform a vibrato identification task. We used linear encoding and decoding models to relate the recorded EEG activity to the stimulus waveform. We show that we can measure neural responses to the temporal fine structure of melodic lines played by one single instrument, at the population level as well as for most individual participants. The neural response peaks at a latency of 7.6 msec and is not measurable past 15 msec. When analyzing the neural responses to the temporal fine structure elicited by competing instruments, we found no evidence of attentional modulation. We observed, however, that low-frequency neural activity exhibited a modulation consistent with the behavioral task at latencies from 100 to 160 msec, in a similar manner to the attentional modulation observed in continuous speech (N100). Our results show that, much like speech, the temporal fine structure of music is tracked by neural activity. In contrast to speech, however, this response appears unaffected by selective attention in the context of our experiment.
111
Palana J, Schwartz S, Tager-Flusberg H. Evaluating the Use of Cortical Entrainment to Measure Atypical Speech Processing: A Systematic Review. Neurosci Biobehav Rev 2021; 133:104506. PMID: 34942267. DOI: 10.1016/j.neubiorev.2021.12.029.
Abstract
BACKGROUND Cortical entrainment has emerged as a promising means for measuring continuous speech processing in young, neurotypical adults. However, its utility for capturing atypical speech processing has not been systematically reviewed. OBJECTIVES To synthesize evidence regarding the merit of measuring cortical entrainment to capture atypical speech processing and to recommend avenues for future research. METHOD We systematically reviewed publications investigating entrainment to continuous speech in populations with auditory processing differences. RESULTS Of the 25 publications reviewed, most studies were conducted on older and/or hearing-impaired adults, in whom slow-wave entrainment to speech was often heightened compared with controls. Research on populations with neurodevelopmental disorders, in whom slow-wave entrainment was often reduced, was less common. Across publications, findings highlighted associations between cortical entrainment and differences in speech processing performance. CONCLUSIONS Measures of cortical entrainment offer a useful means of capturing speech processing differences, and future research should leverage them more extensively when studying populations with neurodevelopmental disorders.
Affiliation(s)
- Joseph Palana
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA; Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Harvard Medical School, Boston Children's Hospital, 1 Autumn Street, Boston, MA, 02215, USA
- Sophie Schwartz
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA
- Helen Tager-Flusberg
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA
112
Straetmans L, Holtze B, Debener S, Jaeger M, Mirkovic B. Neural tracking to go: auditory attention decoding and saliency detection with mobile EEG. J Neural Eng 2021; 18. PMID: 34902846. DOI: 10.1088/1741-2552/ac42b5.
Abstract
OBJECTIVE Neuro-steered assistive technologies have been suggested to offer a major advancement in future devices such as neuro-steered hearing aids. Auditory attention decoding methods would in that case allow for the identification of an attended speaker within a complex auditory environment, exclusively from neural data. Decoding the attended speaker using neural information has so far only been done in controlled laboratory settings. Yet, it is known that ever-present factors like distraction and movement are reflected in the neural signal parameters related to attention. APPROACH Thus, in the current study we applied a two-competing-speaker paradigm to investigate the performance of a commonly applied EEG-based auditory attention decoding (AAD) model outside of the laboratory, during leisurely walking and distraction. Unique environmental sounds were added to the auditory scene and served as distractor events. MAIN RESULTS The current study shows, for the first time, that the attended speaker can be accurately decoded during natural movement. At a temporal resolution as short as 5 seconds and without artifact attenuation, decoding was significantly above chance level. Further, as hypothesized, we found a decrease in attention to both the to-be-attended and the to-be-ignored speech streams after the occurrence of a salient event. Additionally, we demonstrate that it is possible to predict neural correlates of distraction with a computational model of auditory saliency based on acoustic features. CONCLUSION Taken together, our study shows that auditory attention tracking outside of the laboratory in ecologically valid conditions is feasible and a step toward the development of future neuro-steered hearing aids.
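A common way to score windowed auditory attention decoding, consistent with the correlation-based AAD approach described above, is sketched below; the 5-s windows mirror the temporal resolution mentioned in the abstract, while the scoring rule, toy data and parameters are illustrative assumptions.

```python
import numpy as np

def aad_accuracy(reconstructed, env_attended, env_ignored, fs=64, win_s=5.0):
    """Label each decision window by which envelope correlates more strongly."""
    win, hits, total = int(win_s * fs), 0, 0
    for start in range(0, len(reconstructed) - win + 1, win):
        seg = slice(start, start + win)
        r_att = np.corrcoef(reconstructed[seg], env_attended[seg])[0, 1]
        r_ign = np.corrcoef(reconstructed[seg], env_ignored[seg])[0, 1]
        hits += r_att > r_ign
        total += 1
    return hits / total

rng = np.random.default_rng(3)
att = rng.standard_normal(64 * 60)                   # 60 s of toy envelopes
ign = rng.standard_normal(64 * 60)
recon = att + 2.0 * rng.standard_normal(64 * 60)     # noisy EEG-based reconstruction
print(f"5-s-window accuracy: {aad_accuracy(recon, att, ign):.2f}")
```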
Affiliation(s)
- Lisa Straetmans
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, Germany
- B Holtze
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, Germany
- Stefan Debener
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, Germany
- Manuela Jaeger
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, Germany
- Bojana Mirkovic
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, Germany
113
Huet MP, Micheyl C, Parizet E, Gaudrain E. Behavioral Account of Attended Stream Enhances Neural Tracking. Front Neurosci 2021; 15:674112. PMID: 34966252. PMCID: PMC8710602. DOI: 10.3389/fnins.2021.674112.
Abstract
During the past decade, several studies have identified electroencephalographic (EEG) correlates of selective auditory attention to speech. In these studies, listeners are typically instructed to focus on one of two concurrent speech streams (the "target") while ignoring the other (the "masker"). EEG signals are recorded while participants perform this task and are subsequently analyzed to recover the attended stream. An assumption often made in these studies is that the participant's attention can remain focused on the target throughout the test. To check this assumption, and to assess when a participant's attention in a concurrent speech listening task was directed toward the target, the masker, or neither, we designed a behavioral listen-then-recall task (the Long-SWoRD test). After listening to two simultaneous short stories, participants had to identify, on a computer screen, keywords from the target story, randomly interspersed among words from the masker story and words from neither story. To modulate task difficulty, and hence the likelihood of attentional switches, masker stories were originally uttered by the same talker as the target stories. The masker voice parameters were then manipulated to parametrically control the similarity of the two streams, from clearly dissimilar to almost identical. While participants listened to the stories, EEG signals were measured and subsequently analyzed using a temporal response function (TRF) model to reconstruct the speech stimuli. Responses in the behavioral recall task were used to infer, retrospectively, when attention was directed toward the target, the masker, or neither. During the model-training phase, the results of these behavioral-data-driven inferences were used as inputs to the model in addition to the EEG signals, to determine whether this additional information would improve stimulus reconstruction accuracy, relative to the performance of models trained under the assumption that the listener's attention was unwaveringly focused on the target. Results from 21 participants show that information regarding the actual, as opposed to assumed, attentional focus can be used advantageously during model training to enhance the subsequent (test phase) accuracy of auditory stimulus reconstruction based on EEG signals. This is especially the case in challenging listening situations, where the participants' attention is less likely to remain focused entirely on the target talker. In situations where the two competing voices are clearly distinct and easily separated perceptually, the assumption that listeners are able to stay focused on the target is reasonable. The behavioral recall protocol introduced here provides experimenters with a means to behaviorally track fluctuations in auditory selective attention, including in combined behavioral/neurophysiological studies.
Affiliation(s)
- Moïra-Phoebé Huet
- Laboratoire Vibrations Acoustique, Institut National des Sciences Appliquées de Lyon, Université de Lyon, Villeurbanne, France
- CNRS UMR 5292, INSERM U1028, Auditory Cognition and Psychoacoustics Team, Lyon Neuroscience Research Center, Lyon, France
- Etienne Parizet
- Laboratoire Vibrations Acoustique, Institut National des Sciences Appliquées de Lyon, Université de Lyon, Villeurbanne, France
- Etienne Gaudrain
- CNRS UMR 5292, INSERM U1028, Auditory Cognition and Psychoacoustics Team, Lyon Neuroscience Research Center, Lyon, France
- Department of Otorhinolaryngology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
114
Agmon G, Yahav PHS, Ben-Shachar M, Golumbic EZ. Attention to Speech: Mapping Distributed and Selective Attention Systems. Cereb Cortex 2021; 32:3763-3776. PMID: 34875678. DOI: 10.1093/cercor/bhab446.
Abstract
When faced with situations where many people talk at once, individuals can employ different listening strategies to deal with the cacophony of speech sounds and to achieve different goals. In this fMRI study, we investigated how the pattern of neural activity is affected by the type of attention applied to speech in a simulated "cocktail party." Specifically, we compared brain activation patterns when listeners "attended selectively" to only one speaker and ignored all others, versus when they "distributed their attention" and followed several concurrent speakers. Conjunction analysis revealed a highly overlapping network of regions activated for both types of attention, including auditory association cortex (bilateral STG/STS) and frontoparietal regions related to speech processing and attention (bilateral IFG/insula, right MFG, left IPS). Activity within nodes of this network, though, was modulated by the type of attention required as well as the number of competing speakers. Auditory and speech-processing regions exhibited higher activity during distributed attention, whereas frontoparietal regions were activated more strongly during selective attention. These results suggest a common "attention to speech" network, which provides the computational infrastructure to deal effectively with multi-speaker input, but with sufficient flexibility to implement different prioritization strategies and to adapt to different listener goals.
Affiliation(s)
- Galit Agmon
- Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Paz Har-Shai Yahav
- Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Michal Ben-Shachar
- Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan 5290002, Israel; Department of English Literature and Linguistics, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Elana Zion Golumbic
- Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan 5290002, Israel
115
Reddy Katthi J, Ganapathy S. Deep Correlation Analysis for Audio-EEG Decoding. IEEE Trans Neural Syst Rehabil Eng 2021; 29:2742-2753. PMID: 34874861. DOI: 10.1109/tnsre.2021.3129790.
Abstract
Electroencephalography (EEG), one of the easiest modes of recording brain activations in a non-invasive manner, is often distorted by recording artifacts, which adversely impact stimulus-response analysis. The most prominent techniques thus far attempt to improve stimulus-response correlations using linear methods. In this paper, we propose a neural-network-based correlation analysis framework that significantly improves over the linear methods for auditory stimuli. A deep model is proposed for intra-subject audio-EEG analysis based on directly optimizing the correlation loss. Further, a neural network model with a shared encoder architecture is proposed for improving inter-subject stimulus-response correlations. These models attempt to suppress the EEG artifacts while preserving the components related to the stimulus. Several experiments are performed using EEG recordings from subjects listening to speech and music stimuli. In these experiments, we show that the deep models improve the Pearson correlation significantly over the linear methods (average absolute improvements of 7.4% in speech tasks and 29.3% in music tasks). We also analyze the impact of several model parameters on the stimulus-response correlation.
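A minimal sketch of a differentiable (negative) Pearson-correlation loss of the kind such a deep correlation model could optimize directly; this assumes PyTorch and is not the authors' implementation.

```python
import torch

def neg_pearson(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Negative Pearson correlation, usable directly as a training loss."""
    xc, yc = x - x.mean(), y - y.mean()
    return -(xc * yc).sum() / (xc.norm() * yc.norm() + 1e-8)

x = torch.randn(512, requires_grad=True)
y = torch.randn(512)
loss = neg_pearson(x, y)
loss.backward()            # gradients flow, so a network can maximize correlation
print(float(loss))
```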
116
Robust anticipation of continuous steering actions from electroencephalographic data during simulated driving. Sci Rep 2021; 11:23383. PMID: 34862442. PMCID: PMC8642531. DOI: 10.1038/s41598-021-02750-w.
Abstract
Driving a car places high cognitive demands on the driver, from sustained attention to perception and action planning. Recent research has investigated the neural processes reflecting the planning of driving actions, aiming to better understand the factors leading to driving errors and to devise methodologies to anticipate and prevent such errors by monitoring the driver's cognitive state and intention. While such anticipation has been shown for discrete driving actions, such as emergency braking, there is no evidence of robust neural signatures of continuous action planning. This study aims to fill this gap by investigating continuous steering actions during a driving task in a car simulator with multimodal recordings of behavioural and electroencephalography (EEG) signals. System identification is used to assess whether robust neurophysiological signatures emerge before steering actions. Linear decoding models are then used to determine whether such cortical signals can predict continuous steering actions with progressively longer anticipation. Results point to significant EEG signatures of continuous action planning. Such neural signals show consistent dynamics across participants for anticipations up to 1 s, while individual-subject neural activity could reliably decode steering actions and predict future actions for anticipations up to 1.8 s. Finally, we use canonical correlation analysis to attempt to disentangle brain and non-brain contributors to the EEG-based decoding. Our results suggest that low-frequency cortical dynamics are involved in the planning of steering actions and that EEG is sensitive to that neural activity. As a result, we propose a framework to investigate anticipatory neural activity in realistic continuous motor tasks.
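As an illustration of using canonical correlation analysis to relate EEG to a continuous steering signal, a minimal scikit-learn sketch follows; the synthetic shared latent signal and all dimensions are assumptions made for the example.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
latent = rng.standard_normal(2000)                 # toy shared slow dynamics
eeg = np.outer(latent, rng.standard_normal(32)) + rng.standard_normal((2000, 32))
steering = (latent + rng.standard_normal(2000)).reshape(-1, 1)

cca = CCA(n_components=1)
eeg_c, steer_c = cca.fit_transform(eeg, steering)  # project both onto shared axes
r = np.corrcoef(eeg_c[:, 0], steer_c[:, 0])[0, 1]
print(f"first canonical correlation: {r:.2f}")
```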
117
Gransier R, Wouters J. Neural auditory processing of parameterized speech envelopes. Hear Res 2021; 412:108374. PMID: 34800800. DOI: 10.1016/j.heares.2021.108374.
Abstract
Speech perception depends heavily on the neural processing of the speech envelope. Several auditory processing deficits are hypothesized to result in a reduced fidelity of the neural representation of the speech envelope across the auditory pathway, and this reduction in fidelity is associated with supra-threshold speech processing deficits. Investigating the mechanisms that affect the neural encoding of the speech envelope can be of great value for gaining insight into the different mechanisms that account for this reduced neural representation, and for developing stimulation strategies for hearing prostheses that aim to restore it. In this perspective, we discuss the importance of neural assessment of phase-locking to the speech envelope from an audiological point of view and introduce the Temporal Envelope Speech Tracking (TEMPEST) stimulus framework, which enables the electrophysiological assessment of envelope processing across the auditory pathway in a systematic and standardized way. We postulate that this framework can be used to gain insight into the salience of speech-like temporal envelopes in the neural code and to evaluate the effectiveness of stimulation strategies that aim to restore temporal processing across the auditory pathway with auditory prostheses.
Affiliation(s)
- Robin Gransier
- ExpORL, Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium.
- Jan Wouters
- ExpORL, Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
118
Accou B, Jalilpour Monesi M, Van Hamme H, Francart T. Predicting speech intelligibility from EEG in a non-linear classification paradigm. J Neural Eng 2021; 18. PMID: 34706347. DOI: 10.1088/1741-2552/ac33e9.
Abstract
Objective. Currently, only behavioral speech understanding tests are available, which require active participation of the person being tested. As this is infeasible for certain populations, an objective measure of speech intelligibility is required. Recently, brain imaging data have been used to establish a relationship between stimulus and brain response. Linear models have been successfully linked to speech intelligibility but require per-subject training. We present a deep-learning-based model incorporating dilated convolutions that operates in a match/mismatch paradigm. The accuracy of the model's match/mismatch predictions can be used as a proxy for speech intelligibility without subject-specific (re)training. Approach. We evaluated the performance of the model as a function of input segment length, electroencephalography (EEG) frequency band and receptive field size, while comparing it to multiple baseline models. Next, we evaluated performance on held-out data and finetuning. Finally, we established a link between the accuracy of our model and the state-of-the-art behavioral MATRIX test. Main results. The dilated convolutional model significantly outperformed the baseline models for every input segment length, for all EEG frequency bands except the delta and theta bands, and for receptive field sizes between 250 and 500 ms. Additionally, finetuning significantly increased the accuracy on a held-out dataset. Finally, a significant correlation (r = 0.59, p = 0.0154) was found between the speech reception threshold (SRT) estimated using the behavioral MATRIX test and our objective method. Significance. Our method is the first to predict the SRT from EEG for unseen subjects, contributing to objective measures of speech intelligibility.
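The match/mismatch evaluation logic can be sketched in a few lines: a scoring function rates (EEG, speech) pairs, and accuracy is the fraction of segments where the true pairing outscores a mismatched one. The correlation stand-in score and toy data below are assumptions, not the paper's dilated-convolution model.

```python
import numpy as np

def match_mismatch_accuracy(score_fn, eeg_segs, speech_segs, shift=3):
    """Fraction of segments where the true (EEG, speech) pair outscores a
    mismatched pair taken from elsewhere in the recording."""
    n, correct = len(eeg_segs), 0
    for i in range(n):
        matched = score_fn(eeg_segs[i], speech_segs[i])
        mismatched = score_fn(eeg_segs[i], speech_segs[(i + shift) % n])
        correct += matched > mismatched
    return correct / n

# Toy demo with correlation as the stand-in for the model's score.
rng = np.random.default_rng(5)
speech = [rng.standard_normal(320) for _ in range(50)]
eeg = [s + rng.standard_normal(320) for s in speech]
corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print(f"match/mismatch accuracy: {match_mismatch_accuracy(corr, eeg, speech):.2f}")
```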
Affiliation(s)
- Bernd Accou
- Department of Neuroscience and Department of Electrical Engineering, KU Leuven, Leuven, Vlaams Brabant, 3000, Belgium
- Mohammad Jalilpour Monesi
- Department of Neuroscience and Department of Electrical Engineering, KU Leuven, Leuven, Vlaams Brabant, 3000, Belgium
- Hugo Van Hamme
- Department of Electrical Engineering, KU Leuven, Leuven, Vlaams Brabant, 3000, Belgium
- Tom Francart
- Department of Neuroscience, KU Leuven, Leuven, Vlaams Brabant, 3000, Belgium
119
Renvall H, Seol J, Tuominen R, Sorger B, Riecke L, Salmelin R. Selective auditory attention within naturalistic scenes modulates reactivity to speech sounds. Eur J Neurosci 2021; 54:7626-7641. PMID: 34697833. PMCID: PMC9298413. DOI: 10.1111/ejn.15504.
Abstract
Rapid recognition and categorization of sounds are essential for humans and animals alike, both for understanding and reacting to our surroundings and for daily communication and social interaction. For humans, the perception of speech sounds is of crucial importance. In real life, this task is complicated by the presence of a multitude of meaningful non-speech sounds. The present behavioural, magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) study set out to address how attention to speech versus attention to natural non-speech sounds within complex auditory scenes influences cortical processing. The stimuli were superimpositions of spoken words and environmental sounds, with parametric variation of the speech-to-environmental-sound intensity ratio. The participants' task was to detect a repetition in either the speech or the environmental sound. We found that, specifically when participants attended to speech within the superimposed stimuli, higher speech-to-environmental-sound ratios resulted in shorter sustained MEG responses and stronger BOLD fMRI signals, especially in the left supratemporal auditory cortex, and in improved behavioural performance. No such effects of the speech-to-environmental-sound ratio were observed when participants attended to the environmental-sound part of the exact same stimuli. These findings suggest a stronger saliency of speech compared with other meaningful sounds during the processing of natural auditory scenes, likely linked to speech-specific top-down and bottom-up mechanisms activated during speech perception that are needed for tracking speech in real-life-like auditory environments.
Affiliation(s)
- Hanna Renvall
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland; BioMag Laboratory, HUS Diagnostic Center, Helsinki University Hospital, University of Helsinki and Aalto University School of Science, Helsinki, Finland
- Jaeho Seol
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland
- Riku Tuominen
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland
- Bettina Sorger
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, The Netherlands
- Lars Riecke
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, The Netherlands
- Riitta Salmelin
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland
120
Decoding Object-Based Auditory Attention from Source-Reconstructed MEG Alpha Oscillations. J Neurosci 2021; 41:8603-8617. PMID: 34429378. DOI: 10.1523/jneurosci.0583-21.2021.
Abstract
How do we attend to relevant auditory information in complex naturalistic scenes? Much research has focused on detecting which information is attended, without regard to the underlying top-down control mechanisms. Studies investigating attentional control generally manipulate and cue specific features in simple stimuli. However, in naturalistic scenes it is impossible to dissociate relevant from irrelevant information based on low-level features. Instead, the brain has to parse and select auditory objects of interest. The neural underpinnings of object-based auditory attention remain poorly understood. Here we recorded MEG while 15 healthy human subjects (9 female) prepared for the repetition of an auditory object presented in one of two overlapping naturalistic auditory streams. The stream containing the repetition was prospectively cued with 70% validity. Crucially, this task could not be solved by attending to low-level features, but only by processing the objects fully. We trained a linear classifier on the cortical distribution of source-reconstructed oscillatory activity to distinguish which auditory stream was attended. We could successfully classify the attended stream from alpha (8-14 Hz) activity in anticipation of the repetition onset. Importantly, attention could only be classified from trials in which subjects subsequently detected the repetition, but not from miss trials. Behavioral relevance was further supported by a correlation between classification accuracy and detection performance. Decodability was not sustained throughout stimulus presentation but peaked shortly before repetition onset, suggesting that attention acted transiently according to temporal expectations. We thus demonstrate that anticipatory alpha oscillations underlie top-down control of object-based auditory attention in complex naturalistic scenes. SIGNIFICANCE STATEMENT: In everyday life, we often find ourselves bombarded with auditory information, from which we need to select what is relevant to our current goals. Previous research has highlighted how we attend to specific, highly controlled aspects of the auditory input. Although invaluable, it is still unclear how this relates to attentional control in naturalistic auditory scenes. Here we used the high spatial and temporal precision of magnetoencephalography to investigate the brain mechanisms underlying top-down control of object-based attention in ecologically valid sound scenes. We show that rhythmic activity in the auditory association cortex at a frequency of ~10 Hz (alpha waves) controls attention to currently relevant segments within the auditory scene and predicts whether these segments are subsequently detected.
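A minimal sketch of decoding the attended stream from trial-wise alpha-power patterns with a cross-validated linear classifier (scikit-learn); the toy effect size and dimensions are assumptions, not the study's source-reconstructed data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_trials, n_sources = 120, 200
attended = rng.integers(0, 2, n_trials)            # cued stream per trial
alpha_power = rng.standard_normal((n_trials, n_sources))
alpha_power[:, :10] += 0.8 * attended[:, None]     # toy attention effect in 10 sources

acc = cross_val_score(LinearDiscriminantAnalysis(), alpha_power, attended, cv=5)
print(f"decoding accuracy: {acc.mean():.2f}")
```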
121
Ramos-Escobar N, Segura E, Olivé G, Rodriguez-Fornells A, François C. Oscillatory activity and EEG phase synchrony of concurrent word segmentation and meaning-mapping in 9-year-old children. Dev Cogn Neurosci 2021; 51:101010. PMID: 34461393. PMCID: PMC8403737. DOI: 10.1016/j.dcn.2021.101010.
Abstract
When learning a new language, one must segment words from continuous speech and associate them with meanings. These complex processes can be boosted by attentional mechanisms triggered by multi-sensory information. Previous electrophysiological studies suggest that brain oscillations are sensitive to different hierarchical complexity levels of the input, making them a plausible neural substrate for speech parsing. Here, we investigated the functional role of brain oscillations during concurrent speech segmentation and meaning acquisition in sixty 9-year-old children. We collected EEG data during an audio-visual statistical learning task in which children were exposed to a learning condition with consistent word-picture associations and a random condition with inconsistent word-picture associations, before being tested on their ability to recall words and word-picture associations. We capitalized on the brain's capacity to align neural activity to the rate of an external rhythmic stimulus to explore modulations of neural synchronization, and of phase synchronization between electrodes, during multi-sensory word learning. Results showed enhanced power at both the word and syllabic rates and increased EEG phase synchronization between frontal and occipital regions in the learning compared with the random condition. These findings suggest that multi-sensory cueing and attentional mechanisms play an essential role in children's successful word learning.
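A minimal sketch of inter-electrode phase synchronization (phase-locking value) between two band-passed channels, in the spirit of the frontal-occipital synchronization analysis above; the toy signals and all parameters are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def inter_electrode_plv(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Phase-locking value between two band-passed electrode signals,
    averaged over trials and time. sig_a, sig_b: (n_trials, n_samples)."""
    dphi = np.angle(hilbert(sig_a, axis=1)) - np.angle(hilbert(sig_b, axis=1))
    return float(np.abs(np.mean(np.exp(1j * dphi))))   # 1 = perfect phase locking

rng = np.random.default_rng(7)
t = np.arange(512) / 256
frontal = np.sin(2 * np.pi * 2 * t) + 0.5 * rng.standard_normal((30, 512))
occipital = np.sin(2 * np.pi * 2 * t + 0.3) + 0.5 * rng.standard_normal((30, 512))
print(f"frontal-occipital PLV: {inter_electrode_plv(frontal, occipital):.2f}")
```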
Affiliation(s)
- Neus Ramos-Escobar
- Dept. of Cognition, Development and Educational Science, Institute of Neuroscience, University of Barcelona, L'Hospitalet de Llobregat, Barcelona, 08097, Spain; Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, 08097, Spain
- Emma Segura
- Dept. of Cognition, Development and Educational Science, Institute of Neuroscience, University of Barcelona, L'Hospitalet de Llobregat, Barcelona, 08097, Spain; Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, 08097, Spain
- Guillem Olivé
- Dept. of Cognition, Development and Educational Science, Institute of Neuroscience, University of Barcelona, L'Hospitalet de Llobregat, Barcelona, 08097, Spain; Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, 08097, Spain
- Antoni Rodriguez-Fornells
- Dept. of Cognition, Development and Educational Science, Institute of Neuroscience, University of Barcelona, L'Hospitalet de Llobregat, Barcelona, 08097, Spain; Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona, 08097, Spain; Catalan Institution for Research and Advanced Studies, ICREA, Barcelona, Spain
122
Kiremitçi I, Yilmaz Ö, Çelik E, Shahdloo M, Huth AG, Çukur T. Attentional Modulation of Hierarchical Speech Representations in a Multitalker Environment. Cereb Cortex 2021; 31:4986-5005. PMID: 34115102. PMCID: PMC8491717. DOI: 10.1093/cercor/bhab136.
Abstract
Humans are remarkably adept in listening to a desired speaker in a crowded environment, while filtering out nontarget speakers in the background. Attention is key to solving this difficult cocktail-party task, yet a detailed characterization of attentional effects on speech representations is lacking. It remains unclear across what levels of speech features and how much attentional modulation occurs in each brain area during the cocktail-party task. To address these questions, we recorded whole-brain blood-oxygen-level-dependent (BOLD) responses while subjects either passively listened to single-speaker stories, or selectively attended to a male or a female speaker in temporally overlaid stories in separate experiments. Spectral, articulatory, and semantic models of the natural stories were constructed. Intrinsic selectivity profiles were identified via voxelwise models fit to passive listening responses. Attentional modulations were then quantified based on model predictions for attended and unattended stories in the cocktail-party task. We find that attention causes broad modulations at multiple levels of speech representations while growing stronger toward later stages of processing, and that unattended speech is represented up to the semantic level in parabelt auditory cortex. These results provide insights on attentional mechanisms that underlie the ability to selectively listen to a desired speaker in noisy multispeaker environments.
Affiliation(s)
- Ibrahim Kiremitçi
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
| | - Özgür Yilmaz
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Emin Çelik
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Mo Shahdloo
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford OX3 9DU, UK
- Alexander G Huth
- Department of Neuroscience, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
- Tolga Çukur
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
123
Hausfeld L, Disbergen NR, Valente G, Zatorre RJ, Formisano E. Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2021; 15:635937. [PMID: 34630007] [PMCID: PMC8498193] [DOI: 10.3389/fnins.2021.635937]
Abstract
Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to the other, irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained in selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction accuracy for the relevant instrument during a middle-latency window for both the bassoon and cello, and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas such an enhancement is not observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories of polyphonic music perception.
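Envelope reconstruction of the kind referred to here, a backward model trained by regularized least squares on time-lagged EEG, can be sketched as follows. This is a generic illustration on random data, not the authors' code; the lag range and ridge parameter are placeholder assumptions.

```python
import numpy as np

def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies of each EEG channel (lags 0..max_lag samples)."""
    T, C = eeg.shape
    X = np.zeros((T, C * (max_lag + 1)))
    for k in range(max_lag + 1):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return X

def train_decoder(eeg, envelope, max_lag=32, lam=1e2):
    X = lag_matrix(eeg, max_lag)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(eeg, w, max_lag=32):
    return lag_matrix(eeg, max_lag) @ w

rng = np.random.default_rng(1)
eeg = rng.standard_normal((2000, 32))        # (time, channels)
env_relevant = rng.standard_normal(2000)     # envelope of the task-relevant instrument
w = train_decoder(eeg, env_relevant)
r = np.corrcoef(reconstruct(eeg, w), env_relevant)[0, 1]   # reconstruction accuracy
```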
Affiliation(s)
- Lars Hausfeld
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Niels R Disbergen
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Robert J Zatorre
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Elia Formisano
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, Netherlands
- Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, Netherlands
124
Vander Ghinst M, Bourguignon M, Wens V, Naeije G, Ducène C, Niesen M, Hassid S, Choufani G, Goldman S, De Tiège X. Inaccurate cortical tracking of speech in adults with impaired speech perception in noise. Brain Commun 2021; 3:fcab186. [PMID: 34541530] [PMCID: PMC8445395] [DOI: 10.1093/braincomms/fcab186]
Abstract
Impaired speech perception in noise despite normal peripheral auditory function is a common problem in young adults. Despite a growing body of research, the pathophysiology of this impairment remains unknown. This magnetoencephalography study characterizes the cortical tracking of speech in a multi-talker background in a group of highly selected adult subjects with impaired speech perception in noise without peripheral auditory dysfunction. Magnetoencephalographic signals were recorded from 13 subjects with impaired speech perception in noise (six females, mean age: 30 years) and matched healthy subjects while they were listening to 5 different recordings of stories merged with a multi-talker background at different signal-to-noise ratios (no noise, +10, +5, 0 and −5 dB). The cortical tracking of speech was quantified with coherence between magnetoencephalographic signals and the temporal envelope of (i) the global auditory scene (i.e. the attended speech stream and the multi-talker background noise), (ii) the attended speech stream only and (iii) the multi-talker background noise. Functional connectivity was then estimated between brain areas showing altered cortical tracking of speech in noise in subjects with impaired speech perception in noise and the rest of the brain. All participants demonstrated a selective cortical representation of the attended speech stream in noisy conditions, but subjects with impaired speech perception in noise displayed reduced cortical tracking of speech at the syllable rate (i.e. 4–8 Hz) in all noisy conditions. Increased functional connectivity was observed in subjects with impaired speech perception in noise, in both the noiseless and speech-in-noise conditions, between supratemporal auditory cortices and left-dominant brain areas involved in semantic and attention processes. The difficulty understanding speech in a multi-talker background in subjects with impaired speech perception in noise thus appears to be related to inaccurate auditory cortex tracking of speech at the syllable rate. The increased functional connectivity between supratemporal auditory cortices and language/attention-related neocortical areas probably serves to support speech perception and subsequent recognition in adverse auditory scenes. Overall, this study argues for a central origin of impaired speech perception in noise in the absence of any peripheral auditory dysfunction.
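Coherence between a neural signal and the speech temporal envelope, the quantity used here to index cortical tracking, can be computed with standard tools. A minimal sketch on synthetic signals follows; the sampling rate and window length are assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import coherence

fs = 1000                     # sampling rate in Hz (assumed)
rng = np.random.default_rng(2)
meg = rng.standard_normal(60 * fs)     # one MEG sensor/source time course
env = rng.standard_normal(60 * fs)     # temporal envelope of the attended stream

f, cxy = coherence(meg, env, fs=fs, nperseg=2 * fs)   # 0.5 Hz resolution
syllable_band = (f >= 4) & (f <= 8)
print("mean 4-8 Hz coherence:", cxy[syllable_band].mean())
```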
Affiliation(s)
- Marc Vander Ghinst
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Service d'ORL et de chirurgie cervico-faciale, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Mathieu Bourguignon
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Laboratory of Neurophysiology and Movement Biomechanics, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Basque Center on Cognition, Brain and Language (BCBL), Donostia/San Sebastian 20009, Spain
- Vincent Wens
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Clinics of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Gilles Naeije
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Service de Neurologie, ULB-Hôpital Erasme, Université libre de Bruxelles (ULB), Brussels 1070, Belgium
- Cecile Ducène
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Service d'ORL et de chirurgie cervico-faciale, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Maxime Niesen
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Service d'ORL et de chirurgie cervico-faciale, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Sergio Hassid
- Service d'ORL et de chirurgie cervico-faciale, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Georges Choufani
- Service d'ORL et de chirurgie cervico-faciale, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Serge Goldman
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Clinics of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
- Xavier De Tiège
- Laboratoire de Cartographie fonctionnelle du Cerveau, UNI-ULB Neuroscience Institute, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium; Clinics of Functional Neuroimaging, Service of Nuclear Medicine, CUB Hôpital Erasme, Université Libre de Bruxelles (ULB), Brussels 1070, Belgium
125
Devaraju DS, Kemp A, Eddins DA, Shrivastav R, Chandrasekaran B, Hampton Wray A. Effects of Task Demands on Neural Correlates of Acoustic and Semantic Processing in Challenging Listening Conditions. J Speech Lang Hear Res 2021; 64:3697-3706. [PMID: 34403278] [DOI: 10.1044/2021_jslhr-21-00006]
Abstract
Purpose Listeners shift their listening strategies between lower-level acoustic information and higher-level semantic information to prioritize maximum speech intelligibility in challenging listening conditions. Although increasing task demands via acoustic degradation modulates lexical-semantic processing, the neural mechanisms underlying different listening strategies are unclear. The current study examined the extent to which encoding of lower-level acoustic cues is modulated by task demand and associated with lexical-semantic processes. Method Electroencephalography was acquired while participants listened to sentences in the presence of four-talker babble that contained either higher- or lower-probability final words. Task difficulty was modulated by the time available to process responses. Cortical tracking of speech (a neural correlate of acoustic temporal envelope processing) was estimated using temporal response functions. Results Task difficulty did not affect cortical tracking of the temporal envelope of speech under challenging listening conditions. Neural indices of lexical-semantic processing (N400 amplitudes) were larger with increased task difficulty. No correlations were observed between cortical tracking of the temporal envelope of speech and lexical-semantic processes, even after controlling for the effect of individualized signal-to-noise ratios. Conclusions Cortical tracking of the temporal envelope of speech and semantic processing are differentially influenced by task difficulty. While increased task demands modulated higher-level semantic processing, cortical tracking of the temporal envelope of speech may be influenced by task difficulty primarily when the demand is manipulated in terms of the acoustic properties of the stimulus, consistent with an emerging perspective in speech perception.
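A temporal response function of the kind used here is, in essence, a regularized lagged regression from the stimulus envelope onto an EEG channel. A minimal sketch on synthetic data follows; the lag window and regularization strength are illustrative assumptions rather than the study's settings.

```python
import numpy as np

def trf(stimulus, response, fs, tmin=-0.1, tmax=0.4, lam=1e3):
    """Estimate a temporal response function by regularized lagged regression."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    T = len(stimulus)
    X = np.zeros((T, len(lags)))
    for i, L in enumerate(lags):
        if L >= 0:
            X[L:, i] = stimulus[:T - L]
        else:
            X[:L, i] = stimulus[-L:]
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ response)
    return lags / fs, w

fs = 128
rng = np.random.default_rng(3)
envelope = rng.standard_normal(fs * 120)       # speech temporal envelope
eeg_channel = rng.standard_normal(fs * 120)    # one EEG channel
times, weights = trf(envelope, eeg_channel, fs)
```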
Affiliation(s)
- Dhatri S Devaraju
- Department of Communication Science and Disorders, University of Pittsburgh, PA
- Amy Kemp
- Department of Communication Sciences and Special Education, University of Georgia, Athens
- David A Eddins
- Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Amanda Hampton Wray
- Department of Communication Science and Disorders, University of Pittsburgh, PA
126
Drgas S, Blaszak M, Przekoracka-Krawczyk A. The Combination of Neural Tracking and Alpha Power Lateralization for Auditory Attention Detection. J Speech Lang Hear Res 2021; 64:3603-3616. [PMID: 34403288] [DOI: 10.1044/2021_jslhr-20-00608]
Abstract
Purpose The acoustic source that a listener attends to in a mixture can be identified, with a certain accuracy, from the listener's neural responses recorded during listening, and various phenomena may be used to detect attention. For example, neural tracking (NT) and alpha power lateralization (APL) may be utilized to obtain information about attention. However, these methods of auditory attention detection (AAD) are typically tested in different experimental setups, which makes it impossible to compare their accuracy. The aim of this study is to compare the accuracy of AAD based on NT, APL, and their combination for a dichotic natural-speech listening task. Method Thirteen adult listeners were presented with dichotic speech stimuli and instructed to attend to one of them. The electroencephalogram of the subjects was recorded continuously during the experiment using a set of 32 active electrodes. The accuracy of AAD was evaluated for trial lengths of 50, 25, and 12.5 s, and AAD was tested for various parameters of the NT- and APL-based modules. Results The obtained results suggest that NT of natural running speech provides accuracy similar to APL. A statistically significant improvement in the accuracy of AAD using the combined method was observed not only for the longest duration of test samples (50 s, p = .005) but also for shorter ones (25 s, p = .011). Conclusions The combination of standard NT and APL significantly improves identification of the attended signal under dichotic conditions. It has been demonstrated that, under certain conditions, the combination of NT and APL may provide a benefit for AAD in cocktail-party scenarios.
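The alpha power lateralization feature used in this study can be illustrated with a simple index contrasting alpha-band power over left and right channel groups. The channel selections and Welch parameters below are placeholder assumptions; in a full AAD system this index would be fed, together with NT correlations, into a classifier.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(x, fs):
    """Mean 8-12 Hz power of one channel, via Welch's method."""
    f, p = welch(x, fs=fs, nperseg=2 * fs)
    return p[(f >= 8) & (f <= 12)].mean()

def alpha_lateralization_index(eeg, fs, left_ch, right_ch):
    """Normalized right-minus-left alpha power contrast for one trial."""
    pl = np.mean([alpha_power(eeg[:, c], fs) for c in left_ch])
    pr = np.mean([alpha_power(eeg[:, c], fs) for c in right_ch])
    return (pr - pl) / (pr + pl)

fs = 256
rng = np.random.default_rng(4)
eeg = rng.standard_normal((fs * 50, 32))       # one 50-s trial, 32 channels
ali = alpha_lateralization_index(eeg, fs, left_ch=[0, 1, 2], right_ch=[29, 30, 31])
```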
Affiliation(s)
- Szymon Drgas
- Institute of Automation and Robotics, Poznań University of Technology, Poland
- Magdalena Blaszak
- Department of Medical Physics and Radiospectroscopy, Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
- Vision and Neuroscience Laboratory, NanoBioMedical Centre, Adam Mickiewicz University, Poznań, Poland
- Anna Przekoracka-Krawczyk
- Vision and Neuroscience Laboratory, NanoBioMedical Centre, Adam Mickiewicz University, Poznań, Poland
- Laboratory of Vision Science and Optometry, Faculty of Physics, Adam Mickiewicz University, Poznań, Poland
127
Lu Y, Wang M, Yao L, Shen H, Wu W, Zhang Q, Zhang L, Chen M, Liu H, Peng R, Liu M, Chen S. Auditory attention decoding from electroencephalography based on long short-term memory networks. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102966]
128
Viswanathan V, Bharadwaj HM, Shinn-Cunningham BG, Heinz MG. Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions. J Acoust Soc Am 2021; 150:2230. [PMID: 34598642] [PMCID: PMC8483789] [DOI: 10.1121/10.0006385]
Abstract
A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.
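The envelope-domain signal-to-noise ratio at the heart of this account can be approximated, in a deliberately simplified acoustic form, by comparing target and masker envelope power within a modulation band. The sketch below is a crude acoustic stand-in for the authors' neural envelope-domain measure, with all filter settings assumed.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope(x):
    """Hilbert-magnitude temporal envelope."""
    return np.abs(hilbert(x))

def modulation_band_snr_db(target, masker, fs, lo=1.0, hi=8.0):
    """Envelope-domain SNR (dB) within one modulation band."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    et = sosfiltfilt(sos, envelope(target))
    em = sosfiltfilt(sos, envelope(masker))
    return 10 * np.log10(np.mean(et ** 2) / np.mean(em ** 2))

fs = 16000
rng = np.random.default_rng(5)
target = rng.standard_normal(fs * 5)    # stand-in target speech
masker = rng.standard_normal(fs * 5)    # stand-in background sound
print(modulation_band_snr_db(target, masker, fs))
```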
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Hari M Bharadwaj
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
129
Zuk NJ, Murphy JW, Reilly RB, Lalor EC. Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies. PLoS Comput Biol 2021; 17:e1009358. [PMID: 34534211] [PMCID: PMC8480853] [DOI: 10.1371/journal.pcbi.1009358]
Abstract
The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected music reconstruction to match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music at all frequencies we examined. Additionally, models trained on all stimulus types performed as well as or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin in speech-specific processing in the brain.
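Frequency-constrained evaluation of envelope reconstructions can be sketched by correlating the true and reconstructed envelopes within narrow modulation bands. The band edges and filter order below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandlimited_correlation(env_true, env_rec, fs, bands=((0.3, 1), (1, 4), (4, 8))):
    """Correlation between envelopes after restriction to each modulation band."""
    out = {}
    for lo, hi in bands:
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        a = sosfiltfilt(sos, env_true)
        b = sosfiltfilt(sos, env_rec)
        out[(lo, hi)] = np.corrcoef(a, b)[0, 1]
    return out

fs = 64
rng = np.random.default_rng(6)
env_true = rng.standard_normal(fs * 180)
env_rec = env_true + 2.0 * rng.standard_normal(fs * 180)  # stand-in for an EEG-based reconstruction
print(bandlimited_correlation(env_true, env_rec, fs))
```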
Affiliation(s)
- Nathaniel J. Zuk
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
- Jeremy W. Murphy
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Richard B. Reilly
- Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Edmund C. Lalor
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
130
Li J, Hong B, Nolte G, Engel AK, Zhang D. Preparatory delta phase response is correlated with naturalistic speech comprehension performance. Cogn Neurodyn 2021; 16:337-352. [PMID: 35401861] [PMCID: PMC8934811] [DOI: 10.1007/s11571-021-09711-z]
Abstract
While human speech comprehension is thought to be an active process that involves top-down predictions, it remains unclear how predictive information is used to prepare for the processing of upcoming speech. We aimed to identify the neural signatures of this preparatory processing. Participants selectively attended to one of two competing naturalistic, narrative speech streams, and a temporal response function (TRF) method was applied to derive event-related-like neural responses from electroencephalographic data. Phase responses to the attended speech in the delta band (1-4 Hz) were correlated with the comprehension performance of individual participants, at a latency of −200 to 0 ms relative to the onset of speech amplitude envelope fluctuations, over fronto-central and left-lateralized parietal electrodes. Phase responses to the attended speech in the alpha band also correlated with comprehension performance, but at a latency of 650-980 ms post-onset over fronto-central electrodes. Distinct neural signatures were found for attentional modulation, taking the form of TRF-based amplitude responses at a latency of 240-320 ms post-onset over left-lateralized fronto-central and occipital electrodes. Our findings reveal how the brain prepares to process upcoming speech in a continuous, naturalistic speech context.
Affiliation(s)
- Jiawei Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
- Bo Hong
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
- Andreas K. Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg, Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing, China
131
AIM: A network model of attention in auditory cortex. PLoS Comput Biol 2021; 17:e1009356. [PMID: 34449761] [PMCID: PMC8462696] [DOI: 10.1371/journal.pcbi.1009356]
Abstract
Attentional modulation of cortical networks is critical for the cognitive flexibility required to process complex scenes. Current theoretical frameworks for attention are based almost exclusively on studies in visual cortex, where attentional effects are typically modest and excitatory. In contrast, attentional effects in auditory cortex can be large and suppressive. A theoretical framework for explaining attentional effects in auditory cortex is lacking, preventing a broader understanding of cortical mechanisms underlying attention. Here, we present a cortical network model of attention in primary auditory cortex (A1). A key mechanism in our network is attentional inhibitory modulation (AIM) of cortical inhibitory neurons. In this mechanism, top-down inhibitory neurons disinhibit bottom-up cortical circuits, a prominent circuit motif observed in sensory cortex. Our results reveal that the same underlying mechanisms in the AIM network can explain diverse attentional effects on both spatial and frequency tuning in A1. We find that a dominant effect of disinhibition on cortical tuning is suppressive, consistent with experimental observations. Functionally, the AIM network may play a key role in solving the cocktail party problem. We demonstrate how attention can guide the AIM network to monitor an acoustic scene, select a specific target, or switch to a different target, providing flexible outputs for solving the cocktail party problem. Selective attention plays a key role in how we navigate our everyday lives. For example, at a cocktail party, we can attend to a friend's speech amidst other speakers, music, and background noise. In stark contrast, hundreds of millions of people with hearing impairment and other disorders find such environments overwhelming and debilitating. Understanding the mechanisms underlying selective attention may lead to breakthroughs in improving the quality of life for those negatively affected. Here, we propose a mechanistic network model of attention in primary auditory cortex based on attentional inhibitory modulation (AIM). In the AIM model, attention targets specific cortical inhibitory neurons, which then modulate local cortical circuits to emphasize a particular feature of sounds and suppress competing features. We show that the AIM model can account for experimental observations across different species and stimulus domains. We also demonstrate that the same mechanisms can enable listeners to flexibly switch between attending to specific target sounds and monitoring the environment in complex acoustic scenes, such as a cocktail party. The AIM network provides a theoretical framework that can work in tandem with new experiments to help unravel the cortical circuits underlying attention.
132
θ-Band Cortical Tracking of the Speech Envelope Shows the Linear Phase Property. eNeuro 2021; 8:ENEURO.0058-21.2021. [PMID: 34380659] [PMCID: PMC8387159] [DOI: 10.1523/eneuro.0058-21.2021]
Abstract
When listening to speech, low-frequency cortical activity tracks the speech envelope. It remains controversial, however, whether such envelope-tracking neural activity reflects entrainment of neural oscillations or the superposition of transient responses evoked by sound features. Recently, it has been suggested that the phase of envelope-tracking activity can potentially distinguish entrained oscillations from evoked responses. Here, we analyze the phase of envelope tracking in humans during passive listening and observe that the phase lag between cortical activity and the speech envelope tends to change linearly across frequency in the θ band (4–8 Hz), suggesting that θ-band envelope-tracking activity can be readily modeled by evoked responses.
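The linear-phase diagnostic is straightforward to demonstrate: if envelope tracking reflects an evoked response with a fixed delay, the cross-spectral phase between envelope and EEG falls on a line across frequency, and the slope of that line gives the delay. A synthetic sketch follows; the 100-ms delay and Welch parameters are assumptions.

```python
import numpy as np
from scipy.signal import csd

fs = 128
rng = np.random.default_rng(7)
env = rng.standard_normal(fs * 300)                                # speech envelope
eeg = np.roll(env, int(0.1 * fs)) + rng.standard_normal(fs * 300)  # 100-ms delayed copy + noise

f, pxy = csd(env, eeg, fs=fs, nperseg=8 * fs)       # cross-spectrum, 0.125 Hz resolution
band = (f >= 4) & (f <= 8)
phase = np.unwrap(np.angle(pxy[band]))
slope = np.polyfit(f[band], phase, 1)[0]            # radians per Hz
print("estimated delay (ms):", -slope / (2 * np.pi) * 1000)
```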
133
Homma NY, Bajo VM. Lemniscal Corticothalamic Feedback in Auditory Scene Analysis. Front Neurosci 2021; 15:723893. [PMID: 34489635] [PMCID: PMC8417129] [DOI: 10.3389/fnins.2021.723893]
Abstract
Sound information is transmitted from the ear to central auditory stations of the brain via several nuclei. In addition to these ascending pathways there exist descending projections that can influence the information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to GABAergic neurons of the thalamic reticular nucleus (TRN) that receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine frequency tuning, firing patterns, and gating of MGBv neurons. Therefore, the thalamus is not merely a relay stage in the chain of auditory nuclei but does participate in complex aspects of sound processing that include top-down modulations. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses in MGBv neurons, and (ii) to explore how the feedback contributes to auditory scene analysis, particularly on frequency and harmonic perception. Finally, we will discuss potential implications of the role of corticothalamic feedback in music and speech perception, where precise spectral and temporal processing is essential.
Affiliation(s)
- Natsumi Y. Homma
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, United States
- Coleman Memorial Laboratory, Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, United States
- Victoria M. Bajo
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
134
Geravanchizadeh M, Roushan H. Dynamic selective auditory attention detection using RNN and reinforcement learning. Sci Rep 2021; 11:15497. [PMID: 34326401] [PMCID: PMC8322190] [DOI: 10.1038/s41598-021-94876-0]
Abstract
The cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. The proposed dynamic SAAD is modeled as a sequential decision-making problem, which is solved using a recurrent neural network (RNN) and the reinforcement learning methods of Q-learning and deep Q-learning. Among the different dynamic learning approaches, the evaluation results show that the deep Q-learning approach with an RNN as agent provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous in the sense that the detection of attention is performed dynamically for sequential inputs. The system also has the potential to be used in scenarios where the listener's attention switches over time in the presence of various acoustic events.
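The tabular Q-learning update underlying the reinforcement-learning framing can be shown in a toy loop. The states, actions, reward rule, and hyperparameters below are invented purely for illustration and bear no relation to the study's actual EEG features or network.

```python
import numpy as np

n_states, n_actions = 10, 2          # discretized evidence states; attend-1 vs. attend-2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration
rng = np.random.default_rng(8)

def step(state, action):
    """Hypothetical environment: reward 1 when the action matches the evidence side."""
    true_side = int(state >= n_states // 2)
    reward = float(action == true_side)
    return int(rng.integers(n_states)), reward

state = int(rng.integers(n_states))
for _ in range(5000):
    action = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning temporal-difference update
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```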
Affiliation(s)
- Masoud Geravanchizadeh
- Faculty of Electrical & Computer Engineering, University of Tabriz, 51666-15813, Tabriz, Iran.
- Hossein Roushan
- Faculty of Electrical & Computer Engineering, University of Tabriz, 51666-15813, Tabriz, Iran
135
Verschueren E, Vanthornhout J, Francart T. The Effect of Stimulus Choice on an EEG-Based Objective Measure of Speech Intelligibility. Ear Hear 2021; 41:1586-1597. [PMID: 33136634] [DOI: 10.1097/aud.0000000000000875]
Abstract
OBJECTIVES Recently, an objective measure of speech intelligibility (SI), based on brain responses derived from the electroencephalogram (EEG), has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of SI can also be used with natural speech as a stimulus, as this would be beneficial for clinical applications. DESIGN We recorded the EEG in 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of SI by adding speech-weighted noise. SI was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively, by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in the EEG channels covering different brain areas. RESULTS For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing SI. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing SI for both stimuli. Remarkably, although SI remained unchanged under the no-noise and +2.5 dB SNR conditions, neural speech processing was affected by the addition of this small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased in the presence of noise. TRF latency changes as a function of SI appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50 to 300 ms) increased with increasing SI for the Matrix sentences, but remained unchanged for the natural story. CONCLUSIONS These results show (1) the feasibility of natural speech as a stimulus for the objective measure of SI; (2) that neural tracking of speech is enhanced using a natural story compared to Matrix sentences; and (3) that noise and the stimulus type can change the temporal characteristics of the brain responses. These results might reflect the integration of incoming acoustic features and top-down information, suggesting that the choice of stimulus has to be considered based on the intended purpose of the measurement.
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology (ExpORL), Department of Neurosciences, KU Leuven-University of Leuven, Leuven, Belgium
136
Tune S, Alavash M, Fiedler L, Obleser J. Neural attentional-filter mechanisms of listening success in middle-aged and older individuals. Nat Commun 2021; 12:4533. [PMID: 34312388] [PMCID: PMC8313676] [DOI: 10.1038/s41467-021-24771-9]
Abstract
Successful listening crucially depends on intact attentional filters that separate relevant from irrelevant information. Research into their neurobiological implementation has focused on two potential auditory filter strategies: the lateralization of alpha power and selective neural speech tracking. However, the functional interplay of the two neural filter strategies and their potency to index listening success in an ageing population remains unclear. Using electroencephalography and a dual-talker task in a representative sample of listeners (N = 155; age=39-80 years), we here demonstrate an often-missed link from single-trial behavioural outcomes back to trial-by-trial changes in neural attentional filtering. First, we observe preserved attentional-cue-driven modulation of both neural filters across chronological age and hearing levels. Second, neural filter states vary independently of one another, demonstrating complementary neurobiological solutions of spatial selective attention. Stronger neural speech tracking but not alpha lateralization boosts trial-to-trial behavioural performance. Our results highlight the translational potential of neural speech tracking as an individualized neural marker of adaptive listening behaviour.
Affiliation(s)
- Sarah Tune
- Department of Psychology, University of Lübeck, Lübeck, Germany.
- Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany.
- Mohsen Alavash
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Lorenz Fiedler
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Eriksholm Research Centre, Snekkersten, Denmark
- Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany.
- Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany.
137
Zhang M, Alamatsaz N, Ihlefeld A. Hemodynamic Responses Link Individual Differences in Informational Masking to the Vicinity of Superior Temporal Gyrus. Front Neurosci 2021; 15:675326. [PMID: 34366772] [PMCID: PMC8339305] [DOI: 10.3389/fnins.2021.675326]
Abstract
Suppressing unwanted background sound is crucial for aural communication. A particularly disruptive type of background sound, informational masking (IM), often interferes in social settings. However, IM mechanisms are incompletely understood. At present, IM is identified operationally: when a target should be audible, based on suprathreshold target/masker energy ratios, yet cannot be heard because target-like background sound interferes. We here confirm that speech identification thresholds differ dramatically between low- vs. high-IM background sound. However, speech detection thresholds are comparable across the two conditions. Moreover, functional near-infrared spectroscopy recordings show that task-evoked blood oxygenation changes near the superior temporal gyrus (STG) covary with behavioral speech detection performance for high-IM but not low-IM background sound, suggesting that the STG is part of an IM-dependent network. Furthermore, listeners who are more vulnerable to IM show increased hemodynamic recruitment near STG, an effect that cannot be explained based on differences in task difficulty across low- vs. high-IM. In contrast, task-evoked responses near another auditory region of cortex, the caudal inferior frontal sulcus (cIFS), do not predict behavioral sensitivity, suggesting that the cIFS belongs to an IM-independent network. Results are consistent with the idea that cortical gating shapes individual vulnerability to IM.
Affiliation(s)
- Min Zhang
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, United States
- Rutgers Biomedical and Health Sciences, Rutgers University, Newark, NJ, United States
- Nima Alamatsaz
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, United States
- Rutgers Biomedical and Health Sciences, Rutgers University, Newark, NJ, United States
- Antje Ihlefeld
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, United States
138
Cai S, Li P, Su E, Xie L. Auditory Attention Detection via Cross-Modal Attention. Front Neurosci 2021; 15:652058. [PMID: 34366770] [PMCID: PMC8333999] [DOI: 10.3389/fnins.2021.652058]
Abstract
Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activity. However, because previous AAD approaches perform poorly on short signal segments, more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we hope to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracy values of 82.8, 86.4, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model, but also outperforms state-of-the-art non-linear approaches. These results and data visualizations suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby improving AAD performance.
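The core of a cross-modal attention layer, in which one modality's frames form queries that attend over the other modality's frames, can be sketched in a few lines of NumPy. This is a generic scaled dot-product attention illustration, not the CMAA architecture; the feature dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(eeg_feats, audio_feats):
    """EEG frames (queries) attend over audio frames (keys/values)."""
    d = eeg_feats.shape[-1]
    scores = eeg_feats @ audio_feats.T / np.sqrt(d)   # (T_eeg, T_audio) similarity
    weights = softmax(scores, axis=-1)                # attention over audio frames
    return weights @ audio_feats, weights             # fused representation, weights

rng = np.random.default_rng(9)
eeg_feats = rng.standard_normal((64, 32))     # projected EEG embedding (frames, dim)
audio_feats = rng.standard_normal((64, 32))   # projected speech embedding (frames, dim)
fused, attn = cross_modal_attention(eeg_feats, audio_feats)
```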
Affiliation(s)
- Longhan Xie
- Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, China
139
McAuley JD, Shen Y, Smith T, Kidd GR. Effects of speech-rhythm disruption on selective listening with a single background talker. Atten Percept Psychophys 2021; 83:2229-2240. [PMID: 33782913] [PMCID: PMC10612531] [DOI: 10.3758/s13414-021-02298-x]
Abstract
Recent work by McAuley et al. (Attention, Perception, & Psychophysics, 82, 3222-3233, 2020) using the Coordinate Response Measure (CRM) paradigm with a multitalker background revealed that altering the natural rhythm of target speech amidst background speech worsens target recognition (a target-rhythm effect), while altering background speech rhythm improves target recognition (a background-rhythm effect). Here, we used a single-talker background to examine the role of specific properties of target and background sound patterns on selective listening without the complexity of multiple background stimuli. Experiment 1 manipulated the sex of the background talker, presented with a male target talker, to assess target and background-rhythm effects with and without a strong pitch cue to aid perceptual segregation. Experiment 2 used a vocoded single-talker background to examine target and background-rhythm effects with envelope-based speech rhythms preserved, but without semantic content or temporal fine structure. While a target-rhythm effect was present with all backgrounds, the background-rhythm effect was only observed for the same-sex background condition. Results provide additional support for a selective entrainment hypothesis, while also showing that the background-rhythm effect is not driven by envelope-based speech rhythm alone, and may be reduced or eliminated when pitch or other acoustic differences provide a strong basis for selective listening.
Affiliation(s)
- J Devin McAuley
- Department of Psychology, Michigan State University, East Lansing, MI, 48824, USA.
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
- Toni Smith
- Department of Psychology, Michigan State University, East Lansing, MI, 48824, USA
- Gary R Kidd
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
140
Lunner T, Alickovic E, Graversen C, Ng EHN, Wendt D, Keidser G. Three New Outcome Measures That Tap Into Cognitive Processes Required for Real-Life Communication. Ear Hear 2021; 41 Suppl 1:39S-47S. [PMID: 33105258] [PMCID: PMC7676869] [DOI: 10.1097/aud.0000000000000941]
Abstract
To increase the ecological validity of outcomes from laboratory evaluations of hearing and hearing devices, it is desirable to introduce more realistic outcome measures in the laboratory. This article presents and discusses three outcome measures that have been designed to go beyond traditional speech-in-noise measures to better reflect realistic everyday challenges. The outcome measures reviewed are: the Sentence-final Word Identification and Recall (SWIR) test that measures working memory performance while listening to speech in noise at ceiling performance; a neural tracking method that produces a quantitative measure of selective speech attention in noise; and pupillometry that measures changes in pupil dilation to assess listening effort while listening to speech in noise. According to evaluation data, the SWIR test provides a sensitive measure in situations where speech perception performance might be unaffected. Similarly, pupil dilation has also shown sensitivity in situations where traditional speech-in-noise measures are insensitive. Changes in working memory capacity and effort mobilization were found at positive signal-to-noise ratios (SNR), that is, at SNRs that might reflect everyday situations. Using stimulus reconstruction, it has been demonstrated that neural tracking is a robust method at determining to what degree a listener is attending to a specific talker in a typical cocktail party situation. Using both established and commercially available noise reduction schemes, data have further shown that all three measures are sensitive to variation in SNR. In summary, the new outcome measures seem suitable for testing hearing and hearing devices under more realistic and demanding everyday conditions than traditional speech-in-noise tests.
Affiliation(s)
- Thomas Lunner
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
- Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
- Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
- Elaine Hoi Ning Ng
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
- Oticon A/S, Kongebakken, Denmark
- Dorothea Wendt
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
- Gitte Keidser
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
141
Keshavarzi M, Varano E, Reichenbach T. Cortical Tracking of a Background Speaker Modulates the Comprehension of a Foreground Speech Signal. J Neurosci 2021; 41:5093-5101. [PMID: 33926996] [PMCID: PMC8197648] [DOI: 10.1523/jneurosci.3200-20.2021]
Abstract
Understanding speech in background noise is a difficult task. The tracking of speech rhythms such as the rate of syllables and words by cortical activity has emerged as a key neural mechanism for speech-in-noise comprehension. In particular, recent investigations have used transcranial alternating current stimulation (tACS) with the envelope of a speech signal to influence the cortical speech tracking, demonstrating that this type of stimulation modulates comprehension and therefore providing evidence of a functional role of the cortical tracking in speech processing. Cortical activity has been found to track the rhythms of a background speaker as well, but the functional significance of this neural response remains unclear. Here we use a speech-comprehension task with a target speaker in the presence of a distractor voice to show that tACS with the speech envelope of the target voice as well as tACS with the envelope of the distractor speaker both modulate the comprehension of the target speech. Because the envelope of the distractor speech does not carry information about the target speech stream, the modulation of speech comprehension through tACS with this envelope provides evidence that the cortical tracking of the background speaker affects the comprehension of the foreground speech signal. The phase dependency of the resulting modulation of speech comprehension is, however, opposite to that obtained from tACS with the envelope of the target speech signal. This suggests that the cortical tracking of the ignored speech stream and that of the attended speech stream may compete for neural resources.SIGNIFICANCE STATEMENT Loud environments such as busy pubs or restaurants can make conversation difficult. However, they also allow us to eavesdrop into other conversations that occur in the background. In particular, we often notice when somebody else mentions our name, even if we have not been listening to that person. However, the neural mechanisms by which background speech is processed remain poorly understood. Here we use transcranial alternating current stimulation, a technique through which neural activity in the cerebral cortex can be influenced, to show that cortical responses to rhythms in the distractor speech modulate the comprehension of the target speaker. Our results provide evidence that the cortical tracking of background speech rhythms plays a functional role in speech processing.
Affiliation(s)
- Mahmoud Keshavarzi
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Enrico Varano
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Tobias Reichenbach
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
142
Liu L, Zhang Y, Zhou Q, Garrett DD, Lu C, Chen A, Qiu J, Ding G. Auditory-Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cereb Cortex 2021; 30:942-951. [PMID: 31318013] [DOI: 10.1093/cercor/bhz138]
Abstract
Whether auditory processing of speech relies on reference to the articulatory motor information of the speaker remains elusive. Here, we addressed this issue under a two-brain framework. Functional magnetic resonance imaging was used to record the brain activity of speakers telling real-life stories and, later, of listeners listening to the audio recordings of these stories. Based on between-brain seed-to-voxel correlation analyses, we revealed that neural dynamics in listeners' auditory temporal cortex are temporally coupled with the dynamics in the speaker's larynx/phonation area. Moreover, the coupling response in the listener's left auditory temporal cortex follows the hierarchical organization for speech processing, with response lags in A1+, STG/STS, and MTG increasing linearly. Further, listeners showing greater coupling responses understand the speech better. When comprehension fails, such interbrain auditory-articulatory coupling largely vanishes. These findings suggest that a listener's auditory system and a speaker's articulatory system are inherently aligned during naturalistic verbal interaction, and that this alignment is associated with high-level information transfer from the speaker to the listener. Our study provides reliable evidence that reference to the articulatory motor information of the speaker facilitates speech comprehension in a naturalistic setting.
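Between-brain coupling with linearly increasing response lags, as analyzed here, reduces to correlating a listener time course with progressively delayed copies of a speaker time course. A toy sketch on synthetic fMRI-rate signals follows; the TR and lag range are assumptions.

```python
import numpy as np

def lagged_correlation(listener, speaker, fs, max_lag_s=10.0):
    """Correlation between listener signal and lagged copies of the speaker signal."""
    lags = np.arange(0, int(max_lag_s * fs) + 1)
    r = [np.corrcoef(listener[L:], speaker[:len(speaker) - L])[0, 1] for L in lags]
    return lags / fs, np.array(r)

fs = 0.5                                   # 1 / TR, assuming TR = 2 s
rng = np.random.default_rng(10)
speaker_roi = rng.standard_normal(300)     # speaker seed-region time course
listener_roi = np.r_[np.zeros(3), speaker_roi[:-3]] + rng.standard_normal(300)
lag_s, r = lagged_correlation(listener_roi, speaker_roi, fs)
print("peak lag (s):", lag_s[np.argmax(r)])   # recovers the built-in 6-s lag
```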
Affiliation(s)
- Lanfang Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China; Department of Psychology, Sun Yat-sen University, Guangzhou 510006, People's Republic of China
- Yuxuan Zhang
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
| | - Qi Zhou
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
| | - Douglas D Garrett
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Max Planck Institute for Human Development, Lentzeallee 94, Berlin 14195, Germany
| | - Chunming Lu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
| | - Antao Chen
- Key Laboratory of Cognition and Personality (SWU), Ministry of Education & Department of Psychology, Southwest University, Chongqing 400715, People's Republic of China
| | - Jiang Qiu
- Key Laboratory of Cognition and Personality (SWU), Ministry of Education & Department of Psychology, Southwest University, Chongqing 400715, People's Republic of China
| | - Guosheng Ding
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
| |
Collapse
|
143
|
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. [PMID: 33744458 PMCID: PMC8204264 DOI: 10.1016/j.neuroimage.2021.117958] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Revised: 03/08/2021] [Accepted: 03/09/2021] [Indexed: 11/01/2022] Open
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking'), many studies consider the broadband speech envelope, which combines acoustic fluctuations across the entire spectral range. Using EEG recordings, we show that relying on this broadband envelope can provide a distorted picture of speech encoding. We systematically investigated the encoding of spectrally-limited speech-derived envelopes, presented via individual and multiple noise carriers, in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2-0.83 kHz) and high (2.66-8 kHz) frequency speech-derived envelopes. This effect was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low-frequency brain activity. As low- and high-frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands and are easily confounded when considering only the broadband speech envelope.
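The spectrally-limited envelopes studied here can be obtained by band-passing the audio and taking the Hilbert magnitude. A sketch with assumed sampling parameters; the band edges follow the ranges quoted in the abstract:

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope(audio, fs, f_lo, f_hi):
    """Envelope (Hilbert magnitude) of one spectral band of the audio."""
    sos = butter(4, [f_lo, f_hi], btype="band", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, audio)))

fs = 22050
audio = np.random.randn(fs * 5)                      # placeholder for a recording
low_env = band_envelope(audio, fs, 200.0, 830.0)     # 0.2-0.83 kHz band
high_env = band_envelope(audio, fs, 2660.0, 8000.0)  # 2.66-8 kHz band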
Collapse
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
| |
Collapse
|
144
|
Har-shai Yahav P, Zion Golumbic E. Linguistic processing of task-irrelevant speech at a cocktail party. eLife 2021; 10:e65096. [PMID: 33942722 PMCID: PMC8163500 DOI: 10.7554/elife.65096] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2020] [Accepted: 04/26/2021] [Indexed: 01/05/2023] Open
Abstract
Paying attention to one speaker in a noisy place can be extremely difficult, because to-be-attended and task-irrelevant speech compete for processing resources. We tested whether this competition is restricted to acoustic-phonetic interference or whether it extends to competition for linguistic processing as well. Neural activity was recorded using magnetoencephalography as human participants were instructed to attend to natural speech presented to one ear while task-irrelevant stimuli were presented to the other. The task-irrelevant stimuli consisted either of random sequences of syllables or of syllables structured to form coherent sentences, using hierarchical frequency-tagging. We find that the phrasal structure of structured task-irrelevant stimuli was represented in the neural response in left inferior frontal and posterior parietal regions, indicating that selective attention does not fully eliminate linguistic processing of task-irrelevant speech. Additionally, neural tracking of to-be-attended speech in left inferior frontal regions was enhanced when competing with structured task-irrelevant stimuli, suggesting inherent competition between the two streams for linguistic processing.
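In hierarchical frequency-tagging designs, responses to syllables, phrases, and sentences appear as peaks at their presentation rates in the spectrum of the neural response. A minimal sketch of a tag-frequency SNR readout; the rates in the comments are illustrative, not the study's exact design:

import numpy as np

def tag_snr(resp, fs, f_tag, n_neighbors=4):
    """Amplitude at a tagging frequency divided by the mean amplitude
    of neighboring frequency bins (a simple spectral-peak SNR)."""
    amp = np.abs(np.fft.rfft(resp)) / len(resp)
    freqs = np.fft.rfftfreq(len(resp), 1 / fs)
    k = int(np.argmin(np.abs(freqs - f_tag)))
    neighbors = np.r_[amp[k - n_neighbors:k], amp[k + 1:k + 1 + n_neighbors]]
    return amp[k] / neighbors.mean()

# With syllables presented at, e.g., 4 Hz, phrasal and sentential structure
# would tag 2 Hz and 1 Hz; tag_snr(resp, fs, 2.0) > 1 would indicate a
# phrasal response, including to a task-irrelevant stream.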
Collapse
Affiliation(s)
- Paz Har-shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
| | - Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
| |
Collapse
|
145
|
Belo J, Clerc M, Schön D. EEG-Based Auditory Attention Detection and Its Possible Future Applications for Passive BCI. FRONTIERS IN COMPUTER SCIENCE 2021. [DOI: 10.3389/fcomp.2021.661178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The ability to discriminate and attend to one specific sound source in a complex auditory environment is a fundamental skill for efficient communication. Indeed, it allows us to follow a family conversation or chat with a friend in a bar. This ability is challenged in hearing-impaired individuals, and more precisely in those with a cochlear implant (CI). Due to the limited spectral resolution of the implant, auditory perception remains quite poor in a noisy environment or in the presence of simultaneous auditory sources. Recent methodological advances now make it possible to detect, on the basis of neural signals, which auditory stream within a set of multiple concurrent streams an individual is attending to. This approach, called EEG-based auditory attention detection (AAD), builds on fundamental research findings demonstrating that, in a multi-speaker scenario, cortical tracking of the envelope of the attended speech is enhanced compared with that of the unattended speech. Following these findings, other studies showed that it is possible to use EEG/MEG (electroencephalography/magnetoencephalography) to explore auditory attention during speech listening in a cocktail-party-like scenario. Overall, these findings make it possible to conceive next-generation hearing aids combining customary technology and AAD. Importantly, AAD also has great potential in the context of passive BCI, in educational contexts, and in interactive music performances. In this mini review, we first present the different approaches to AAD and the main limitations of the overall concept. We then discuss its potential applications in the domain of non-clinical passive BCI.
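The stimulus-reconstruction approach that AAD builds on can be sketched in a few lines: train a regularized backward model mapping time-lagged EEG to the attended envelope, then attribute attention to whichever candidate envelope correlates best with the reconstruction. Lag count and regularization strength below are illustrative placeholders:

import numpy as np

def lag_matrix(eeg, n_lags):
    """Time-lagged design matrix from EEG of shape (n_samples, n_channels)."""
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * c:(lag + 1) * c] = eeg[:n - lag]
    return X

def train_decoder(eeg, attended_env, n_lags=32, lam=1e3):
    """Backward model via ridge regression (regularized least squares)."""
    X = lag_matrix(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                           X.T @ attended_env)

def decode_attention(eeg, env_a, env_b, w, n_lags=32):
    """Attribute attention to the stream whose envelope best matches
    the envelope reconstructed from the EEG."""
    rec = lag_matrix(eeg, n_lags) @ w
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return "A" if r_a > r_b else "B"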
Collapse
|
146
|
Vandecappelle S, Deckers L, Das N, Ansari AH, Bertrand A, Francart T. EEG-based detection of the locus of auditory attention with convolutional neural networks. eLife 2021; 10:e56481. [PMID: 33929315 PMCID: PMC8143791 DOI: 10.7554/elife.56481] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Accepted: 04/28/2021] [Indexed: 01/16/2023] Open
Abstract
In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
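For orientation, a toy PyTorch model of the kind described: a small convolutional network mapping a short multichannel EEG window to left/right logits. Layer sizes and the window length are illustrative assumptions, not the published architecture:

import torch
import torch.nn as nn

class LocusCNN(nn.Module):
    """Minimal CNN for left/right attention decoding (sketch)."""
    def __init__(self, n_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 8, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(16),
            nn.Flatten(),
            nn.Linear(8 * 16, 2),    # logits: left vs. right
        )

    def forward(self, x):            # x: (batch, n_channels, n_samples)
        return self.net(x)

model = LocusCNN()
window = torch.randn(1, 64, 128)     # one 1-s window at 128 Hz (random demo data)
locus = model(window).argmax(dim=1)  # 0 = left, 1 = right (untrained demo)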
Collapse
Affiliation(s)
- Servaas Vandecappelle
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Lucas Deckers
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Neetha Das
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Amir Hossein Ansari
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Alexander Bertrand
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
| | - Tom Francart
- Department of Neurosciences, Experimental Oto-rhino-laryngology, Leuven, Belgium
| |
Collapse
|
148
|
Xu C, Zou J, He F, Wen X, Li J, Gao J, Ding N, Luo B. Neural Tracking of Sound Rhythms Correlates With Diagnosis, Severity, and Prognosis of Disorders of Consciousness. Front Neurosci 2021; 15:646543. [PMID: 33994924 PMCID: PMC8113690 DOI: 10.3389/fnins.2021.646543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2020] [Accepted: 03/19/2021] [Indexed: 12/03/2022] Open
Abstract
Effective diagnosis and prognosis of patients with disorders of consciousness (DOC) provide a basis for family counseling, decision-making, and the design of rehabilitation programs. However, effective and objective bedside evaluation remains a challenging problem. In this study, we explored electroencephalography (EEG) responses tracking sound rhythms as potential neural markers for DOC evaluation. We analyzed the responses to natural speech and to tones modulated at 2 and 41 Hz. At the population level, patients with positive outcomes (DOC-P) showed higher cortical synchronization to tones modulated at 41 Hz compared with patients with negative outcomes (DOC-N). At the individual level, phase coherence to tones modulated at 41 Hz was significantly correlated with Coma Recovery Scale-Revised (CRS-R) and Glasgow Outcome Scale-Extended (GOS-E) scores. Furthermore, SVM classifiers, trained using phase coherence in higher frequency bands or a combination of the low-frequency auditory steady-state response (aSSR) and speech-tracking responses, performed very well in the diagnosis and prognosis of DOC. These findings show that the EEG response to auditory rhythms is a potential tool for assessing the diagnosis, severity, and prognosis of DOC.
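The phase-coherence marker can be computed as inter-trial phase coherence at the modulation frequency (41 Hz here); a sketch with hypothetical inputs, where per-patient coherence could then be correlated with clinical scores:

import numpy as np
from scipy.stats import spearmanr

def phase_coherence(trials, fs, freq):
    """Inter-trial phase coherence at one frequency; trials has shape
    (n_trials, n_samples)."""
    freqs = np.fft.rfftfreq(trials.shape[1], 1 / fs)
    k = int(np.argmin(np.abs(freqs - freq)))
    phases = np.angle(np.fft.rfft(trials, axis=1)[:, k])
    return np.abs(np.mean(np.exp(1j * phases)))

# Hypothetical usage: patient_trials is a list of per-patient trial arrays
# and crs_r the matching CRS-R scores (both placeholders).
# coh = [phase_coherence(t, fs=256, freq=41.0) for t in patient_trials]
# rho, p = spearmanr(coh, crs_r)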
Collapse
Affiliation(s)
- Chuan Xu
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China; Research Center for Advanced Artificial Intelligence Theory, Zhejiang Lab, Hangzhou, China
| | - Fangping He
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Xinrui Wen
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou, China
| | - Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou, China
| | - Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China; Research Center for Advanced Artificial Intelligence Theory, Zhejiang Lab, Hangzhou, China
| | - Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| |
Collapse
|
149
|
Geirnaert S, Francart T, Bertrand A. Fast EEG-Based Decoding Of The Directional Focus Of Auditory Attention Using Common Spatial Patterns. IEEE Trans Biomed Eng 2021; 68:1557-1568. [PMID: 33095706 DOI: 10.1109/tbme.2020.3033446] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Noise reduction algorithms in current hearing devices lack information about the sound source a user attends to when multiple sources are present. To resolve this issue, they can be complemented with auditory attention decoding (AAD) algorithms, which decode the attention using electroencephalography (EEG) sensors. State-of-the-art AAD algorithms employ a stimulus reconstruction approach, in which the envelope of the attended source is reconstructed from the EEG and correlated with the envelopes of the individual sources. This approach, however, performs poorly on short signal segments, while longer segments yield impractically long detection delays when the user switches attention. METHODS We propose decoding the directional focus of attention using filterbank common spatial pattern filters (FB-CSP) as an alternative AAD paradigm, which does not require access to the clean source envelopes. RESULTS The proposed FB-CSP approach outperforms both the stimulus reconstruction approach on short signal segments and a convolutional neural network approach on the same task. We achieve a high accuracy (80% for [Formula: see text] windows and 70% for quasi-instantaneous decisions), which is sufficient to reach minimal expected switch durations below [Formula: see text]. We also demonstrate that the decoder can adapt to unlabeled data from an unseen subject and works with only a subset of EEG channels located around the ear to emulate a wearable EEG setup. CONCLUSION The proposed FB-CSP method provides fast and accurate decoding of the directional focus of auditory attention. SIGNIFICANCE The high accuracy on very short data segments is a major step forward towards practical neuro-steered hearing devices.
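The CSP building block behind FB-CSP reduces to a generalized eigendecomposition of the two class covariance matrices; a sketch of a single filterbank band (a classifier on the log-variance features, e.g. LDA, would complete the pipeline):

import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_left, trials_right, n_pairs=3):
    """CSP filters from two classes of (n_channels, n_samples) trials."""
    def mean_cov(trials):
        covs = [t @ t.T / t.shape[1] for t in trials]
        c = np.mean(covs, axis=0)
        return c + 1e-6 * np.eye(c.shape[0])        # diagonal loading
    c1, c2 = mean_cov(trials_left), mean_cov(trials_right)
    vals, vecs = eigh(c1, c1 + c2)                  # generalized eigenproblem
    order = np.argsort(vals)
    idx = np.r_[order[:n_pairs], order[-n_pairs:]]  # both eigenvalue extremes
    return vecs[:, idx].T                           # (2*n_pairs, n_channels)

def csp_features(trial, filters):
    """Normalized log-variance features of a spatially filtered trial."""
    var = (filters @ trial).var(axis=1)
    return np.log(var / var.sum())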
Collapse
|
150
|
Mesik J, Ray L, Wojtczak M. Effects of Age on Cortical Tracking of Word-Level Features of Continuous Competing Speech. Front Neurosci 2021; 15:635126. [PMID: 33867920 PMCID: PMC8047075 DOI: 10.3389/fnins.2021.635126] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2020] [Accepted: 03/12/2021] [Indexed: 01/17/2023] Open
Abstract
Speech-in-noise comprehension difficulties are common among the elderly population, yet traditional objective measures of speech perception are largely insensitive to this deficit, particularly in the absence of clinical hearing loss. In recent years, a growing body of research in young normal-hearing adults has demonstrated that high-level features related to speech semantics and lexical predictability elicit strong centro-parietal negativity in the EEG signal around 400 ms following the word onset. Here we investigate effects of age on cortical tracking of these word-level features within a two-talker speech mixture, and their relationship with self-reported difficulties with speech-in-noise understanding. While undergoing EEG recordings, younger and older adult participants listened to a continuous narrative story in the presence of a distractor story. We then utilized forward encoding models to estimate cortical tracking of four speech features: (1) word onsets, (2) "semantic" dissimilarity of each word relative to the preceding context, (3) lexical surprisal for each word, and (4) overall word audibility. Our results revealed robust tracking of all features for attended speech, with surprisal and word audibility showing significantly stronger contributions to neural activity than dissimilarity. Additionally, older adults exhibited significantly stronger tracking of word-level features than younger adults, especially over frontal electrode sites, potentially reflecting increased listening effort. Finally, neuro-behavioral analyses revealed trends of a negative relationship between subjective speech-in-noise perception difficulties and the model goodness-of-fit for attended speech, as well as a positive relationship between task performance and the goodness-of-fit, indicating behavioral relevance of these measures. Together, our results demonstrate the utility of modeling cortical responses to multi-talker speech using complex, word-level features and the potential for their use to study changes in speech processing due to aging and hearing loss.
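The forward encoding models referred to here regress time-lagged word-level feature channels onto the EEG; a minimal single-channel ridge sketch, with lag range and regularization strength as illustrative choices:

import numpy as np

def word_feature_trf(features, eeg_channel, fs, tmax=0.8, lam=1e2):
    """Forward (encoding) model: ridge regression from lagged features
    to one EEG channel. `features` is (n_samples, n_features), e.g.
    impulses at word onsets scaled by surprisal or dissimilarity."""
    n, f = features.shape
    lags = np.arange(int(tmax * fs))               # lags 0 .. tmax seconds
    X = np.zeros((n, f * len(lags)))
    for j, lag in enumerate(lags):
        X[lag:, j * f:(j + 1) * f] = features[:n - lag]
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                        X.T @ eeg_channel)
    r = np.corrcoef(X @ w, eeg_channel)[0, 1]      # model goodness-of-fit
    return w.reshape(len(lags), f), r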
Collapse
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
| | | | | |
Collapse
|