1
Luo C, Ding N. Cortical encoding of hierarchical linguistic information when syllabic rhythms are obscured by echoes. Neuroimage 2024; 300:120875. [PMID: 39341475 DOI: 10.1016/j.neuroimage.2024.120875]
Abstract
In speech perception, low-frequency cortical activity tracks hierarchical linguistic units (e.g., syllables, phrases, and sentences) on top of acoustic features (e.g., the speech envelope). Since fluctuations of the speech envelope typically correspond to syllable boundaries, one common interpretation is that the acoustic envelope underlies the extraction of discrete syllables from continuous speech for subsequent linguistic processing. However, it remains unclear whether and how cortical activity encodes linguistic information when the speech envelope does not provide acoustic correlates of syllables. To address this issue, we introduced a frequency-tagging speech stream in which the syllabic rhythm was obscured by echoic envelopes and investigated neural encoding of hierarchical linguistic information using electroencephalography (EEG). When listeners attended to the echoic speech, cortical activity reliably tracked the syllable, phrase, and sentence levels, with higher-level linguistic units eliciting more robust neural responses. When attention was diverted from the echoic speech, reliable neural tracking of the syllable level was still observed, in contrast to deteriorated neural tracking of the phrase and sentence levels. Further analyses revealed that the envelope aligned with the syllabic rhythm could be recovered from the echoic speech through a neural adaptation model, and the reconstructed envelope yielded higher predictive power for the neural tracking responses than either the original echoic envelope or the anechoic envelope. Taken together, these results suggest that neural adaptation and attentional modulation jointly contribute to neural encoding of linguistic information in distorted speech where the syllabic rhythm is obscured by echoes.
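The envelope-recovery step can be illustrated with a minimal sketch: apply a first-order neural adaptation stage to the Hilbert envelope of the echoic audio. This is only a plausible stand-in for the paper's adaptation model, and the `tau` and `gain` values are assumptions, not published parameters.

```python
import numpy as np
from scipy.signal import hilbert

def adapted_envelope(audio, fs, tau=0.1, gain=5.0):
    """Sharpen the envelope of echoic speech via simple first-order
    adaptation (illustrative; tau and gain are hypothetical values)."""
    env = np.abs(hilbert(audio))            # broadband Hilbert envelope
    out = np.zeros_like(env)
    a, dt = 0.0, 1.0 / fs                   # adaptation state and time step
    for t, x in enumerate(env):
        out[t] = x / (1.0 + a)              # divisive adaptation
        a += dt * (-a / tau + gain * x)     # state decays, driven by input
    return out

fs = 16000
echoic = np.random.default_rng(0).normal(size=fs)   # stand-in for echoic speech
print(adapted_envelope(echoic, fs).shape)
```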
Affiliation(s)
- Cheng Luo
- Zhejiang Lab, Hangzhou 311121, China.
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China; The State Key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou 310027, China
2
Kasten FH, Busson Q, Zoefel B. Opposing neural processing modes alternate rhythmically during sustained auditory attention. Commun Biol 2024; 7:1125. [PMID: 39266696 PMCID: PMC11393317 DOI: 10.1038/s42003-024-06834-x]
Abstract
During continuous tasks, humans show spontaneous fluctuations in performance, putatively caused by varying attentional resources allocated to processing external information. If neural resources are instead used to process other, presumably "internal" information, sensory input can be missed, which would explain an apparent dichotomy of "internal" versus "external" attention. In the current study, we extracted presumed neural signatures of these attentional modes in human electroencephalography (EEG): neural entrainment and α-oscillations (~10 Hz), linked to the processing and suppression of sensory information, respectively. We tested whether they exhibit structured fluctuations over time while listeners attend to an ecologically relevant stimulus, such as speech, and complete a task that requires full and continuous attention. Results show an antagonistic relation between neural entrainment to speech and spontaneous α-oscillations in two distinct brain networks - one specialized in the processing of external information, the other reminiscent of the dorsal attention network. These opposing neural modes undergo slow, periodic fluctuations around ~0.07 Hz and are related to the detection of auditory targets. Our study may have tapped into a general attentional mechanism that is conserved across species and has important implications for situations in which sustained attention to sensory information is critical.
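The ~0.07 Hz periodicity claim suggests a simple analysis sketch: take a time-resolved attention metric (here a toy array standing in for sliding-window entrainment estimates) and inspect its low-frequency spectrum. The one-estimate-per-second sampling rate and the synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)
fs_metric = 1.0                  # one entrainment estimate per second (assumed)
t = np.arange(600)               # 10 minutes of sliding-window estimates
metric = np.sin(2 * np.pi * 0.07 * t) + rng.normal(0, 1, t.size)  # toy series

freqs, power = periodogram(metric - metric.mean(), fs=fs_metric)
band = (freqs > 0.03) & (freqs < 0.15)
print("peak in 0.03-0.15 Hz band:", freqs[band][np.argmax(power[band])], "Hz")
```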
Affiliation(s)
- Florian H Kasten
- Department for Cognitive, Affective, Behavioral Neuroscience with Focus Neurostimulation, Institute of Psychology, University of Trier, Trier, Germany.
- Centre de Recherche Cerveau & Cognition, CNRS, Toulouse, France.
- Université Toulouse III Paul Sabatier, Toulouse, France.
- Benedikt Zoefel
- Centre de Recherche Cerveau & Cognition, CNRS, Toulouse, France.
- Université Toulouse III Paul Sabatier, Toulouse, France.
3
Burgoyne AP, Seeburger DT, Engle RW. Modality matters: Three auditory conflict tasks to measure individual differences in attention control. Behav Res Methods 2024; 56:5959-5985. [PMID: 38366119 DOI: 10.3758/s13428-023-02328-6]
Abstract
Early work on selective attention used auditory-based tasks, such as dichotic listening, to shed light on capacity limitations and individual differences in these limitations. Today, there is great interest in individual differences in attentional abilities, but the field has shifted towards visual-modality tasks. Furthermore, most conflict-based tests of attention control lack reliability due to low signal-to-noise ratios and the use of difference scores. Critically, it is unclear to what extent attention control generalizes across sensory modalities, and without reliable auditory-based tests, an answer to this question will remain elusive. To this end, we developed three auditory-based tests of attention control that use an adaptive response deadline (DL) to account for speed-accuracy trade-offs: Auditory Simon DL, Auditory Flanker DL, and Auditory Stroop DL. In a large sample (N = 316), we investigated the psychometric properties of the three auditory conflict tasks, tested whether attention control is better modeled as a unitary factor or modality-specific factors, and estimated the extent to which unique variance in modality-specific factors contributed incrementally to the prediction of dichotic listening and multitasking performance. Our analyses indicated that the auditory conflict tasks have strong psychometric properties and demonstrate convergent validity with visual tests of attention control. Auditory and visual attention control factors were highly correlated (r = .81), even after controlling for perceptual processing speed (r = .75). Modality-specific attention control factors accounted for unique variance in modality-matched criterion measures, but the majority of the explained variance was modality-general. The results suggest an interplay between modality-general attention control and modality-specific processing.
4
Xiao X, Ding J, Yu M, Dong Z, Cruz S, Ding N, Aubinet C, Laureys S, Di H, Chen Y. Exploring the clinical diagnostic value of linguistic learning ability in patients with disorders of consciousness using electrooculography. Neuroimage 2024; 297:120753. [PMID: 39053636 DOI: 10.1016/j.neuroimage.2024.120753]
Abstract
For patients with disorders of consciousness (DoC), accurate assessment of residual consciousness levels and cognitive abilities is critical for developing appropriate rehabilitation interventions. In this study, we investigated the potential of electrooculography (EOG) for assessing language processing abilities and consciousness levels. Patients' EOG data and related electrophysiological data were analysed before and after explicit language learning. The results showed distinct differences in vocabulary learning patterns among patients with varying levels of consciousness. Minimally conscious patients showed significant neural tracking of artificial words and notable learning effects similar to those observed in healthy controls, whereas patients with unresponsive wakefulness syndrome did not show such effects. Correlation analysis further indicated that EOG detected vocabulary learning effects with comparable validity to electroencephalography, reinforcing the credibility of the EOG indicator as a diagnostic tool. Critically, EOG also revealed significant correlations between individual patients' linguistic learning performance and their Oromotor/verbal function as assessed through behavioural scales. In conclusion, this study explored the differences in language processing abilities among patients with varying consciousness levels. By demonstrating the utility of EOG in evaluating consciousness and detecting vocabulary learning effects, as well as its potential to guide personalised rehabilitation, our findings indicate that EOG indicators show promise as a rapid, accurate and effective additional tool for diagnosing and managing patients with DoC.
Affiliation(s)
- Xiangyue Xiao
- International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University, Hangzhou 311121, China; Key Laboratory of Ageing and Cancer Biology of Zhejiang Province, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Junhua Ding
- Department of Psychology, University of Edinburgh, Edinburgh EH8 9YL, UK
- Mingyan Yu
- International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University, Hangzhou 311121, China; Key Laboratory of Ageing and Cancer Biology of Zhejiang Province, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Zhicai Dong
- International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University, Hangzhou 311121, China; Key Laboratory of Ageing and Cancer Biology of Zhejiang Province, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou 311121, China
- Sara Cruz
- The Psychology for Development Research Centre, Lusiada University Porto, Porto 4100-348, Portugal
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Charlène Aubinet
- Coma Science Group, GIGA Consciousness & Centre du Cerveau, University and University Hospital of Liège, Liège 4000, Belgium; Psychology & Neuroscience of Cognition Research Unit, University of Liège, Liège 4000, Belgium
- Steven Laureys
- Coma Science Group, GIGA Consciousness & Centre du Cerveau, University and University Hospital of Liège, Liège 4000, Belgium
- Haibo Di
- International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University, Hangzhou 311121, China; Key Laboratory of Ageing and Cancer Biology of Zhejiang Province, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou 311121, China.
- Yan Chen
- International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University, Hangzhou 311121, China; Key Laboratory of Ageing and Cancer Biology of Zhejiang Province, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou 311121, China.
5
Iverson P, Song J. Neural Tracking of Speech Acoustics in Noise Is Coupled with Lexical Predictability as Estimated by Large Language Models. eNeuro 2024; 11:ENEURO.0507-23.2024. [PMID: 39095091 PMCID: PMC11335968 DOI: 10.1523/eneuro.0507-23.2024]
Abstract
Adults heard recordings of two spatially separated speakers reading newspaper and magazine articles. They were asked to listen to one speaker and ignore the other, and EEG was recorded to assess their neural processing. Machine learning extracted neural sources that tracked the target and distractor speakers at three levels: the acoustic envelope of speech (delta- and theta-band modulations), lexical frequency for individual words, and the contextual predictability of individual words estimated by GPT-4 and earlier lexical models. To provide a broader view of speech perception, half of the subjects completed a simultaneous visual task, and the listeners included both native and non-native English speakers. Distinct neural components were extracted for these levels of auditory and lexical processing; native English speakers had greater target-distractor separation than non-native English speakers on most measures, and lexical processing was reduced by the visual task. Moreover, there was a novel interaction of lexical predictability and frequency with auditory processing: acoustic tracking was stronger for lexically harder words, suggesting that people listened harder to the acoustics when this was needed for lexical selection. This demonstrates that speech perception is not simply a feedforward process from acoustic processing to the lexicon. Rather, the adaptable, context-sensitive processing long known to occur at the lexical level has broader consequences for perception, coupling with the acoustic tracking of individual speakers in noise.
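Contextual predictability of this kind is commonly quantified as surprisal, -log2 P(word | context). A minimal sketch follows, where `lm_prob` is a hypothetical placeholder for whatever language model supplies the conditional probabilities (the study used GPT-4 and earlier lexical models):

```python
import math

def lm_prob(word, context):
    """Hypothetical stand-in for a language model's P(word | context);
    replace with a real LM call."""
    return 0.05

def word_surprisals(words):
    """Per-word surprisal in bits: higher = lexically harder."""
    return [-math.log2(lm_prob(w, words[:i])) for i, w in enumerate(words)]

print(word_surprisals("the cat sat on the mat".split()))
```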
Affiliation(s)
- Paul Iverson
- Department of Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, United Kingdom
- Jieun Song
- School of Digital Humanities and Computational Social Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
6
Kapatsinski V, Bramlett AA, Idemaru K. What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue. Cognition 2024; 249:105818. [PMID: 38772253 DOI: 10.1016/j.cognition.2024.105818]
Abstract
In language comprehension, we use perceptual cues to infer meanings. Some of these cues reside on perceptual dimensions. For example, the difference between bear and pear is cued by a difference in voice onset time (VOT), which is a continuous perceptual dimension. The present paper asks whether, and when, experience with a single value on a dimension behaving unexpectedly is used by the learner to reweight the whole dimension. We show that learners reweight the whole VOT dimension when exposed to a single VOT value (e.g., 45 ms) and provided with feedback indicating that the speaker intended to produce a /b/ 50% of the time and a /p/ the other 50% of the time. Importantly, dimensional reweighting occurs only if 1) the 50/50 feedback is unexpected for the VOT value, and 2) there is another dimension that is predictive of feedback. When no predictive dimension is available, listeners reassociate the experienced VOT value with the more surprising outcome but do not downweight the entire VOT dimension. These results provide support for perceptual representations of speech sounds that combine cues and dimensions, for viewing perceptual learning in speech as a combination of error-driven cue reassociation and dimensional reweighting, and for considering dimensional reweighting to be reallocation of attention that occurs only when there is evidence that reallocating attention would improve prediction accuracy (Harmon, Z., Idemaru, K., & Kapatsinski, V. 2019. Learning mechanisms in cue reweighting. Cognition, 189, 76-88.).
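The error-driven cue reassociation component is in the spirit of a Rescorla-Wagner delta rule, sketched below; dimensional reweighting would additionally scale the weight given to the whole VOT dimension. Learning rate and starting weights are made-up illustration values, not fitted parameters.

```python
import numpy as np

def delta_rule(w, cues, outcome, lr=0.1):
    """One error-driven update: shift cue-outcome associations in
    proportion to prediction error (illustrative parameters)."""
    error = outcome - np.dot(w, cues)     # outcome: 1 = /p/, 0 = /b/
    return w + lr * error * cues

w = np.array([0.8])                       # initial association: this VOT -> /p/
for outcome in [1, 0] * 20:               # 50/50 feedback on the same VOT value
    w = delta_rule(w, np.array([1.0]), outcome)
print(w)                                  # drifts toward ~0.5: cue reassociation
```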
Affiliation(s)
- Vsevolod Kapatsinski
- University of Oregon, Department of Linguistics, 161 Straub Hall, University of Oregon, Eugene, OR 97403-1290, United States of America.
- Adam A Bramlett
- Carnegie-Mellon University, Department of Modern Languages, 341 Posner Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States of America.
- Kaori Idemaru
- University of Oregon, Department of East Asian Languages and Literatures, 114 Friendly Hall University of Oregon, Eugene, OR 97403-1248, United States of America.
7
Xu S, Zhang H, Fan J, Jiang X, Zhang M, Guan J, Ding H, Zhang Y. Auditory Challenges and Listening Effort in School-Age Children With Autism: Insights From Pupillary Dynamics During Speech-in-Noise Perception. J Speech Lang Hear Res 2024; 67:2410-2453. [PMID: 38861391 DOI: 10.1044/2024_jslhr-23-00553]
Abstract
PURPOSE This study aimed to investigate challenges in speech-in-noise (SiN) processing faced by school-age children with autism spectrum conditions (ASCs) and their impact on listening effort. METHOD Participants, including 23 Mandarin-speaking children with ASCs and 19 age-matched neurotypical (NT) peers, underwent sentence recognition tests in both quiet and noisy conditions, with a speech-shaped steady-state noise masker presented at 0-dB signal-to-noise ratio in the noisy condition. Recognition accuracy rates and task-evoked pupil responses were compared to assess behavioral performance and listening effort during auditory tasks. RESULTS No main effect of group was found on accuracy rates. Instead, significant effects emerged for autistic trait scores, listening conditions, and their interaction, indicating that higher trait scores were associated with poorer performance in noise. Pupillometric data revealed significantly larger and earlier peak dilations, along with more varied pupillary dynamics, in the ASC group relative to the NT group, especially under noisy conditions. Importantly, the ASC group's peak dilation in quiet mirrored that of the NT group in noise. However, the ASC group consistently exhibited smaller mean dilations than the NT group. CONCLUSIONS Pupillary responses suggest a different resource allocation pattern in ASCs: an initial sharper and larger dilation may signal an intense, narrowed resource allocation, likely linked to heightened arousal, engagement, and cognitive load, whereas a subsequent faster tail-off may indicate a greater decrease in resource availability and engagement, or a quicker release of arousal and cognitive load. The presence of noise further accentuates this pattern. This highlights the unique SiN processing challenges children with ASCs may face, underscoring the importance of a nuanced, individual-centric approach to interventions and support.
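The pupillary measures compared here (task-evoked peak dilation, its latency, and mean dilation) are typically computed from baseline-corrected traces; a minimal sketch, where the sampling rate, baseline window, and toy trace are assumptions rather than the study's exact parameters:

```python
import numpy as np

def pupil_metrics(trace, fs, baseline_s=1.0):
    """Baseline-corrected task-evoked pupil metrics for one trial."""
    n_base = int(baseline_s * fs)
    corrected = trace - trace[:n_base].mean()   # subtract pre-stimulus baseline
    evoked = corrected[n_base:]
    peak = int(np.argmax(evoked))
    return {"peak_dilation": float(evoked[peak]),
            "peak_latency_s": peak / fs,
            "mean_dilation": float(evoked.mean())}

fs = 60                                          # 60 Hz eye tracker (assumed)
trial = np.concatenate([np.zeros(fs), np.hanning(4 * fs)])  # toy pupil trace
print(pupil_metrics(trial, fs))
```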
Affiliation(s)
- Suyun Xu
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, China
- National Research Centre for Language and Well-Being, Shanghai, China
- Hua Zhang
- Department of Child and Adolescent Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, China
- Juan Fan
- Department of Child and Adolescent Psychiatry, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, China
- Xiaoming Jiang
- Institute of Linguistics, Shanghai International Studies University, China
- Minyue Zhang
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, China
- National Research Centre for Language and Well-Being, Shanghai, China
- Hongwei Ding
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, China
- National Research Centre for Language and Well-Being, Shanghai, China
- Yang Zhang
- Department of Speech-Language-Hearing Sciences and Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis
8
Mizokuchi K, Tanaka T, Sato TG, Shiraki Y. Alpha band modulation caused by selective attention to music enables EEG classification. Cogn Neurodyn 2024; 18:1005-1020. [PMID: 38826648 PMCID: PMC11143110 DOI: 10.1007/s11571-023-09955-x]
Abstract
Humans are able to pay selective attention to music or speech in the presence of multiple sounds. It has been reported that in the speech domain, selective attention enhances the cross-correlation between the envelope of speech and the electroencephalogram (EEG) while also affecting the spatial modulation of the alpha band. However, when multiple music pieces are performed at the same time, it is unclear how selective attention affects neural entrainment and spatial modulation. In this paper, we hypothesized that entrainment to attended music differs from that to unattended music and that spatial modulation in the alpha band occurs in conjunction with attention. We conducted experiments in which we presented musical excerpts to 15 participants, each listening to two excerpts simultaneously but paying attention to one of the two. The results showed that the cross-correlation function between the EEG signal and the envelope of the unattended melody had a more prominent peak than that of the attended melody, contrary to the findings for speech. In addition, spatial modulation in the alpha band was found with a data-driven approach called the common spatial pattern method. Classification of the EEG signal with a support vector machine identified attended melodies and achieved an accuracy of 100% for 11 of the 15 participants. These results suggest that selective attention to music suppresses entrainment to the melody and that spatial modulation of the alpha band occurs in conjunction with attention. To the best of our knowledge, this is the first report of detecting attended music consisting of several types of musical notes using EEG alone.
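Two ingredients of this pipeline are easy to sketch: the cross-correlation between an EEG channel and a melody envelope, and an SVM classifying the attended melody from spatial features. The features below are random stand-ins (the study derived them with the common spatial pattern method), and all sizes are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def xcorr(eeg, env, max_lag):
    """Normalized cross-correlation at lags 0..max_lag-1 samples."""
    eeg = (eeg - eeg.mean()) / eeg.std()
    env = (env - env.mean()) / env.std()
    return np.array([np.mean(eeg[k:] * env[:len(env) - k])
                     for k in range(max_lag)])

rng = np.random.default_rng(1)
print(xcorr(rng.normal(size=1000), rng.normal(size=1000), 50).shape)

X = rng.normal(size=(80, 6))     # stand-in for CSP-filtered alpha-band features
y = rng.integers(0, 2, 80)       # attended-melody label per trial
clf = SVC(kernel="linear").fit(X[:60], y[:60])
print("toy accuracy:", clf.score(X[60:], y[60:]))
```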
Affiliation(s)
- Kana Mizokuchi
- Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Toshihisa Tanaka
- Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Takashi G. Sato
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
- Yoshifumi Shiraki
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
9
Jensen O. Distractor inhibition by alpha oscillations is controlled by an indirect mechanism governed by goal-relevant information. Commun Psychol 2024; 2:36. [PMID: 38665356 PMCID: PMC11041682 DOI: 10.1038/s44271-024-00081-w]
Abstract
The role of alpha oscillations (8-13 Hz) in cognition is intensively investigated. While intracranial animal recordings demonstrate that alpha oscillations are associated with decreased neuronal excitability, it has been questioned whether alpha oscillations are under direct control from frontoparietal areas to suppress visual distractors. We here point to a revised mechanism in which alpha oscillations are controlled by an indirect mechanism governed by the load of goal-relevant information - a view compatible with perceptual load theory. We outline how this framework can be further tested and discuss the consequences for network dynamics and resource allocation in the working brain.
Affiliation(s)
- Ole Jensen
- Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, B15 2TT, UK
10
Brilliant, Yaar-Soffer Y, Herrmann CS, Henkin Y, Kral A. Theta and alpha oscillatory signatures of auditory sensory and cognitive loads during complex listening. Neuroimage 2024; 289:120546. [PMID: 38387743 DOI: 10.1016/j.neuroimage.2024.120546]
Abstract
The neuronal signatures of sensory and cognitive load provide access to brain activities related to complex listening situations. Sensory and cognitive loads are typically reflected in measures like response time (RT) and event-related potential (ERP) components. It is, however, difficult to distinguish the underlying brain processes solely from these measures. In this study, along with RT and ERP analysis, we performed time-frequency analysis and source localization of oscillatory activity in participants performing two different auditory tasks with varying degrees of complexity, and related the results to sensory and cognitive load. We studied neuronal oscillatory activity in both the period before the behavioral response (pre-response) and the period after it (post-response). Robust oscillatory activities were found in both periods and were differentially affected by sensory and cognitive load. Oscillatory activity under sensory load was characterized by a decrease in pre-response (early) theta activity and increased alpha activity. Oscillatory activity under cognitive load was characterized by increased theta activity, mainly in the post-response (late) period. Furthermore, source localization revealed specific brain regions responsible for processing these loads, such as the temporal and frontal lobes, cingulate cortex, and precuneus. The results provide evidence that in complex listening situations the brain processes sensory and cognitive loads differently. These neural processes have specific oscillatory signatures and are long lasting, extending beyond the behavioral response.
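As a minimal illustration of the band-limited measures involved, theta (4-8 Hz) and alpha (8-13 Hz) power can be estimated from a single channel with Welch's method; this is a coarse stand-in for the study's full time-frequency and source analyses, and the sampling rate and toy signal are assumptions.

```python
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    freqs, psd = welch(x, fs=fs, nperseg=2 * fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])   # integrate PSD over the band

fs = 250                                      # assumed EEG sampling rate
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)  # theta + alpha

print("theta (4-8 Hz): ", band_power(x, fs, 4, 8))
print("alpha (8-13 Hz):", band_power(x, fs, 8, 13))
```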
Affiliation(s)
- Brilliant
- Department of Experimental Otology, Hannover Medical School, 30625 Hannover, Germany.
- Y Yaar-Soffer
- Department of Communication Disorder, Tel Aviv University, 5262657 Tel Aviv, Israel; Hearing, Speech and Language Center, Sheba Medical Center, 5265601 Tel Hashomer, Israel
- C S Herrmann
- Experimental Psychology Division, University of Oldenburg, 26111 Oldenburg, Germany
- Y Henkin
- Department of Communication Disorder, Tel Aviv University, 5262657 Tel Aviv, Israel; Hearing, Speech and Language Center, Sheba Medical Center, 5265601 Tel Hashomer, Israel
- A Kral
- Department of Experimental Otology, Hannover Medical School, 30625 Hannover, Germany
11
Wikman P, Salmela V, Sjöblom E, Leminen M, Laine M, Alho K. Attention to audiovisual speech shapes neural processing through feedback-feedforward loops between different nodes of the speech network. PLoS Biol 2024; 22:e3002534. [PMID: 38466713 PMCID: PMC10957087 DOI: 10.1371/journal.pbio.3002534]
Abstract
Selective attention-related top-down modulation plays a significant role in separating relevant speech from irrelevant background speech when vocal attributes separating concurrent speakers are small and continuously evolving. Electrophysiological studies have shown that such top-down modulation enhances neural tracking of attended speech. Yet, the specific cortical regions involved remain unclear due to the limited spatial resolution of most electrophysiological techniques. To overcome such limitations, we collected both electroencephalography (EEG) (high temporal resolution) and functional magnetic resonance imaging (fMRI) (high spatial resolution), while human participants selectively attended to speakers in audiovisual scenes containing overlapping cocktail party speech. To utilise the advantages of the respective techniques, we analysed neural tracking of speech using the EEG data and performed representational dissimilarity-based EEG-fMRI fusion. We observed that attention enhanced neural tracking and modulated EEG correlates throughout the latencies studied. Further, attention-related enhancement of neural tracking fluctuated in predictable temporal profiles. We discuss how such temporal dynamics could arise from a combination of interactions between attention and prediction as well as plastic properties of the auditory cortex. EEG-fMRI fusion revealed attention-related iterative feedforward-feedback loops between hierarchically organised nodes of the ventral auditory object related processing stream. Our findings support models where attention facilitates dynamic neural changes in the auditory cortex, ultimately aiding discrimination of relevant sounds from irrelevant ones while conserving neural resources.
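The fusion logic can be made concrete: build a representational dissimilarity matrix (RDM) over conditions from the EEG at each time point and from the fMRI in each region, then correlate their condition-pair entries. A toy-data sketch (all array sizes invented):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_cond = 12
eeg = rng.normal(size=(100, n_cond, 32))   # time x conditions x channels
fmri = rng.normal(size=(n_cond, 200))      # conditions x voxels (one region)

fmri_rdm = pdist(fmri, metric="correlation")          # condition dissimilarities
fusion = np.array([spearmanr(pdist(eeg[t], metric="correlation"), fmri_rdm)[0]
                   for t in range(eeg.shape[0])])     # EEG-fMRI match over time
print(fusion.shape)
```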
Affiliation(s)
- Patrik Wikman
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
- Viljami Salmela
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
- Eetu Sjöblom
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Miika Leminen
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- AI and Analytics Unit, Helsinki University Hospital, Helsinki, Finland
- Matti Laine
- Department of Psychology, Åbo Akademi University, Turku, Finland
- Kimmo Alho
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
12
Ershaid H, Lizarazu M, McLaughlin D, Cooke M, Simantiraki O, Koutsogiannaki M, Lallier M. Contributions of listening effort and intelligibility to cortical tracking of speech in adverse listening conditions. Cortex 2024; 172:54-71. [PMID: 38215511 DOI: 10.1016/j.cortex.2023.11.018]
Abstract
Cortical tracking of speech is vital for speech segmentation and is linked to speech intelligibility. However, there is no clear consensus as to whether reduced intelligibility leads to a decrease or an increase in cortical speech tracking, warranting further investigation of the factors influencing this relationship. One such factor is listening effort, defined as the cognitive resources necessary for speech comprehension, and reported to have a strong negative correlation with speech intelligibility. Yet, no studies have examined the relationship between speech intelligibility, listening effort, and cortical tracking of speech. The aim of the present study was thus to examine these factors in quiet and in distinct adverse listening conditions. Forty-nine normal-hearing adults listened to sentences produced casually, presented in quiet and in two adverse listening conditions: cafeteria noise and reverberant speech. Electrophysiological responses were recorded with electroencephalography, and listening effort was estimated subjectively using self-reported scores and objectively using pupillometry. Results indicated varying impacts of adverse conditions on intelligibility, listening effort, and cortical tracking of speech, depending on the preservation of the speech temporal envelope. The more distorted envelope in the reverberant condition led to higher listening effort, as reflected in higher subjective scores, increased pupil diameter, and stronger cortical tracking of speech in the delta band. These findings suggest that using measures of listening effort in addition to those of intelligibility is useful for interpreting cortical tracking of speech results. Moreover, reading and phonological skills of participants were positively correlated with listening effort in the cafeteria condition, suggesting a special role of expert language skills in processing speech in this noisy condition. Implications for future research and theories linking atypical cortical tracking of speech and reading disorders are further discussed.
Affiliation(s)
- Hadeel Ershaid
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
- Mikel Lizarazu
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
- Drew McLaughlin
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
- Martin Cooke
- Ikerbasque, Basque Science Foundation, Bilbao, Spain.
- Marie Lallier
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Ikerbasque, Basque Science Foundation, Bilbao, Spain.
13
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. [PMID: 38142293 DOI: 10.1093/cercor/bhad475]
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "cocktail party" paradigm. We recorded magnetoencephalography (MEG) from n = 33 participants, presented with concurrent narratives in two different voices, and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech tracking was also affected by voice familiarity, showing an enhanced response for target speech and a reduced response for non-target speech in the contralateral hemisphere when these were spoken in a familiar versus an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interact with goal-driven attention and facilitate perceptual organization and speech processing in noisy environments.
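Multivariate speech-tracking analyses of this kind are commonly implemented as a temporal response function (TRF): ridge regression from time-lagged copies of the speech envelope to the neural signal. A minimal single-channel sketch, with lag range, regularization, and toy data as illustrative choices rather than the study's settings:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(stim, n_lags):
    """Design matrix whose columns are time-lagged copies of the stimulus."""
    X = np.zeros((len(stim), n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stim[:len(stim) - lag]
    return X

rng = np.random.default_rng(3)
fs = 100
env = rng.random(60 * fs)                        # toy speech envelope
meg = np.convolve(env, np.hanning(30), mode="same") + rng.normal(0, 1, env.size)

X = lagged_design(env, n_lags=int(0.4 * fs))     # 0-400 ms lags
trf = Ridge(alpha=1.0).fit(X, meg)
print("tracking r =", np.corrcoef(trf.predict(X), meg)[0, 1])
```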
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
14
Shih WY, Yu HY, Lee CC, Chou CC, Chen C, Glimcher PW, Wu SW. Electrophysiological population dynamics reveal context dependencies during decision making in human frontal cortex. Nat Commun 2023; 14:7821. [PMID: 38016973 PMCID: PMC10684521 DOI: 10.1038/s41467-023-42092-x]
Abstract
Evidence from monkeys and humans suggests that the orbitofrontal cortex (OFC) encodes the subjective value of options under consideration during choice. Data from non-human primates suggests that these value signals are context-dependent, representing subjective value in a way influenced by the decision makers' recent experience. Using electrodes distributed throughout cortical and subcortical structures, human epilepsy patients performed an auction task where they repeatedly reported the subjective values they placed on snack food items. High-gamma activity in many cortical and subcortical sites including the OFC positively correlated with subjective value. Other OFC sites showed signals contextually modulated by the subjective value of previously offered goods-a context dependency predicted by theory but not previously observed in humans. These results suggest that value and value-context signals are simultaneously present but separately represented in human frontal cortical activity.
Affiliation(s)
- Wan-Yu Shih
- Institute of Neuroscience, College of Life Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.
- Hsiang-Yu Yu
- College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Cheng-Chia Lee
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Neurosurgery, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Chien-Chen Chou
- College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Chien Chen
- College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Department of Epilepsy, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan, ROC
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC
- Paul W Glimcher
- Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neuroscience and Physiology, NYU Grossman School of Medicine, New York, NY, USA.
- Shih-Wei Wu
- Institute of Neuroscience, College of Life Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.
- Brain Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.
15
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech. J Neurosci 2023; 43:7429-7440. [PMID: 37793908 PMCID: PMC10621774 DOI: 10.1523/jneurosci.1247-23.2023]
Abstract
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It has moreover been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are also modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed robust attentional modulation of the cortical contribution to the speech-FFR: the neural responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger cortical contribution to the speech-FFR than a voice with a higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well. SIGNIFICANCE STATEMENT Understanding speech in noise requires attention to a target speaker. One of the speech features that a listener can use to identify a target voice among others and attend to it is the fundamental frequency, together with its higher harmonics. The fundamental frequency arises from the opening and closing of the vocal folds and is tracked by high-frequency neural activity in the auditory brainstem and in the cortex. Previous investigations showed that the subcortical neural tracking is modulated by selective attention. Here we show that attention affects the cortical tracking of the fundamental frequency as well: it is stronger when a particular voice is attended than when it is ignored.
Affiliation(s)
- Alina Schüller
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Achim Schilling
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Patrick Krauss
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Pattern Recognition Lab, Department Computer Science, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Stefan Rampp
- Department of Neurosurgery, University Hospital Erlangen, 91058 Erlangen, Germany
- Department of Neurosurgery, University Hospital Halle (Saale), 06120 Halle (Saale), Germany
- Department of Neuroradiology, University Hospital Erlangen, 91058 Erlangen, Germany
- Tobias Reichenbach
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
16
Fischer M, Moscovitch M, Fukuda K, Alain C. Ready for action! When the brain learns, yet memory-biased action does not follow. Neuropsychologia 2023; 189:108660. [PMID: 37604333 DOI: 10.1016/j.neuropsychologia.2023.108660]
Abstract
Does memory prepare us to act? Long-term memory can facilitate signal detection, though the degree of benefit varies and can even be absent. To dissociate between learning and the behavioral expression of learning, we used high-density electroencephalography (EEG) to assess memory retrieval and response processing. At learning, participants heard everyday sounds. Half of these sound clips were paired with an above-threshold lateralized tone, such that it was possible to form incidental associations between the sound clip and the location of the tone. Importantly, attention was directed to either the sound clip (Experiment 1) or the tone (Experiment 2). Participants then completed a novel detection task that separated cued retrieval from response processing. At retrieval, we observed a striking brain-behavior dissociation. Learning was observed neurally in both experiments. Behaviorally, however, signal detection was only facilitated in Experiment 2, for which there was an accompanying explicit memory for tone presence. Further, implicit neural memory for tone location correlated with the degree of response preparation, but not response execution. Together, the findings suggest 1) that attention at learning affects memory-biased action and 2) that memory prepares action via both explicit and implicit associative memory, with the latter triggering response preparation.
Affiliation(s)
- Manda Fischer
- Department of Psychology, University of Toronto, Toronto, Canada; Department of Psychology, Rotman Research Institute at Baycrest Hospital, Toronto, Canada.
- Morris Moscovitch
- Department of Psychology, University of Toronto, Toronto, Canada; Department of Psychology, Rotman Research Institute at Baycrest Hospital, Toronto, Canada.
- Keisuke Fukuda
- Department of Psychology, University of Toronto, Toronto, Canada.
- Claude Alain
- Department of Psychology, University of Toronto, Toronto, Canada; Department of Psychology, Rotman Research Institute at Baycrest Hospital, Toronto, Canada.
17
Santoyo AE, Gonzales MG, Iqbal ZJ, Backer KC, Balasubramaniam R, Bortfeld H, Shahin AJ. Neurophysiological time course of timbre-induced music-like perception. J Neurophysiol 2023; 130:291-302. [PMID: 37377190 PMCID: PMC10396220 DOI: 10.1152/jn.00042.2023]
Abstract
Traditionally, pitch variation in a sound stream has been integral to music identity. We attempt to expand music's definition by demonstrating that the neural code for musicality is independent of pitch encoding. That is, pitchless sound streams can still induce music-like perception and a neurophysiological hierarchy similar to pitched melodies. Previous work reported that neural processing of sounds with no-pitch, fixed-pitch, and irregular-pitch (melodic) patterns exhibits a right-lateralized hierarchical shift, with pitchless sounds favorably processed in Heschl's gyrus (HG), ascending laterally to nonprimary auditory areas for fixed-pitch and even more laterally for melodic patterns. The objective of this EEG study was to assess whether sound encoding maintains a similar hierarchical profile when musical perception is driven by timbre irregularities in the absence of pitch changes. Individuals listened to repetitions of three musical and three nonmusical sound streams. The nonmusical streams consisted of seven 200-ms segments of white, pink, or brown noise, separated by silent gaps. Musical streams were created similarly, but with all three noise types combined in a unique order within each stream to induce timbre variations and music-like perception. Subjects classified the sound streams as musical or nonmusical. Musical processing exhibited right-dominant α power enhancement, followed by a lateralized increase in θ phase-locking and spectral power. The θ phase-locking was stronger in musicians than in nonmusicians. The lateralization of activity suggests higher-level auditory processing. Our findings validate the existence of a hierarchical shift, traditionally observed with pitched melodic perception, underscoring that musicality can be achieved with timbre irregularities alone. NEW & NOTEWORTHY Streams of pitchless noise segments varying in timbre were classified as music-like, and the EEG they induced exhibited a right-lateralized processing hierarchy similar to that for pitched melodic processing. This study provides evidence that the neural code of musicality is independent of pitch encoding. The results have implications for understanding music processing in individuals with degraded pitch perception, such as cochlear-implant listeners, as well as the role of nonpitched sounds in the induction of music-like perceptual states.
Affiliation(s)
- Alejandra E Santoyo
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Mariel G Gonzales
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Zunaira J Iqbal
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Kristina C Backer
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Health Sciences Research Institute, University of California, Merced, California, United States
- Ramesh Balasubramaniam
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Health Sciences Research Institute, University of California, Merced, California, United States
- Heather Bortfeld
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Health Sciences Research Institute, University of California, Merced, California, United States
- Department of Psychology, University of California, Merced, California, United States
- Antoine J Shahin
- Department of Cognitive and Information Sciences, University of California, Merced, California, United States
- Health Sciences Research Institute, University of California, Merced, California, United States
18
Mohammadi Y, Graversen C, Østergaard J, Andersen OK, Reichenbach T. Phase-locking of Neural Activity to the Envelope of Speech in the Delta Frequency Band Reflects Differences between Word Lists and Sentences. J Cogn Neurosci 2023; 35:1301-1311. [PMID: 37379482 DOI: 10.1162/jocn_a_02016]
Abstract
The envelope of a speech signal is tracked by neural activity in the cerebral cortex. The cortical tracking occurs mainly in two frequency bands, theta (4-8 Hz) and delta (1-4 Hz). Tracking in the faster theta band has been mostly associated with lower-level acoustic processing, such as the parsing of syllables, whereas the slower tracking in the delta band relates to higher-level linguistic information of words and word sequences. However, much regarding the more specific association between cortical tracking and acoustic as well as linguistic processing remains to be uncovered. Here, we recorded EEG responses to both meaningful sentences and random word lists in different levels of signal-to-noise ratios (SNRs) that lead to different levels of speech comprehension as well as listening effort. We then related the neural signals to the acoustic stimuli by computing the phase-locking value (PLV) between the EEG recordings and the speech envelope. We found that the PLV in the delta band increases with increasing SNR for sentences but not for the random word lists, showing that the PLV in this frequency band reflects linguistic information. When attempting to disentangle the effects of SNR, speech comprehension, and listening effort, we observed a trend that the PLV in the delta band might reflect listening effort rather than the other two variables, although the effect was not statistically significant. In summary, our study shows that the PLV in the delta band reflects linguistic information and might be related to listening effort.
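The phase-locking value has a standard definition: band-pass both signals, extract instantaneous phases with the Hilbert transform, and compute PLV = |mean(exp(i(φ_eeg − φ_env)))|. A minimal delta-band sketch (filter order and toy signals are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def plv(eeg, envelope, fs, band=(1.0, 4.0)):
    """Phase-locking value between EEG and speech envelope in one band."""
    b, a = butter(3, band, btype="bandpass", fs=fs)
    ph_eeg = np.angle(hilbert(filtfilt(b, a, eeg)))
    ph_env = np.angle(hilbert(filtfilt(b, a, envelope)))
    return np.abs(np.mean(np.exp(1j * (ph_eeg - ph_env))))

rng = np.random.default_rng(4)
fs = 100
env = rng.normal(size=30 * fs)
eeg = env + rng.normal(0, 2, env.size)    # toy envelope-driven EEG
print("delta-band PLV:", plv(eeg, env, fs))
```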
19
Cai S, Li J, Yang H, Li H. RGCnet: An Efficient Recursive Gated Convolutional Network for EEG-based Auditory Attention Detection. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083536 DOI: 10.1109/embc40787.2023.10340432]
Abstract
Humans are able to listen to one speaker and disregard others in a speaking crowd, referred to as the "cocktail party effect". EEG-based auditory attention detection (AAD) seeks to identify whom a listener is listening to by decoding the listener's EEG signals. Recent research has demonstrated that the self-attention mechanism is effective for AAD. In this paper, we present the Recursive Gated Convolutional network (RGCnet) for AAD, which implements long-range and high-order interactions as a self-attention mechanism while maintaining a low computational cost. The RGCnet expands second-order feature interactions to higher orders to model the complex interactions between EEG features. We evaluate RGCnet on two public datasets and compare it with other AAD models. Our results demonstrate that RGCnet outperforms the comparison models under various conditions, potentially improving the control of neuro-steered hearing devices.
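The recursive gating idea (building higher-order feature interactions by repeatedly multiplying convolved features with fresh projections of the input) can be sketched in a few lines of PyTorch. This is a simplified illustration, not the published RGCnet architecture, and all layer sizes are invented.

```python
import torch
import torch.nn as nn

class RecursiveGatedConv1d(nn.Module):
    """Simplified recursive gated convolution: step k multiplies (gates)
    convolved features with a new projection of the input, raising the
    interaction order by one (illustrative only)."""
    def __init__(self, channels, order=3):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv1d(channels, channels, 1)
                                   for _ in range(order)])
        self.conv = nn.Conv1d(channels, channels, 5, padding=2)

    def forward(self, x):                        # x: (batch, channels, time)
        y = self.proj[0](x)
        for p in self.proj[1:]:
            y = self.conv(y) * p(x)              # gate -> higher-order term
        return y

eeg = torch.randn(8, 64, 128)                    # toy batch of EEG segments
print(RecursiveGatedConv1d(64)(eeg).shape)
```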
20
Shim H, Gibbs L, Rush K, Ham J, Kim S, Kim S, Choi I. Neural Mechanisms Related to the Enhanced Auditory Selective Attention Following Neurofeedback Training: Focusing on Cortical Oscillations. Appl Sci (Basel) 2023; 13:8499. [PMID: 39449731 PMCID: PMC11500732 DOI: 10.3390/app13148499]
Abstract
Selective attention can be a useful tactic for speech-in-noise (SiN) interpretation, as it strengthens cortical responses to attended sensory inputs while suppressing others. This cortical process is referred to as attentional modulation. Our earlier study showed that a neurofeedback training paradigm was effective for improving the attentional modulation of cortical auditory evoked responses. However, it was unclear how such neurofeedback training improved attentional modulation. This paper attempts to unveil the neural mechanisms that underlie strengthened auditory selective attention during the neurofeedback training paradigm. Our EEG time-frequency analysis found that, when spatial auditory attention was focused, a fronto-parietal brain network was activated. Additionally, the neurofeedback training increased beta oscillation, which may imply that top-down processing, informed by prior information, was used to anticipate the sound to be attended selectively. When the subjects were attending to the sound from the right, they exhibited more alpha oscillation in the right parietal cortex during the final session compared to the first, indicating improved spatial inhibitory processing to suppress sounds from the left. After the four-week training period, the temporal cortex exhibited improved attentional modulation of beta oscillation, suggesting strengthened neural activity to predict the target. Moreover, there was an improvement in the strength of attentional modulation of cortical evoked responses to sounds. The placebo group, which received similar attention training except that feedback was based simply on behavioral accuracy, did not show these training effects. These findings demonstrate how neurofeedback training effectively improves the neural mechanisms underlying auditory selective attention.
Affiliation(s)
- Hwan Shim
- Department of Electrical and Computer Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
- Leah Gibbs
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA
- Karsyn Rush
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA
- Jusung Ham
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA
- Subong Kim
- Department of Communication Sciences and Disorders, Montclair State University, Montclair, NJ 07043, USA
- Sungyoung Kim
- Department of Electrical and Computer Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
- Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Inyong Choi
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA
- Graduate School of Convergence Science and Technology, Seoul National University, Seoul 08826, Republic of Korea
21
|
Orf M, Wöstmann M, Hannemann R, Obleser J. Target enhancement but not distractor suppression in auditory neural tracking during continuous speech. iScience 2023; 26:106849. [PMID: 37305701 PMCID: PMC10251127 DOI: 10.1016/j.isci.2023.106849]
Abstract
Selective attention modulates the neural tracking of speech in auditory cortical regions. It is unclear whether this attentional modulation is dominated by enhanced target tracking, or suppression of distraction. To settle this long-standing debate, we employed an augmented electroencephalography (EEG) speech-tracking paradigm with target, distractor, and neutral streams. Concurrent target speech and distractor (i.e., sometimes relevant) speech were juxtaposed with a third, never task-relevant speech stream serving as neutral baseline. Listeners had to detect short target repeats and committed more false alarms originating from the distractor than from the neutral stream. Speech tracking revealed target enhancement but no distractor suppression below the neutral baseline. Speech tracking of the target (not distractor or neutral speech) explained single-trial accuracy in repeat detection. In sum, the enhanced neural representation of target speech is specific to processes of attentional gain for behaviorally relevant target speech rather than neural suppression of distraction.
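Neural speech tracking of the kind analyzed above is commonly estimated with a temporal response function (TRF): a regularized linear mapping from the time-lagged speech envelope to the EEG. A minimal single-channel ridge-regression sketch on synthetic data; the lag range and regularization strength are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

fs = 64                                    # Hz, downsampled EEG/envelope rate
n = fs * 60
rng = np.random.default_rng(0)
envelope = rng.random(n)                   # stand-in for a speech envelope
# Synthetic EEG: a smeared, delayed copy of the envelope plus noise.
eeg = np.convolve(envelope, [0, .5, 1, .5, 0], mode="same") + rng.normal(0, 1, n)

lags = np.arange(0, int(0.4 * fs))         # 0-400 ms lags
X = np.column_stack([np.roll(envelope, lag) for lag in lags])
X[: lags.max()] = 0                        # zero out wrapped-around samples

lam = 1.0                                  # ridge regularization
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)

pred = X @ trf                             # EEG predicted from the envelope
r = np.corrcoef(pred, eeg)[0, 1]           # tracking strength for this channel
print(f"TRF prediction accuracy r = {r:.2f}")
```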
Affiliation(s)
- Martin Orf: Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Malte Wöstmann: Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Jonas Obleser: Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
22
Schubert J, Schmidt F, Gehmacher Q, Bresgen A, Weisz N. Cortical speech tracking is related to individual prediction tendencies. Cereb Cortex 2023; 33:6608-6619. [PMID: 36617790 PMCID: PMC10233232 DOI: 10.1093/cercor/bhac528]
Abstract
Listening can be conceptualized as a process of active inference, in which the brain forms internal models to integrate auditory information in a complex interaction of bottom-up and top-down processes. We propose that individuals vary in their "prediction tendency" and that this variation contributes to experiential differences in everyday listening situations and shapes the cortical processing of acoustic input such as speech. Here, we presented tone sequences of varying entropy levels to independently quantify auditory prediction tendency (the tendency to anticipate low-level acoustic features) for each individual. This measure was then used to predict cortical speech tracking in a multi-speaker listening task, in which participants listened to audiobooks narrated by a target speaker, either in isolation or with interference from one or two distractor speakers. Furthermore, semantic violations were introduced into the story to examine effects of word surprisal during speech processing. Our results show that cortical speech tracking is related to prediction tendency. In addition, we find interactions between prediction tendency and background noise as well as word surprisal in disparate brain regions. Our findings suggest that individual prediction tendencies are generalizable across different listening situations and may serve as a valuable element to explain interindividual differences in natural listening situations.
Affiliation(s)
- Juliane Schubert: Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Fabian Schmidt: Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Quirin Gehmacher: Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Annika Bresgen: Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
- Nathan Weisz: Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
23
Parida S, Liu ST, Sadagopan S. Adaptive mechanisms facilitate robust performance in noise and in reverberation in an auditory categorization model. Commun Biol 2023; 6:456. [PMID: 37130918 PMCID: PMC10154343 DOI: 10.1038/s42003-023-04816-z]
Abstract
For robust vocalization perception, the auditory system must generalize over variability in vocalization production as well as variability arising from the listening environment (e.g., noise and reverberation). We previously demonstrated using guinea pig and marmoset vocalizations that a hierarchical model generalized over production variability by detecting sparse intermediate-complexity features that are maximally informative about vocalization category from a dense spectrotemporal input representation. Here, we explore three biologically feasible model extensions to generalize over environmental variability: (1) training in degraded conditions, (2) adaptation to sound statistics in the spectrotemporal stage and (3) sensitivity adjustment at the feature detection stage. All mechanisms improved vocalization categorization performance, but improvement trends varied across degradation type and vocalization type. One or more adaptive mechanisms were required for model performance to approach the behavioral performance of guinea pigs on a vocalization categorization task. These results highlight the contributions of adaptive mechanisms at multiple auditory processing stages to achieve robust auditory categorization.
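Mechanism (2), adaptation to sound statistics, can be illustrated by divisively normalizing each spectrotemporal channel by running estimates of its mean and variance, which makes the representation insensitive to a stationary noise floor. A toy sketch under assumed parameters (update rate, synthetic spectrogram), not the paper's model:

```python
import numpy as np

def adapt_to_statistics(spec, tau=50):
    """Divisively normalize a (freq x time) spectrogram by running statistics."""
    out = np.zeros_like(spec)
    mean = spec[:, 0].copy()
    var = np.ones(spec.shape[0])
    alpha = 1.0 / tau                      # update rate of the running estimates
    for t in range(spec.shape[1]):
        mean += alpha * (spec[:, t] - mean)
        var += alpha * ((spec[:, t] - mean) ** 2 - var)
        out[:, t] = (spec[:, t] - mean) / np.sqrt(var + 1e-6)
    return out

rng = np.random.default_rng(1)
clean = rng.random((32, 500))              # toy spectrogram, 32 channels
noisy = clean + 2.0                        # same sound over a stationary noise floor
# After an initial transient, the adapted representations converge:
print(np.abs(adapt_to_statistics(clean) - adapt_to_statistics(noisy)).mean())
```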
Affiliation(s)
- Satyabrata Parida: Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Shi Tong Liu: Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Srivatsun Sadagopan: Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA; Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
24
Kaufman M, Zion Golumbic E. Listening to two speakers: Capacity and tradeoffs in neural speech tracking during Selective and Distributed Attention. Neuroimage 2023; 270:119984. [PMID: 36854352 DOI: 10.1016/j.neuroimage.2023.119984]
Abstract
Speech comprehension is severely compromised when several people talk at once, due to limited perceptual and cognitive resources. In such circumstances, top-down attention mechanisms can actively prioritize processing of task-relevant speech. However, behavioral and neural evidence suggests that this selection is not exclusive, and the system may have sufficient capacity to process additional speech input as well. Here we used a data-driven approach to contrast two opposing hypotheses regarding the system's capacity to co-represent competing speech: can the brain represent two speakers equally, or is the system fundamentally limited, resulting in tradeoffs between them? Neural activity was measured using magnetoencephalography (MEG) as human participants heard concurrent speech narratives and engaged in two tasks: Selective Attention, where only one speaker was task-relevant, and Distributed Attention, where both speakers were equally relevant. Analysis of neural speech-tracking revealed that both tasks engaged a similar network of brain regions involved in auditory processing, attentional control, and speech processing. Interestingly, during both Selective and Distributed Attention the neural representation of competing speech showed a bias towards one speaker. This is in line with proposed 'bottlenecks' for co-representation of concurrent speech and suggests that good performance on distributed attention tasks may be achieved by toggling attention between speakers over time.
Affiliation(s)
- Maya Kaufman: The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
- Elana Zion Golumbic: The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
25
Willmore BDB, King AJ. Adaptation in auditory processing. Physiol Rev 2023; 103:1025-1058. [PMID: 36049112 PMCID: PMC9829473 DOI: 10.1152/physrev.00011.2022]
Abstract
Adaptation is an essential feature of auditory neurons, which reduces their responses to unchanging and recurring sounds and allows their response properties to be matched to the constantly changing statistics of sounds that reach the ears. As a consequence, processing in the auditory system highlights novel or unpredictable sounds and produces an efficient representation of the vast range of sounds that animals can perceive by continually adjusting the sensitivity and, to a lesser extent, the tuning properties of neurons to the most commonly encountered stimulus values. Together with attentional modulation, adaptation to sound statistics also helps to generate neural representations of sound that are tolerant to background noise and therefore plays a vital role in auditory scene analysis. In this review, we consider the diverse forms of adaptation that are found in the auditory system in terms of the processing levels at which they arise, the underlying neural mechanisms, and their impact on neural coding and perception. We also ask what the dynamics of adaptation, which can occur over multiple timescales, reveal about the statistical properties of the environment. Finally, we examine how adaptation to sound statistics is influenced by learning and experience and changes as a result of aging and hearing loss.
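The response reduction to recurring sounds described in this review can be caricatured with a resource-depletion model in which each response consumes an adaptation variable that recovers exponentially. A toy simulation; the time constants and depletion factor are arbitrary illustrative choices, not values from the review.

```python
import numpy as np

dt, tau_rec = 0.01, 0.5        # step (s) and recovery time constant (s)
t = np.arange(0, 4, dt)
stim = (np.sin(2 * np.pi * 4 * t) > 0.95).astype(float)   # 4 Hz tone pips

resource, rate = 1.0, []
for s in stim:
    r = s * resource            # response scales with the available resource
    resource += dt * (1 - resource) / tau_rec - 0.4 * r    # recovery - depletion
    rate.append(r)

# Responses to successive identical pips shrink, then recover during silence.
onsets = np.flatnonzero(np.diff(stim) > 0)
print(np.round([rate[i + 1] for i in onsets[:5]], 2))
```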
Affiliation(s)
- Ben D. B. Willmore: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Andrew J. King: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
26
Richter B, Putze F, Ivucic G, Brandt M, Schütze C, Reisenhofer R, Wrede B, Schultz T. EEG Correlates of Distractions and Hesitations in Human–Robot Interaction: A LabLinking Pilot Study. Multimodal Technol Interact 2023. [DOI: 10.3390/mti7040037]
Abstract
In this paper, we investigate the effect of distractions and hesitations as a scaffolding strategy. Recent research points to the potential beneficial effects of a speaker’s hesitations on the listeners’ comprehension of utterances, although results from studies on this issue indicate that humans do not make strategic use of them. The role of hesitations and their communicative function in human-human interaction is a much-discussed topic in current research. To better understand the underlying cognitive processes, we developed a human–robot interaction (HRI) setup that allows the measurement of the electroencephalogram (EEG) signals of a human participant while interacting with a robot. We thereby address the research question of whether we find effects on single-trial EEG based on the distraction and the corresponding robot’s hesitation scaffolding strategy. To carry out the experiments, we leverage our LabLinking method, which enables interdisciplinary joint research between remote labs. This study could not have been conducted without LabLinking, as the two involved labs needed to combine their individual expertise and equipment to achieve the goal together. The results of our study indicate that the EEG correlates in the distracted condition are different from the baseline condition without distractions. Furthermore, we could differentiate the EEG correlates of distraction with and without a hesitation scaffolding strategy. This proof-of-concept study shows that LabLinking makes it possible to conduct collaborative HRI studies in remote laboratories and lays the first foundation for more in-depth research into robotic scaffolding strategies.
27
Popov T, Gips B, Weisz N, Jensen O. Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cereb Cortex 2023; 33:3478-3489. [PMID: 35972419 PMCID: PMC10068281 DOI: 10.1093/cercor/bhac285]
Abstract
Spatially selective modulation of alpha power (8-14 Hz) is a robust finding in electrophysiological studies of visual attention, and has been recently generalized to auditory spatial attention. This modulation pattern is interpreted as reflecting a top-down mechanism for suppressing distracting input from unattended directions of sound origin. The present study on auditory spatial attention extends this interpretation by demonstrating that alpha power modulation is closely linked to oculomotor action. We designed an auditory paradigm in which participants were required to attend to upcoming sounds from one of 24 loudspeakers arranged in a circular array around the head. Maintaining the location of an auditory cue was associated with a topographically modulated distribution of posterior alpha power resembling the findings known from visual attention. Multivariate analyses allowed the prediction of the sound location in the horizontal plane. Importantly, this prediction was also possible, when derived from signals capturing saccadic activity. A control experiment on auditory spatial attention confirmed that, in absence of any visual/auditory input, lateralization of alpha power is linked to the lateralized direction of gaze. Attending to an auditory target engages oculomotor and visual cortical areas in a topographic manner akin to the retinotopic organization associated with visual attention.
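Spatially selective alpha modulation of this kind is commonly summarized with a lateralization index contrasting alpha power over the two hemispheres. A minimal sketch; the 8-14 Hz band follows the abstract, while the electrode grouping and all other settings are illustrative.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(x, fs):
    """Mean 8-14 Hz power via Welch's method."""
    f, pxx = welch(x, fs=fs, nperseg=fs * 2)
    return pxx[(f >= 8) & (f <= 14)].mean()

fs = 250
rng = np.random.default_rng(2)
left_posterior = rng.normal(size=fs * 20)    # stand-ins for averaged channel groups
right_posterior = rng.normal(size=fs * 20)

p_left = alpha_power(left_posterior, fs)
p_right = alpha_power(right_posterior, fs)
ali = (p_right - p_left) / (p_right + p_left)  # >0: more alpha over the right
print(f"alpha lateralization index = {ali:+.3f}")
```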
Affiliation(s)
- Tzvetan Popov: Methods of Plasticity Research, Department of Psychology, University of Zurich, Zurich, Switzerland; Department of Psychology, University of Konstanz, Konstanz, Germany
- Bart Gips: NATO Science and Technology Organization Centre for Maritime Research and Experimentation (CMRE), La Spezia 19126, Italy
- Nathan Weisz: Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Salzburg, Austria
- Ole Jensen: School of Psychology, University of Birmingham, Birmingham, UK
28
Makov S, Pinto D, Har-Shai Yahav P, Miller LM, Zion Golumbic E. "Unattended, distracting or irrelevant": Theoretical implications of terminological choices in auditory selective attention research. Cognition 2023; 231:105313. [PMID: 36344304 DOI: 10.1016/j.cognition.2022.105313]
Abstract
For seventy years, auditory selective attention research has focused on studying the cognitive mechanisms of prioritizing the processing of a 'main' task-relevant stimulus in the presence of 'other' stimuli. However, a closer look at this body of literature reveals deep empirical inconsistencies and theoretical confusion regarding the extent to which this 'other' stimulus is processed. We argue that many key debates regarding attention arise, at least in part, from inappropriate terminological choices for experimental variables that may not accurately map onto the cognitive constructs they are meant to describe. Here we critically review the more common or disruptive terminological ambiguities, differentiate between methodology-based and theory-derived terms, and unpack the theoretical assumptions underlying different terminological choices. Particularly, we offer an in-depth analysis of the terms 'unattended' and 'distractor' and demonstrate how their use can lead to conflicting theoretical inferences. We also offer a framework for thinking about terminology in a more productive and precise way, in the hope of fostering more productive debates and promoting more nuanced and accurate cognitive models of selective attention.
Affiliation(s)
- Shiri Makov: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
- Danna Pinto: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
- Paz Har-Shai Yahav: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
- Lee M Miller: The Center for Mind and Brain, University of California, Davis, CA, United States of America; Department of Neurobiology, Physiology, & Behavior, University of California, Davis, CA, United States of America; Department of Otolaryngology / Head and Neck Surgery, University of California, Davis, CA, United States of America
- Elana Zion Golumbic: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
29
Ozmeral EJ, Menon KN. Selective auditory attention modulates cortical responses to sound location change for speech in quiet and in babble. PLoS One 2023; 18:e0268932. [PMID: 36638116 PMCID: PMC9838839 DOI: 10.1371/journal.pone.0268932]
Abstract
Listeners use the spatial location or change in spatial location of coherent acoustic cues to aid in auditory object formation. From stimulus-evoked onset responses in normal-hearing listeners using electroencephalography (EEG), we have previously shown measurable tuning to stimuli changing location in quiet, revealing a potential window into the cortical representations of auditory scene analysis. These earlier studies used non-fluctuating, spectrally narrow stimuli, so it was still unknown whether previous observations would translate to speech stimuli, and whether responses would be preserved for stimuli in the presence of background maskers. To examine the effects that selective auditory attention and interferers have on object formation, we measured cortical responses to speech changing location in the free field with and without background babble (+6 dB SNR) during both passive and active conditions. Active conditions required listeners to respond to the onset of the speech stream when it occurred at a new location, explicitly indicating 'yes' or 'no' to whether the stimulus occurred at a block-specific location either 30 degrees to the left or right of midline. In the aggregate, results show similar evoked responses to speech stimuli changing location in quiet compared to babble background. However, the effect of the two background environments diverges somewhat when considering the magnitude and direction of the location change and where the subject was attending. In quiet, attention to the right hemifield appeared to evoke a stronger response than attention to the left hemifield when speech shifted in the rightward direction. No such difference was found in babble conditions. Therefore, consistent with challenges associated with cocktail party listening, directed spatial attention could be compromised in the presence of stimulus noise and likely leads to poorer use of spatial cues in auditory streaming.
Affiliation(s)
- Erol J Ozmeral: Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States of America
- Katherine N Menon: Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, United States of America
30
Becker R, Hervais-Adelman A. Individual theta-band cortical entrainment to speech in quiet predicts word-in-noise comprehension. Cereb Cortex Commun 2023; 4:tgad001. [PMID: 36726796 PMCID: PMC9883620 DOI: 10.1093/texcom/tgad001]
Abstract
Speech elicits brain activity time-locked to its amplitude envelope. The resulting speech-brain synchrony (SBS) is thought to be crucial to speech parsing and comprehension. It has been shown that higher speech-brain coherence is associated with increased speech intelligibility. However, studies depending on the experimental manipulation of speech stimuli do not allow conclusions about the causality of the observed tracking. Here, we investigate whether individual differences in the intrinsic propensity to track the speech envelope when listening to speech-in-quiet are predictive of individual differences in speech-recognition-in-noise, in an independent task. We evaluated the cerebral tracking of speech in source-localized magnetoencephalography, at timescales corresponding to phrases, words, syllables, and phonemes. We found that individual differences in syllabic tracking in right superior temporal gyrus and in left middle temporal gyrus (MTG) were positively associated with recognition accuracy in an independent words-in-noise task. Furthermore, directed connectivity analysis showed that this relationship is partially mediated by top-down connectivity from premotor cortex (associated with speech processing and active sensing in the auditory domain) to left MTG. Thus, the extent of SBS, even during clear speech, reflects an active mechanism of the speech processing system that may confer resilience to noise.
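Speech-brain synchrony at a given timescale is often quantified as magnitude-squared coherence between the speech envelope and the neural signal, averaged over the band of interest. A minimal sketch on synthetic signals; treating 4-8 Hz as the syllabic band is an assumption made here for illustration.

```python
import numpy as np
from scipy.signal import coherence

fs = 100
rng = np.random.default_rng(3)
envelope = rng.normal(size=fs * 120)                 # speech envelope stand-in
meg = 0.3 * envelope + rng.normal(size=envelope.size)  # partially tracking signal

f, cxy = coherence(envelope, meg, fs=fs, nperseg=fs * 4)
syllable_band = (f >= 4) & (f <= 8)                  # assumed syllabic timescale
print(f"mean 4-8 Hz speech-brain coherence: {cxy[syllable_band].mean():.3f}")
```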
Affiliation(s)
- Robert Becker: Neurolinguistics, Department of Psychology, University of Zurich (UZH), Zurich, Switzerland
- Alexis Hervais-Adelman: Neurolinguistics, Department of Psychology, University of Zurich, Zurich 8050, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, Zurich 8057, Switzerland
31
Pastore A, Tomassini A, Delis I, Dolfini E, Fadiga L, D'Ausilio A. Speech listening entails neural encoding of invisible articulatory features. Neuroimage 2022; 264:119724. [PMID: 36328272 DOI: 10.1016/j.neuroimage.2022.119724]
Abstract
Speech processing entails a complex interplay between bottom-up and top-down computations. The former is reflected in the neural entrainment to the quasi-rhythmic properties of speech acoustics while the latter is supposed to guide the selection of the most relevant input subspace. Top-down signals are believed to originate mainly from motor regions, yet similar activities have been shown to tune attentional cycles also for simpler, non-speech stimuli. Here we examined whether, during speech listening, the brain reconstructs articulatory patterns associated to speech production. We measured electroencephalographic (EEG) data while participants listened to sentences during the production of which articulatory kinematics of lips, jaws and tongue were also recorded (via Electro-Magnetic Articulography, EMA). We captured the patterns of articulatory coordination through Principal Component Analysis (PCA) and used Partial Information Decomposition (PID) to identify whether the speech envelope and each of the kinematic components provided unique, synergistic and/or redundant information regarding the EEG signals. Interestingly, tongue movements contain both unique as well as synergistic information with the envelope that are encoded in the listener's brain activity. This demonstrates that during speech listening the brain retrieves highly specific and unique motor information that is never accessible through vision, thus leveraging audio-motor maps that arise most likely from the acquisition of speech production during development.
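The articulatory coordination patterns were captured here with PCA over the recorded kinematic channels. A minimal sketch of that step on fake EMA data (the channel count and component number are illustrative, and the partial information decomposition step is omitted):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Fake EMA recording: time samples x kinematic channels (lips, jaw, tongue x/y).
ema = rng.normal(size=(2000, 12))
ema[:, :4] += np.sin(np.linspace(0, 40, 2000))[:, None]  # shared slow component

pca = PCA(n_components=3)
components = pca.fit_transform(ema)    # time courses of coordination patterns
print("explained variance:", np.round(pca.explained_variance_ratio_, 3))
```

The resulting component time courses would then be compared against the EEG, alongside the speech envelope, to ask which signals carry unique versus redundant information.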
Affiliation(s)
- A Pastore: Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A Tomassini: Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- I Delis: School of Biomedical Sciences, University of Leeds, Leeds, UK
- E Dolfini: Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- L Fadiga: Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- A D'Ausilio: Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy; Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
32
Pinto D, Kaufman M, Brown A, Zion Golumbic E. An ecological investigation of the capacity to follow simultaneous speech and preferential detection of ones' own name. Cereb Cortex 2022; 33:5361-5374. [PMID: 36331339 DOI: 10.1093/cercor/bhac424]
Abstract
Many situations require focusing attention on one speaker, while monitoring the environment for potentially important information. Some have proposed that dividing attention among two speakers involves behavioral trade-offs, due to limited cognitive resources. However, the severity of these trade-offs, particularly under ecologically valid circumstances, is not well understood. We investigated the capacity to process simultaneous speech using a dual-task paradigm simulating task demands and stimuli encountered in real life. Participants listened to conversational narratives (Narrative Stream) and monitored a stream of announcements (Barista Stream), to detect when their order was called. We measured participants' performance, neural activity, and skin conductance as they engaged in this dual-task. Participants achieved extremely high dual-task accuracy, with no apparent behavioral trade-offs. Moreover, robust neural and physiological responses were observed for target-stimuli in the Barista Stream, alongside significant neural speech-tracking of the Narrative Stream. These results suggest that humans have substantial capacity to process simultaneous speech and do not suffer from insufficient processing resources, at least for this highly ecological task combination and level of perceptual load. Results also confirmed the ecological validity of the advantage for detecting one's own name at the behavioral, neural, and physiological level, highlighting the contribution of personal relevance when processing simultaneous speech.
Affiliation(s)
- Danna Pinto: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
- Maya Kaufman: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
- Adi Brown: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
- Elana Zion Golumbic: The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
33
Tai J, Forrester J, Sekuler R. Costs and benefits of audiovisual interactions. Perception 2022; 51:639-657. [PMID: 35959630 DOI: 10.1177/03010066221111501]
Abstract
A strong temporal correlation promotes integration of concurrent sensory signals, either within a single sensory modality, or from different modalities. Although the benefits of such integration are well known, far less attention has been given to possible costs incurred when concurrent sensory signals are uncorrelated. In two experiments, subjects categorized the rate at which a visual object modulated in size, while they also tried to ignore a concurrent task-irrelevant broadband sound. Overall, the experiments showed that (i) losses in accuracy from mismatched auditory and visual rates were larger than gains from matched rates and (ii) mismatched auditory and visual rates slowed responses more than they were sped up when rates matched. Experiment One showed that audiovisual interaction varied with the difference between the visual modulation rate and the modulation rate of a concurrent auditory stimulus. Experiment Two showed that audiovisual interaction depended upon the strength of the task-irrelevant auditory modulation. Although our stimuli involved abstract, low-dimensional stimuli, not speech, the effects we observed parallel key findings on interference in multi-speaker settings.
Affiliation(s)
- Jiayue Tai: Volen Center for Complex Systems, Brandeis University, Waltham, MA, USA
- Jack Forrester: Volen Center for Complex Systems, Brandeis University, Waltham, MA, USA
- Robert Sekuler: Volen Center for Complex Systems, Brandeis University, Waltham, MA, USA
34
Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception. Curr Biol 2022; 32:3971-3986.e4. [PMID: 35973430 DOI: 10.1016/j.cub.2022.07.047]
Abstract
How the human auditory cortex represents spatially separated simultaneous talkers, and how talkers' locations and voices modulate the neural representations of attended and unattended speech, are unclear. Here, we measured the neural responses from electrodes implanted in neurosurgical patients as they performed single-talker and multi-talker speech perception tasks. We found that spatial separation between talkers caused a preferential encoding of the contralateral speech in Heschl's gyrus (HG), planum temporale (PT), and superior temporal gyrus (STG). Location and spectrotemporal features were encoded in different aspects of the neural response. Specifically, the talker's location changed the mean response level, whereas the talker's spectrotemporal features altered the variation of the response around its baseline. These components were differentially modulated by the attended talker's voice or location, which improved the population decoding of attended speech features. Attentional modulation due to the talker's voice appeared only in the auditory areas with longer latencies, but attentional modulation due to location was present throughout. Our results show that spatial multi-talker speech perception relies upon a separable pre-attentive neural representation, which could be further tuned by top-down attention to the location and voice of the talker.
35
Raghavendra S, Lee S, Chun H, Martin BA, Tan CT. Cortical entrainment to speech produced by cochlear implant talkers and normal-hearing talkers. Front Neurosci 2022; 16:927872. [PMID: 36017176 PMCID: PMC9396306 DOI: 10.3389/fnins.2022.927872]
Abstract
Cochlear implants (CIs) are commonly used to restore the ability to hear in those with severe or profound hearing loss, and they provide the auditory feedback necessary for monitoring and controlling speech production. However, speech produced by CI users may not reach the perceived sound quality of speech produced by normal-hearing talkers, and this difference is easily noticeable in daily conversation. In this study, we examined this difference as perceived by normal-hearing listeners listening to continuous speech produced by CI talkers and by normal-hearing talkers. We used a regenerative model to decode and reconstruct the speech envelope from the single-trial electroencephalogram (EEG) recorded on the scalp of the normal-hearing listeners. Bootstrap Spearman correlation between the actual speech envelope and the envelope reconstructed from the EEG was computed as a metric to quantify the difference in response to the speech produced by the two talker groups. The same listeners were asked to rate the perceived sound quality of the speech produced by the two talker groups as a behavioral sound quality assessment. Both the perceived sound quality ratings and the computed metric, which can be seen as the degree of cortical entrainment to the actual speech envelope across the normal-hearing listeners, were higher for speech produced by normal-hearing talkers than for speech produced by CI talkers. The first purpose of the study was to determine how well the envelope of speech is represented neurophysiologically, via its similarity to the envelope reconstructed from EEG; the second was to show how well this representation differentiates the two talker groups in terms of perceived sound quality.
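The reported metric, a bootstrap Spearman correlation between the actual and the EEG-reconstructed envelope, can be sketched as follows; the segment-resampling scheme shown is an assumption, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
actual = rng.random(6000)                        # actual speech envelope
reconstructed = actual + rng.normal(0, 1, 6000)  # envelope decoded from EEG

def bootstrap_spearman(x, y, n_boot=1000, seg=200):
    """Spearman r over time segments resampled with replacement."""
    starts = np.arange(0, x.size - seg, seg)
    rs = []
    for _ in range(n_boot):
        idx = np.concatenate([np.arange(s, s + seg)
                              for s in rng.choice(starts, starts.size)])
        rs.append(spearmanr(x[idx], y[idx])[0])
    return np.mean(rs), np.percentile(rs, [2.5, 97.5])

mean_r, ci = bootstrap_spearman(actual, reconstructed)
print(f"r = {mean_r:.3f}, 95% CI {ci.round(3)}")
```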
Affiliation(s)
- Shruthi Raghavendra: Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX, United States
- Sungmin Lee: Department of Speech-Language Pathology and Audiology, Tongmyong University, Busan, South Korea
- Hyungi Chun: Graduate Center, City University of New York, New York City, NY, United States
- Brett A. Martin: Graduate Center, City University of New York, New York City, NY, United States
- Chin-Tuan Tan: Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX, United States
36
Peter V, van Ommen S, Kalashnikova M, Mazuka R, Nazzi T, Burnham D. Language specificity in cortical tracking of speech rhythm at the mora, syllable, and foot levels. Sci Rep 2022; 12:13477. [PMID: 35931787 PMCID: PMC9356059 DOI: 10.1038/s41598-022-17401-x]
Abstract
Recent research shows that adults' neural oscillations track the rhythm of the speech signal. However, the extent to which this tracking is driven by the acoustics of the signal, or by language-specific processing, remains unknown. Here, adult native listeners of three rhythmically different languages (English, French, Japanese) were compared on their cortical tracking of speech envelopes synthesized in their three native languages, which allowed for coding at each language's dominant rhythmic unit: the foot (2.5 Hz), syllable (5 Hz), and mora (10 Hz), respectively. The three language groups were also tested with a sequence in a non-native language, Polish, and a non-speech vocoded equivalent, to investigate possible differential speech/nonspeech processing. The results first showed that cortical tracking was most prominent at 5 Hz (syllable rate) for all three groups, but the French listeners showed enhanced tracking at 5 Hz compared to the English and the Japanese groups. Second, across groups, there were no differences in responses for speech versus non-speech at 5 Hz (the syllable rate), but there was better tracking for speech than for non-speech at 10 Hz (not the syllable rate). Together these results provide evidence for both language-general and language-specific influences on cortical tracking.
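Cortical tracking at the three rhythmic rates can be read out from the EEG amplitude spectrum at the tagged frequencies, typically normalized against neighboring frequency bins. A minimal sketch on synthetic data with an artificial 5 Hz (syllable-rate) response; the neighbor-bin normalization is a common convention, used here as an assumption.

```python
import numpy as np

fs, dur = 250, 40                 # sampling rate (Hz) and duration (s)
rng = np.random.default_rng(6)
t = np.arange(0, dur, 1 / fs)
eeg = 0.3 * np.sin(2 * np.pi * 5 * t) + rng.normal(size=t.size)  # 5 Hz tracking

spectrum = np.abs(np.fft.rfft(eeg)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

for name, f0 in [("foot", 2.5), ("syllable", 5.0), ("mora", 10.0)]:
    target = np.argmin(np.abs(freqs - f0))
    neighbors = np.r_[spectrum[target - 5:target - 1],
                      spectrum[target + 2:target + 6]]
    snr = spectrum[target] / neighbors.mean()   # peak relative to neighbors
    print(f"{name:9s} ({f0:4.1f} Hz): SNR = {snr:.2f}")
```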
Affiliation(s)
- Varghese Peter: MARCS Institute for Brain Behaviour and Development, Western Sydney University, Penrith, NSW, Australia; School of Health and Behavioural Sciences, University of the Sunshine Coast, Sippy Downs, Australia
- Sandrien van Ommen: Integrative Neuroscience and Cognition Center, CNRS-Université Paris Cité, Paris, France; Neurosciences Fondamentales, University of Geneva, Geneva, Switzerland
- Marina Kalashnikova: MARCS Institute for Brain Behaviour and Development, Western Sydney University, Penrith, NSW, Australia; BCBL, Basque Center on Cognition, Brain and Language, San Sebastian, Guipuzcoa, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Bizcaya, Spain
- Reiko Mazuka: Laboratory for Language Development, RIKEN Center for Brain Science, Saitama, Japan; Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Thierry Nazzi: Integrative Neuroscience and Cognition Center, CNRS-Université Paris Cité, Paris, France
- Denis Burnham: MARCS Institute for Brain Behaviour and Development, Western Sydney University, Penrith, NSW, Australia
37
Bai F, Meyer AS, Martin AE. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol 2022; 20:e3001713. [PMID: 35834569 PMCID: PMC9282610 DOI: 10.1371/journal.pbio.3001713]
Abstract
Human language stands out in the natural world as a biological signal that uses a structured system to combine the meanings of small linguistic units (e.g., words) into larger constituents (e.g., phrases and sentences). However, the physical dynamics of speech (or sign) do not stand in a one-to-one relationship with the meanings listeners perceive. Instead, listeners infer meaning based on their knowledge of the language. The neural readouts of the perceptual and cognitive processes underlying these inferences are still poorly understood. In the present study, we used scalp electroencephalography (EEG) to compare the neural response to phrases (e.g., the red vase) and sentences (e.g., the vase is red), which were close in semantic meaning and had been synthesized to be physically indistinguishable. Differences in structure were well captured in the reorganization of neural phase responses in delta (approximately <2 Hz) and theta bands (approximately 2 to 7 Hz), and in power and power connectivity changes in the alpha band (approximately 7.5 to 13.5 Hz). Consistent with predictions from a computational model, sentences showed more power, more power connectivity, and more phase synchronization than phrases did. Theta-gamma phase-amplitude coupling occurred, but did not differ between the syntactic structures. Spectral-temporal response function (STRF) modeling revealed different encoding states for phrases and sentences, over and above the acoustically driven neural response. Our findings provide a comprehensive description of how the brain encodes and separates linguistic structures in the dynamics of neural responses. They imply that phase synchronization and strength of connectivity are readouts for the constituent structure of language. The results provide a novel basis for future neurophysiological research on linguistic structure representation in the brain, and, together with our simulations, support time-based binding as a mechanism of structure encoding in neural dynamics.
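Phase reorganization of the kind reported here is typically quantified with inter-trial phase coherence (ITPC) after band-pass filtering and a Hilbert transform. A minimal sketch on synthetic trials; the delta and theta band edges follow the abstract, everything else is illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs, n_trials, n_samp = 200, 60, 400
rng = np.random.default_rng(7)
# Synthetic trials with a partially phase-locked 2 Hz component.
t = np.arange(n_samp) / fs
trials = np.sin(2 * np.pi * 2 * t) + rng.normal(0, 1, (n_trials, n_samp))

def itpc(trials, lo, hi, fs):
    """Inter-trial phase coherence: |mean of unit phase vectors| per time point."""
    b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
    return np.abs(np.exp(1j * phase).mean(axis=0))

delta_itpc = itpc(trials, 0.5, 2.0, fs)
theta_itpc = itpc(trials, 2.0, 7.0, fs)
print(f"mean delta ITPC: {delta_itpc.mean():.2f}, "
      f"mean theta ITPC: {theta_itpc.mean():.2f}")
```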
Affiliation(s)
- Fan Bai: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Antje S. Meyer: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Andrea E. Martin: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
38
Kachlicka M, Laffere A, Dick F, Tierney A. Slow phase-locked modulations support selective attention to sound. Neuroimage 2022; 252:119024. [PMID: 35231629 PMCID: PMC9133470 DOI: 10.1016/j.neuroimage.2022.119024]
Abstract
To make sense of complex soundscapes, listeners must select and attend to task-relevant streams while ignoring uninformative sounds. One possible neural mechanism underlying this process is alignment of endogenous oscillations with the temporal structure of the target sound stream. Such a mechanism has been suggested to mediate attentional modulation of neural phase-locking to the rhythms of attended sounds. However, such modulations are compatible with an alternate framework, where attention acts as a filter that enhances exogenously-driven neural auditory responses. Here we attempted to test several predictions arising from the oscillatory account by playing two tone streams varying across conditions in tone duration and presentation rate; participants attended to one stream or listened passively. Attentional modulation of the evoked waveform was roughly sinusoidal and scaled with rate, while the passive response did not. However, there was only limited evidence for continuation of modulations through the silence between sequences. These results suggest that attentionally-driven changes in phase alignment reflect synchronization of slow endogenous activity with the temporal structure of attended stimuli.
Affiliation(s)
- Magdalena Kachlicka: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
- Aeron Laffere: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
- Fred Dick: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England; Division of Psychology & Language Sciences, UCL, Gower Street, London WC1E 6BT, England
- Adam Tierney: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
39
Holroyd CB. Interbrain synchrony: on wavy ground. Trends Neurosci 2022; 45:346-357. [PMID: 35236639 DOI: 10.1016/j.tins.2022.02.002]
Abstract
In recent years the study of dynamic, between-brain coupling mechanisms has taken social neuroscience by storm. In particular, interbrain synchrony (IBS) is a putative neural mechanism said to promote social interactions by enabling the functional integration of multiple brains. In this article, I argue that this research is beset with three pervasive and interrelated problems. First, the field lacks a widely accepted definition of IBS. Second, IBS wants for theories that can guide the design and interpretation of experiments. Third, a potpourri of tasks and empirical methods permits undue flexibility when testing the hypothesis. These factors synergistically undermine IBS as a theoretical construct. I finish by recommending measures that can address these issues.
Affiliation(s)
- Clay B Holroyd: Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000 Gent, Belgium
40
Dellaferrera G, Asabuki T, Fukai T. Modeling the Repetition-Based Recovering of Acoustic and Visual Sources With Dendritic Neurons. Front Neurosci 2022; 16:855753. [PMID: 35573290 PMCID: PMC9097820 DOI: 10.3389/fnins.2022.855753]
Abstract
In natural auditory environments, acoustic signals originate from the temporal superimposition of different sound sources. The problem of inferring individual sources from ambiguous mixtures of sounds is known as blind source decomposition. Experiments on humans have demonstrated that the auditory system can identify sound sources as repeating patterns embedded in the acoustic input. Source repetition produces temporal regularities that can be detected and used for segregation. Specifically, listeners can identify sounds occurring more than once across different mixtures, but not sounds heard only in a single mixture. However, whether such a behavior can be computationally modeled has not yet been explored. Here, we propose a biologically inspired computational model to perform blind source separation on sequences of mixtures of acoustic stimuli. Our method relies on a somatodendritic neuron model trained with a Hebbian-like learning rule which was originally conceived to detect spatio-temporal patterns recurring in synaptic inputs. We show that the segregation capabilities of our model are reminiscent of the features of human performance in a variety of experimental settings involving synthesized sounds with naturalistic properties. Furthermore, we extend the study to investigate the properties of segregation on task settings not yet explored with human subjects, namely natural sounds and images. Overall, our work suggests that somatodendritic neuron models offer a promising neuro-inspired learning strategy to account for the characteristics of the brain segregation capabilities as well as to make predictions on yet untested experimental settings.
Affiliation(s)
- Giorgia Dellaferrera: Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Okinawa, Japan; Institute of Neuroinformatics, University of Zurich and Swiss Federal Institute of Technology Zurich (ETH), Zurich, Switzerland
- Toshitake Asabuki: Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
- Tomoki Fukai: Neural Coding and Brain Computing Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
41
Nakanishi M, Nemoto M, Kawai HD. Cortical nicotinic enhancement of tone-evoked heightened activities and subcortical nicotinic enlargement of activated areas in mouse auditory cortex. Neurosci Res 2022; 181:55-65. [DOI: 10.1016/j.neures.2022.04.001]
42
Decoding Selective Auditory Attention with EEG Using a Transformer Model. Methods 2022; 204:410-417. [DOI: 10.1016/j.ymeth.2022.04.009]
43
Corcoran AW, Perera R, Koroma M, Kouider S, Hohwy J, Andrillon T. Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech. Cereb Cortex 2022; 33:691-708. [PMID: 35253871 PMCID: PMC9890472 DOI: 10.1093/cercor/bhac094]
Abstract
Online speech processing imposes significant computational demands on the listening brain, the underlying mechanisms of which remain poorly understood. Here, we exploit the perceptual "pop-out" phenomenon (i.e. the dramatic improvement of speech intelligibility after receiving information about speech content) to investigate the neurophysiological effects of prior expectations on degraded speech comprehension. We recorded electroencephalography (EEG) and pupillometry from 21 adults while they rated the clarity of noise-vocoded and sine-wave synthesized sentences. Pop-out was reliably elicited following visual presentation of the corresponding written sentence, but not following incongruent or neutral text. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that improvements in perceptual clarity were mediated via top-down signals that enhanced the quality of cortical speech representations. Spectral analysis further revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta-band power, alpha-band power, and pupil diameter were all increased following the provision of any written sentence information, irrespective of content. Together, these findings reveal distinctive profiles of neurophysiological activity that differentiate the content-specific processes associated with degraded speech comprehension from the context-specific processes invoked under adverse listening conditions.
Affiliation(s)
- Andrew W Corcoran: Room E672, 20 Chancellors Walk, Clayton, VIC 3800, Australia (corresponding author)
- Ricardo Perera: Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia
- Matthieu Koroma: Brain and Consciousness Group (ENS, EHESS, CNRS), Département d'Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Sid Kouider: Brain and Consciousness Group (ENS, EHESS, CNRS), Département d'Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Jakob Hohwy: Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia
- Thomas Andrillon: Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia; Paris Brain Institute, Sorbonne Université, Inserm-CNRS, Paris 75013, France
44
Vanden Bosch der Nederlanden CM, Joanisse MF, Grahn JA, Snijders TM, Schoffelen JM. Familiarity modulates neural tracking of sung and spoken utterances. Neuroimage 2022; 252:119049. [PMID: 35248707 DOI: 10.1016/j.neuroimage.2022.119049]
Abstract
Music is often described in the laboratory and in the classroom as a beneficial tool for memory encoding and retention, with a particularly strong effect when words are sung to familiar compared to unfamiliar melodies. However, the neural mechanisms underlying this memory benefit, especially for benefits related to familiar music, are not well understood. The current study examined whether neural tracking of the slow syllable rhythms of speech and song is modulated by melody familiarity. Participants became familiar with twelve novel melodies over four days prior to MEG testing. Neural tracking of the same utterances spoken and sung revealed greater cerebro-acoustic phase coherence for sung compared to spoken utterances, but did not show an effect of melody familiarity when stimuli were grouped by their assigned (trained) familiarity. However, when participants' subjective ratings of perceived familiarity were used to group stimuli, a large effect of familiarity was observed. This effect was not specific to song, as it was observed for both sung and spoken utterances. Exploratory analyses revealed some in-session learning of unfamiliar and spoken utterances, with increased neural tracking for untrained stimuli by the end of the MEG testing session. Our results indicate that top-down factors like familiarity are strong modulators of neural tracking for music and language. Participants' neural tracking was related to their perception of familiarity, which was likely driven by a combination of effects from repeated listening, stimulus-specific melodic simplicity, and individual differences. Beyond simply the acoustic features of music, top-down factors built into the music listening experience, like repetition and familiarity, play a large role in the way we attend to and encode information presented in a musical context.
Collapse
Affiliation(s)
| | - Marc F Joanisse
- The Brain and Mind Institute, The University of Western Ontario, London, Ontario, Canada; Psychology Department, The University of Western Ontario, London, Ontario, Canada
| | - Jessica A Grahn
- The Brain and Mind Institute, The University of Western Ontario, London, Ontario, Canada; Psychology Department, The University of Western Ontario, London, Ontario, Canada
| | - Tineke M Snijders
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Radboud University, Donders Institute for Brain, Cognition and Behaviour, the Netherlands
| | - Jan-Mathijs Schoffelen
- Radboud University, Donders Institute for Brain, Cognition and Behaviour, the Netherlands.
| |
Collapse
|
45
|
Korzeczek A, Neef NE, Steinmann I, Paulus W, Sommer M. Stuttering severity relates to frontotemporal low-beta synchronization during pre-speech preparation. Clin Neurophysiol 2022; 138:84-96. [DOI: 10.1016/j.clinph.2022.03.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Revised: 03/02/2022] [Accepted: 03/09/2022] [Indexed: 12/15/2022]
|
46
|
Wang L, Wang Y, Liu Z, Wu EX, Chen F. A Speech-Level–Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes. Front Neurosci 2022; 15:760611. [PMID: 35221885 PMCID: PMC8866945 DOI: 10.3389/fnins.2021.760611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Accepted: 12/30/2021] [Indexed: 11/21/2022] Open
Abstract
In competing-speaker environments, human listeners need to focus or switch their auditory attention according to their dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from neural signals. Moreover, previous studies revealed that root-mean-square (RMS)-level-based speech segmentation contributes substantially to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level-based speech segmentation on auditory attention decoding (AAD) performance under both sustained and switched attention in competing-speaker auditory scenes. Objective biomarkers derived from cortical activity were also developed to index the dynamic auditory attention states. In the current study, subjects were asked to concentrate on one of two competing speaker streams or to switch their attention between them. The neural responses to the higher- and lower-RMS-level speech segments were analyzed via the linear temporal response function (TRF) before and after attention switched from one speaker stream to the other. Furthermore, the AAD performance of a unified TRF decoding model was compared to that of a speech-RMS-level-based segmented decoding model as auditory attention states changed dynamically. The results showed that the weight of the typical TRF component at approximately 100-ms time lag was sensitive to switches of auditory attention. Compared to the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention across a wide range of signal-to-masker ratios (SMRs). In competing-speaker scenes, the TRF weight and AAD accuracy could be used as effective indicators to detect changes in auditory attention. In addition, across a wide range of SMRs (i.e., from 6 to –6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision windows, suggesting that this speech-RMS-level-based model has the potential to decode dynamic attention states in realistic auditory scenarios.
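The RMS-level-based segmentation at the core of this design can be sketched as follows: compute frame-wise RMS of the speech signal, split frames at the median into higher- and lower-RMS segments, and run the correlation-based attention decision separately within each segment class. Everything below (the frame length, the median split, and the synthetic stand-ins for decoder output) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of RMS-level-based segmentation plus a per-segment-class
# correlation decision; frame length and the median split are illustrative.
import numpy as np

rng = np.random.default_rng(2)
fs, frame_len = 64, 64                    # 1-s frames at 64 Hz
speech = rng.standard_normal(fs * 60)

# Frame-wise RMS, then a median split into higher- and lower-RMS segments
frames = speech[: len(speech) // frame_len * frame_len].reshape(-1, frame_len)
rms = np.sqrt((frames ** 2).mean(axis=1))
higher = rms >= np.median(rms)            # boolean mask over frames

# Segmented decision: correlate the (stand-in) reconstructed envelope with
# each candidate stream separately within higher- and lower-RMS frames.
reconstructed = frames + 0.8 * rng.standard_normal(frames.shape)  # stand-in
competing = rng.standard_normal(frames.shape)

def frame_corr(a, b):
    """Row-wise Pearson correlation between two frame matrices."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

for name, mask in [("higher-RMS", higher), ("lower-RMS", ~higher)]:
    r_att = frame_corr(reconstructed[mask], frames[mask]).mean()
    r_unatt = frame_corr(reconstructed[mask], competing[mask]).mean()
    print(f"{name}: attended r = {r_att:.3f}, unattended r = {r_unatt:.3f}")
```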
Collapse
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
| | - Yihan Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Zhixing Liu
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Ed X. Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
| | - Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Fei Chen,
| |
Collapse
|
47
|
Auerbach BD, Gritton HJ. Hearing in Complex Environments: Auditory Gain Control, Attention, and Hearing Loss. Front Neurosci 2022; 16:799787. [PMID: 35221899 PMCID: PMC8866963 DOI: 10.3389/fnins.2022.799787] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2021] [Accepted: 01/18/2022] [Indexed: 12/12/2022] Open
Abstract
Listening in noisy or complex sound environments is difficult for individuals with normal hearing and can be a debilitating impairment for those with hearing loss. Extracting meaningful information from a complex acoustic environment requires the ability to accurately encode specific sound features under highly variable listening conditions and segregate distinct sound streams from multiple overlapping sources. The auditory system employs a variety of mechanisms to achieve this auditory scene analysis. First, neurons across levels of the auditory system exhibit compensatory adaptations to their gain and dynamic range in response to prevailing sound stimulus statistics in the environment. These adaptations allow for robust representations of sound features that are to a large degree invariant to the level of background noise. Second, listeners can selectively attend to a desired sound target in an environment with multiple sound sources. This selective auditory attention is another form of sensory gain control, enhancing the representation of an attended sound source while suppressing responses to unattended sounds. This review will examine both “bottom-up” gain alterations in response to changes in environmental sound statistics as well as “top-down” mechanisms that allow for selective extraction of specific sound features in a complex auditory scene. Finally, we will discuss how hearing loss interacts with these gain control mechanisms, and the adaptive and/or maladaptive perceptual consequences of this plasticity.
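One canonical form of the “bottom-up” gain control reviewed here is divisive normalization: a neuron's response is scaled by a running estimate of prevailing stimulus statistics, so that the representation becomes largely invariant to background level. The sketch below is a toy model with an assumed adaptation time constant, not a fit to physiological data.

```python
# Toy divisive-normalization model of auditory gain adaptation; the time
# constant and input statistics are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(3)
fs, tau = 100, 0.5                         # sampling rate (Hz), adaptation (s)
alpha = 1.0 / (tau * fs)

# Sound level jumps from a quiet to a loud background halfway through
level = np.concatenate([rng.normal(1.0, 0.2, fs * 5),
                        rng.normal(4.0, 0.8, fs * 5)])

gain_state, response = level[0], np.empty_like(level)
for i, x in enumerate(level):
    gain_state += alpha * (x - gain_state)     # leaky estimate of mean level
    response[i] = x / (gain_state + 1e-9)      # divisive normalization

# After adaptation, responses in both halves settle near the same value,
# i.e. the representation becomes invariant to the background level.
print(response[fs * 4 : fs * 5].mean(), response[-fs:].mean())
```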
Collapse
Affiliation(s)
- Benjamin D. Auerbach
- Department of Molecular and Integrative Physiology, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- *Correspondence: Benjamin D. Auerbach,
| | - Howard J. Gritton
- Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Department of Comparative Biosciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| |
Collapse
|
48
|
Su E, Cai S, Xie L, Li H, Schultz T. STAnet: A Spatiotemporal Attention Network for Decoding Auditory Spatial Attention from EEG. IEEE Trans Biomed Eng 2022; 69:2233-2242. [PMID: 34982671 DOI: 10.1109/tbme.2022.3140246] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
OBJECTIVE Humans are able to localize the source of a sound. This enables them to direct attention to a particular speaker in a cocktail party. Psycho-acoustic studies show that the sensory cortices of the human brain respond differently to the location of sound sources, and that auditory attention itself is a dynamic, temporally evolving brain activity. In this work, we seek to build a computational model which uses both the spatial and temporal information manifested in EEG signals for auditory spatial attention detection (ASAD). METHODS We propose an end-to-end spatiotemporal attention network, denoted as STAnet, to detect auditory spatial attention from EEG. The STAnet is designed to assign differentiated weights dynamically to EEG channels through a spatial attention mechanism, and to temporal patterns in EEG signals through a temporal attention mechanism. RESULTS We report ASAD experiments on two publicly available datasets. The STAnet outperforms other competitive models by a large margin under various experimental conditions. Its attention decision for a 1-second decision window outperforms that of state-of-the-art techniques for a 10-second decision window. Experimental results also demonstrate that the STAnet achieves competitive performance on EEG signals ranging from 64 down to as few as 16 channels. CONCLUSION This study provides evidence suggesting that efficient low-density EEG online decoding is within reach. SIGNIFICANCE This study also marks an important step towards the practical implementation of ASAD in real-life applications.
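The two attention mechanisms described above can be illustrated with a toy forward pass: a softmax over channel scores yields spatial weights, and a softmax over time points yields temporal pooling weights. The snippet below uses random stand-ins for learned parameters and is a schematic of the idea, not the published STAnet architecture.

```python
# Schematic spatial-then-temporal attention over an EEG segment; weights are
# random stand-ins for learned parameters, and shapes are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n_channels, n_times = 16, 128
eeg = rng.standard_normal((n_channels, n_times))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Spatial attention: one score per channel -> weighted channel combination
w_spatial = rng.standard_normal(n_channels)
a_spatial = softmax(w_spatial)                 # differentiated channel weights
virtual_channel = a_spatial @ eeg              # (n_times,)

# Temporal attention: scores over time points -> weighted temporal pooling
w_temporal = rng.standard_normal(n_times)
a_temporal = softmax(w_temporal * virtual_channel)
feature = a_temporal @ virtual_channel         # scalar summary feature

# In a trained network, a linear readout would map such pooled features to a
# left/right spatial-attention label; here we just print the feature.
print(f"pooled feature = {feature:.3f}")
```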
Collapse
|
49
|
Soltanparast S, Toufan R, Talebian S, Pourbakht A. Regularity of background auditory scene and selective attention: a brain oscillatory study. Neurosci Lett 2022; 772:136465. [DOI: 10.1016/j.neulet.2022.136465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2021] [Revised: 12/29/2021] [Accepted: 01/14/2022] [Indexed: 11/27/2022]
|
50
|
Straetmans L, Holtze B, Debener S, Jaeger M, Mirkovic B. Neural tracking to go: auditory attention decoding and saliency detection with mobile EEG. J Neural Eng 2021; 18. [PMID: 34902846 DOI: 10.1088/1741-2552/ac42b5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2021] [Accepted: 12/13/2021] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Neuro-steered assistive technologies have been suggested to offer a major advancement in future devices like neuro-steered hearing aids. Auditory attention decoding methods would in that case allow for identification of an attended speaker within complex auditory environments, exclusively from neural data. Decoding the attended speaker using neural information has so far only been done in controlled laboratory settings. Yet, it is known that ever-present factors like distraction and movement are reflected in the neural signal parameters related to attention. APPROACH Thus, in the current study we applied a two-competing-speaker paradigm to investigate the performance of a commonly applied EEG-based auditory attention decoding (AAD) model outside of the laboratory, during leisure walking and distraction. Unique environmental sounds were added to the auditory scene and served as distractor events. MAIN RESULTS The current study shows, for the first time, that the attended speaker can be accurately decoded during natural movement. At a temporal resolution as short as 5 seconds and without artifact attenuation, decoding was significantly above chance level. Further, as hypothesized, we found a decrease in attention to both the to-be-attended and the to-be-ignored speech streams after the occurrence of a salient event. Additionally, we demonstrate that it is possible to predict neural correlates of distraction with a computational model of auditory saliency based on acoustic features. CONCLUSION Taken together, our study shows that auditory attention tracking outside of the laboratory, in ecologically valid conditions, is feasible and a step towards the development of future neuro-steered hearing aids.
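The decision-window logic behind such accuracy figures can be sketched directly: within each window, the envelope reconstructed from EEG is correlated with both candidate speech streams, and the stream with the higher correlation is declared attended. The example below uses synthetic envelopes and mirrors the 5-second windows reported above; the noisy decoder output is a stand-in, not real data.

```python
# Sketch of decision-window auditory attention decoding on synthetic data;
# the window length and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
fs, win = 64, 64 * 5                            # 5-s decision windows
attended = rng.standard_normal(fs * 300)
ignored = rng.standard_normal(fs * 300)
reconstructed = 0.3 * attended + rng.standard_normal(fs * 300)  # stand-in

def window_corr(x, y, win):
    """Pearson correlation of x and y within consecutive non-overlapping windows."""
    xw = x[: len(x) // win * win].reshape(-1, win)
    yw = y[: len(y) // win * win].reshape(-1, win)
    xw = xw - xw.mean(axis=1, keepdims=True)
    yw = yw - yw.mean(axis=1, keepdims=True)
    return (xw * yw).sum(axis=1) / np.sqrt((xw ** 2).sum(axis=1) * (yw ** 2).sum(axis=1))

r_att = window_corr(reconstructed, attended, win)
r_ign = window_corr(reconstructed, ignored, win)
accuracy = (r_att > r_ign).mean()               # chance level is 0.5
print(f"decoding accuracy over {len(r_att)} windows = {accuracy:.2f}")
```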
Collapse
Affiliation(s)
- Lisa Straetmans
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
| | - B Holtze
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
| | - Stefan Debener
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
| | - Manuela Jaeger
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
| | - Bojana Mirkovic
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, GERMANY
| |
Collapse
|