1
|
Joshi N, Ng Y, Thakkar K, Duque D, Yin P, Fritz J, Elhilali M, Shamma S. Temporal Coherence Shapes Cortical Responses to Speech Mixtures in a Ferret Cocktail Party. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.21.595171. [PMID: 38915590 PMCID: PMC11195067 DOI: 10.1101/2024.05.21.595171] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/26/2024]
Abstract
Segregation of complex sounds such as speech, music and animal vocalizations as they simultaneously emanate from multiple sources (referred to as the "cocktail party problem") is a remarkable ability that is common in humans and animals alike. The neural underpinnings of this process have been extensively studied behaviorally and physiologically in non-human animals primarily with simplified sounds (tones and noise sequences). In humans, segregation experiments utilizing more complex speech mixtures are common; but physiological experiments have relied on EEG/MEG/ECoG recordings that sample activity from thousands of neurons, often obscuring the detailed processes that give rise to the observed segregation. The present study combines the insights from animal single-unit physiology with segregation of speech-like mixtures. Ferrets were trained to attend to a female voice and detect a target word, both in presence or absence of a concurrent, equally salient male voice. Single neuron recordings were obtained from primary and secondary ferret auditory cortical fields, as well as frontal cortex. During task performance, representation of the female words became more enhanced relative to those of the (distractor) male in all cortical regions, especially in the higher auditory cortical field. Analysis of the temporal and spectral response characteristics during task performance reveals how speech segregation gradually emerges in the auditory cortex. A computational model evaluated on the same voice mixtures replicates and extends these results to different attentional targets (attention to female or male voices). These findings are consistent with the temporal coherence theory whereby attention to a target voice anchors neural activity in cortical networks hence binding together channels that are coherently temporally-modulated with the target, and ultimately forming a common auditory stream.
Collapse
Affiliation(s)
- Neha Joshi
- Electrical and Computer Engineering Department, University of Maryland College Park, MD
| | - Yu Ng
- Electrical and Computer Engineering Department, University of Maryland College Park, MD
| | - Karran Thakkar
- Electrical and Computer Engineering Department, The Johns Hopkins University, MD
| | - Daniel Duque
- Institute of Neuroscience of Castilla Y León, University of Salamanca
| | - Pingbo Yin
- Institute for Systems Research, University of Maryland College Park, MD
| | | | - Mounya Elhilali
- Electrical and Computer Engineering Department, The Johns Hopkins University, MD
| | - Shihab Shamma
- Electrical and Computer Engineering Department, University of Maryland College Park, MD
- Institute for Systems Research, University of Maryland College Park, MD
- Départment d'étude cognitives, école normale supérieure, PSL, Paris
| |
Collapse
|
2
|
Lu K, Dutta K, Mohammed A, Elhilali M, Shamma S. Temporal-Coherence Induces Binding of Responses to Sound Sequences in Ferret Auditory Cortex. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.21.595170. [PMID: 38854125 PMCID: PMC11160575 DOI: 10.1101/2024.05.21.595170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2024]
Abstract
Binding the attributes of a sensory source is necessary to perceive it as a unified entity, one that can be attended to and extracted from its surrounding scene. In auditory perception, this is the essence of the cocktail party problem in which a listener segregates one speaker from a mixture of voices, or a musical stream from simultaneous others. It is postulated that coherence of the temporal modulations of a source's features is necessary to bind them. The focus of this study is on the role of temporal-coherence in binding and segregation, and specifically as evidenced by the neural correlates of rapid plasticity that enhance cortical responses among synchronized neurons, while suppressing them among asynchronized ones. In a first experiment, we find that attention to a sound sequence rapidly binds it to other coherent sequences while suppressing nearby incoherent sequences, thus enhancing the contrast between the two groups. In a second experiment, a sequence of synchronized multi-tone complexes, embedded in a cloud of randomly dispersed background of desynchronized tones, perceptually and neurally pops-out after a fraction of a second highlighting the binding among its coherent tones against the incoherent background. These findings demonstrate the role of temporal-coherence in binding and segregation.
Collapse
Affiliation(s)
- Kai Lu
- Emory University Medical School
| | - Kelsey Dutta
- Electrical and Computer Engineering Department & Institute for Systems Research, University of Maryland College Park
| | - Ali Mohammed
- Electrical and Computer Engineering Department & Institute for Systems Research, University of Maryland College Park
| | - Mounya Elhilali
- Electrical and Computer Engineering, The Johns Hopkins University
| | - Shihab Shamma
- Electrical and Computer Engineering Department & Institute for Systems Research, University of Maryland College Park
- Départment d'étude Cognitives, école normale supérieure, PSL
| |
Collapse
|
3
|
van der Heijden K, Patel P, Bickel S, Herrero JL, Mehta AD, Mesgarani N. Joint population coding and temporal coherence link an attended talker's voice and location features in naturalistic multi-talker scenes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.13.593814. [PMID: 38798551 PMCID: PMC11118436 DOI: 10.1101/2024.05.13.593814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Listeners readily extract multi-dimensional auditory objects such as a 'localized talker' from complex acoustic scenes with multiple talkers. Yet, the neural mechanisms underlying simultaneous encoding and linking of different sound features - for example, a talker's voice and location - are poorly understood. We analyzed invasive intracranial recordings in neurosurgical patients attending to a localized talker in real-life cocktail party scenarios. We found that sensitivity to an individual talker's voice and location features was distributed throughout auditory cortex and that neural sites exhibited a gradient from sensitivity to a single feature to joint sensitivity to both features. On a population level, cortical response patterns of both dual-feature sensitive sites but also single-feature sensitive sites revealed simultaneous encoding of an attended talker's voice and location features. However, for single-feature sensitive sites, the representation of the primary feature was more precise. Further, sites which selective tracked an attended speech stream concurrently encoded an attended talker's voice and location features, indicating that such sites combine selective tracking of an attended auditory object with encoding of the object's features. Finally, we found that attending a localized talker selectively enhanced temporal coherence between single-feature voice sensitive sites and single-feature location sensitive sites, providing an additional mechanism for linking voice and location in multi-talker scenes. These results demonstrate that a talker's voice and location features are linked during multi-dimensional object formation in naturalistic multi-talker scenes by joint population coding as well as by temporal coherence between neural sites. SIGNIFICANCE STATEMENT Listeners effortlessly extract auditory objects from complex acoustic scenes consisting of multiple sound sources in naturalistic, spatial sound scenes. Yet, how the brain links different sound features to form a multi-dimensional auditory object is poorly understood. We investigated how neural responses encode and integrate an attended talker's voice and location features in spatial multi-talker sound scenes to elucidate which neural mechanisms underlie simultaneous encoding and linking of different auditory features. Our results show that joint population coding as well as temporal coherence mechanisms contribute to distributed multi-dimensional auditory object encoding. These findings shed new light on cortical functional specialization and multidimensional auditory object formation in complex, naturalistic listening scenes. HIGHLIGHTS Cortical responses to an single talker exhibit a distributed gradient, ranging from sites that are sensitive to both a talker's voice and location (dual-feature sensitive sites) to sites that are sensitive to either voice or location (single-feature sensitive sites).Population response patterns of dual-feature sensitive sites encode voice and location features of the attended talker in multi-talker scenes jointly and with equal precision.Despite their sensitivity to a single feature at the level of individual cortical sites, population response patterns of single-feature sensitive sites also encode location and voice features of a talker jointly, but with higher precision for the feature they are primarily sensitive to.Neural sites which selectively track an attended speech stream concurrently encode the attended talker's voice and location features.Attention selectively enhances temporal coherence between voice and location selective sites over time.Joint population coding as well as temporal coherence mechanisms underlie distributed multi-dimensional auditory object encoding in auditory cortex.
Collapse
|
4
|
Puschmann S, Regev M, Fakhar K, Zatorre RJ, Thiel CM. Attention-Driven Modulation of Auditory Cortex Activity during Selective Listening in a Multispeaker Setting. J Neurosci 2024; 44:e1157232023. [PMID: 38388426 PMCID: PMC11007309 DOI: 10.1523/jneurosci.1157-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 10/30/2023] [Accepted: 11/05/2023] [Indexed: 02/24/2024] Open
Abstract
Real-world listening settings often consist of multiple concurrent sound streams. To limit perceptual interference during selective listening, the auditory system segregates and filters the relevant sensory input. Previous work provided evidence that the auditory cortex is critically involved in this process and selectively gates attended input toward subsequent processing stages. We studied at which level of auditory cortex processing this filtering of attended information occurs using functional magnetic resonance imaging (fMRI) and a naturalistic selective listening task. Forty-five human listeners (of either sex) attended to one of two continuous speech streams, presented either concurrently or in isolation. Functional data were analyzed using an inter-subject analysis to assess stimulus-specific components of ongoing auditory cortex activity. Our results suggest that stimulus-related activity in the primary auditory cortex and the adjacent planum temporale are hardly affected by attention, whereas brain responses at higher stages of the auditory cortex processing hierarchy become progressively more selective for the attended input. Consistent with these findings, a complementary analysis of stimulus-driven functional connectivity further demonstrated that information on the to-be-ignored speech stream is shared between the primary auditory cortex and the planum temporale but largely fails to reach higher processing stages. Our findings suggest that the neural processing of ignored speech cannot be effectively suppressed at the level of early cortical processing of acoustic features but is gradually attenuated once the competing speech streams are fully segregated.
Collapse
Affiliation(s)
- Sebastian Puschmann
- Biological Psychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
| | - Mor Regev
- Montreal Neurological Institute, McGill University, Montreal, Quebec H3A 2B4, Canada
| | - Kayson Fakhar
- Institute of Computational Neuroscience, University Medical Center Eppendorf, Hamburg University, Hamburg Center of Neuroscience, Hamburg 20246, Germany
| | - Robert J Zatorre
- Montreal Neurological Institute, McGill University, Montreal, Quebec H3A 2B4, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Quebec H2V 2S9, Canada
| | - Christiane M Thiel
- Biological Psychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
| |
Collapse
|
5
|
Viswanathan V, Rupp KM, Hect JL, Harford EE, Holt LL, Abel TJ. Intracranial Mapping of Response Latencies and Task Effects for Spoken Syllable Processing in the Human Brain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.05.588349. [PMID: 38617227 PMCID: PMC11014624 DOI: 10.1101/2024.04.05.588349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/16/2024]
Abstract
Prior lesion, noninvasive-imaging, and intracranial-electroencephalography (iEEG) studies have documented hierarchical, parallel, and distributed characteristics of human speech processing. Yet, there have not been direct, intracranial observations of the latency with which regions outside the temporal lobe respond to speech, or how these responses are impacted by task demands. We leveraged human intracranial recordings via stereo-EEG to measure responses from diverse forebrain sites during (i) passive listening to /bi/ and /pi/ syllables, and (ii) active listening requiring /bi/-versus-/pi/ categorization. We find that neural response latency increases from a few tens of ms in Heschl's gyrus (HG) to several tens of ms in superior temporal gyrus (STG), superior temporal sulcus (STS), and early parietal areas, and hundreds of ms in later parietal areas, insula, frontal cortex, hippocampus, and amygdala. These data also suggest parallel flow of speech information dorsally and ventrally, from HG to parietal areas and from HG to STG and STS, respectively. Latency data also reveal areas in parietal cortex, frontal cortex, hippocampus, and amygdala that are not responsive to the stimuli during passive listening but are responsive during categorization. Furthermore, multiple regions-spanning auditory, parietal, frontal, and insular cortices, and hippocampus and amygdala-show greater neural response amplitudes during active versus passive listening (a task-related effect). Overall, these results are consistent with hierarchical processing of speech at a macro level and parallel streams of information flow in temporal and parietal regions. These data also reveal regions where the speech code is stimulus-faithful and those that encode task-relevant representations.
Collapse
Affiliation(s)
- Vibha Viswanathan
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
| | - Kyle M. Rupp
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
| | - Jasmine L. Hect
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
| | - Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
| | - Lori L. Holt
- Department of Psychology, The University of Texas at Austin, Austin, TX 78712
| | - Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15238
| |
Collapse
|
6
|
Tune S, Obleser J. Neural attentional filters and behavioural outcome follow independent individual trajectories over the adult lifespan. eLife 2024; 12:RP92079. [PMID: 38470243 DOI: 10.7554/elife.92079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2024] Open
Abstract
Preserved communication abilities promote healthy ageing. To this end, the age-typical loss of sensory acuity might in part be compensated for by an individual's preserved attentional neural filtering. Is such a compensatory brain-behaviour link longitudinally stable? Can it predict individual change in listening behaviour? We here show that individual listening behaviour and neural filtering ability follow largely independent developmental trajectories modelling electroencephalographic and behavioural data of N = 105 ageing individuals (39-82 y). First, despite the expected decline in hearing-threshold-derived sensory acuity, listening-task performance proved stable over 2 y. Second, neural filtering and behaviour were correlated only within each separate measurement timepoint (T1, T2). Longitudinally, however, our results raise caution on attention-guided neural filtering metrics as predictors of individual trajectories in listening behaviour: neither neural filtering at T1 nor its 2-year change could predict individual 2-year behavioural change, under a combination of modelling strategies.
Collapse
Affiliation(s)
- Sarah Tune
- Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | - Jonas Obleser
- Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Department of Psychology, University of Lübeck, Lübeck, Germany
| |
Collapse
|
7
|
Wikman P, Salmela V, Sjöblom E, Leminen M, Laine M, Alho K. Attention to audiovisual speech shapes neural processing through feedback-feedforward loops between different nodes of the speech network. PLoS Biol 2024; 22:e3002534. [PMID: 38466713 PMCID: PMC10957087 DOI: 10.1371/journal.pbio.3002534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 03/21/2024] [Accepted: 01/30/2024] [Indexed: 03/13/2024] Open
Abstract
Selective attention-related top-down modulation plays a significant role in separating relevant speech from irrelevant background speech when vocal attributes separating concurrent speakers are small and continuously evolving. Electrophysiological studies have shown that such top-down modulation enhances neural tracking of attended speech. Yet, the specific cortical regions involved remain unclear due to the limited spatial resolution of most electrophysiological techniques. To overcome such limitations, we collected both electroencephalography (EEG) (high temporal resolution) and functional magnetic resonance imaging (fMRI) (high spatial resolution), while human participants selectively attended to speakers in audiovisual scenes containing overlapping cocktail party speech. To utilise the advantages of the respective techniques, we analysed neural tracking of speech using the EEG data and performed representational dissimilarity-based EEG-fMRI fusion. We observed that attention enhanced neural tracking and modulated EEG correlates throughout the latencies studied. Further, attention-related enhancement of neural tracking fluctuated in predictable temporal profiles. We discuss how such temporal dynamics could arise from a combination of interactions between attention and prediction as well as plastic properties of the auditory cortex. EEG-fMRI fusion revealed attention-related iterative feedforward-feedback loops between hierarchically organised nodes of the ventral auditory object related processing stream. Our findings support models where attention facilitates dynamic neural changes in the auditory cortex, ultimately aiding discrimination of relevant sounds from irrelevant ones while conserving neural resources.
Collapse
Affiliation(s)
- Patrik Wikman
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
| | - Viljami Salmela
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
| | - Eetu Sjöblom
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
| | - Miika Leminen
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- AI and Analytics Unit, Helsinki University Hospital, Helsinki, Finland
| | - Matti Laine
- Department of Psychology, Åbo Akademi University, Turku, Finland
| | - Kimmo Alho
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
| |
Collapse
|
8
|
Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. [PMID: 38358954 PMCID: PMC10868781 DOI: 10.1371/journal.pbio.3002498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 01/15/2024] [Indexed: 02/17/2024] Open
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that the long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech but do not affect speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted but would disappear when segregation cues, i.e., speech fine structure, were removed. These results strongly suggested that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of speech envelope, which can support reliable speech recognition.
Collapse
Affiliation(s)
- Jiaxin Gao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Honghua Chen
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Mingxuan Fang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
| | - Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
- The State key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China
| |
Collapse
|
9
|
Wang Q, Luo L, Xu N, Wang J, Yang R, Chen G, Ren J, Luan G, Fang F. Neural response properties predict perceived contents and locations elicited by intracranial electrical stimulation of human auditory cortex. Cereb Cortex 2024; 34:bhad517. [PMID: 38185991 DOI: 10.1093/cercor/bhad517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 12/09/2023] [Accepted: 12/10/2023] [Indexed: 01/09/2024] Open
Abstract
Intracranial electrical stimulation (iES) of auditory cortex can elicit sound experiences with a variety of perceived contents (hallucination or illusion) and locations (contralateral or bilateral side), independent of actual acoustic inputs. However, the neural mechanisms underlying this elicitation heterogeneity remain undiscovered. Here, we collected subjective reports following iES at 3062 intracranial sites in 28 patients (both sexes) and identified 113 auditory cortical sites with iES-elicited sound experiences. We then decomposed the sound-induced intracranial electroencephalogram (iEEG) signals recorded from all 113 sites into time-frequency features. We found that the iES-elicited perceived contents can be predicted by the early high-γ features extracted from sound-induced iEEG. In contrast, the perceived locations elicited by stimulating hallucination sites and illusion sites are determined by the late high-γ and long-lasting α features, respectively. Our study unveils the crucial neural signatures of iES-elicited sound experiences in human and presents a new strategy to hearing restoration for individuals suffering from deafness.
Collapse
Affiliation(s)
- Qian Wang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- National Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
| | - Lu Luo
- School of Psychology, Beijing Sport University, Beijing 100084, China
| | - Na Xu
- Division of Brain Sciences, Changping Laboratory, Beijing 102206, China
| | - Jing Wang
- Department of Neurology, Sanbo Brain Hospital, Capital Medical University, Beijing 100093, China
| | - Ruolin Yang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
| | - Guanpeng Chen
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
| | - Jie Ren
- Department of Functional Neurosurgery, Beijing Key Laboratory of Epilepsy, Sanbo Brain Hospital, Capital Medical University, Beijing 100093, China
- Epilepsy Center, Kunming Sanbo Brain Hospital, Kunming 650100 China
| | - Guoming Luan
- Department of Functional Neurosurgery, Beijing Key Laboratory of Epilepsy, Sanbo Brain Hospital, Capital Medical University, Beijing 100093, China
- Beijing Institute for Brain Disorders, Beijing 100069, China
| | - Fang Fang
- School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
| |
Collapse
|
10
|
Commuri V, Kulasingham JP, Simon JZ. Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention. Front Neurosci 2023; 17:1264453. [PMID: 38156264 PMCID: PMC10752935 DOI: 10.3389/fnins.2023.1264453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Accepted: 11/21/2023] [Indexed: 12/30/2023] Open
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study 22 human subjects listened to concurrent, fixed-rate, speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male's speech, with a peak latency of ~40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Collapse
Affiliation(s)
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, MD, United States
| |
Collapse
|
11
|
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech. J Neurosci 2023; 43:7429-7440. [PMID: 37793908 PMCID: PMC10621774 DOI: 10.1523/jneurosci.1247-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 09/07/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open
Abstract
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It moreover has been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are also modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed robust attentional modulation of the cortical contribution to the speech-FFR: the neural responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger cortical contribution to the speech-FFR than a voice with a higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well.SIGNIFICANCE STATEMENT Understanding speech in noise requires attention to a target speaker. One of the speech features that a listener can use to identify a target voice among others and attend it is the fundamental frequency, together with its higher harmonics. The fundamental frequency arises from the opening and closing of the vocal folds and is tracked by high-frequency neural activity in the auditory brainstem and in the cortex. Previous investigations showed that the subcortical neural tracking is modulated by selective attention. Here we show that attention affects the cortical tracking of the fundamental frequency as well: it is stronger when a particular voice is attended than when it is ignored.
Collapse
Affiliation(s)
- Alina Schüller
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Achim Schilling
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
| | - Patrick Krauss
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Pattern Recognition Lab, Department Computer Science, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Stefan Rampp
- Department of Neurosurgery, University Hospital Erlangen, 91058 Erlangen, Germany
- Department of Neurosurgery, University Hospital Halle (Saale), 06120 Halle (Saale), Germany
- Department of Neuroradiology, University Hospital Erlangen, 91058 Erlangen, Germany
| | - Tobias Reichenbach
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
| |
Collapse
|
12
|
Commuri V, Kulasingham JP, Simon JZ. Cortical Responses Time-Locked to Continuous Speech in the High-Gamma Band Depend on Selective Attention. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.20.549567. [PMID: 37546895 PMCID: PMC10401961 DOI: 10.1101/2023.07.20.549567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study 22 human subjects listened to concurrent, fixed-rate, speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male's speech, with a peak latency of approximately 40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Collapse
Affiliation(s)
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, MD, United States
| |
Collapse
|
13
|
Ying R, Hamlette L, Nikoobakht L, Balaji R, Miko N, Caras ML. Organization of orbitofrontal-auditory pathways in the Mongolian gerbil. J Comp Neurol 2023; 531:1459-1481. [PMID: 37477903 PMCID: PMC10529810 DOI: 10.1002/cne.25525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 06/11/2023] [Accepted: 06/26/2023] [Indexed: 07/22/2023]
Abstract
Sound perception is highly malleable, rapidly adjusting to the acoustic environment and behavioral demands. This flexibility is the result of ongoing changes in auditory cortical activity driven by fluctuations in attention, arousal, or prior expectations. Recent work suggests that the orbitofrontal cortex (OFC) may mediate some of these rapid changes, but the anatomical connections between the OFC and the auditory system are not well characterized. Here, we used virally mediated fluorescent tracers to map the projection from OFC to the auditory midbrain, thalamus, and cortex in a classic animal model for auditory research, the Mongolian gerbil (Meriones unguiculatus). We observed no connectivity between the OFC and the auditory midbrain, and an extremely sparse connection between the dorsolateral OFC and higher order auditory thalamic regions. In contrast, we observed a robust connection between the ventral and medial subdivisions of the OFC and the auditory cortex, with a clear bias for secondary auditory cortical regions. OFC axon terminals were found in all auditory cortical lamina but were significantly more concentrated in the infragranular layers. Tissue-clearing and lightsheet microscopy further revealed that auditory cortical-projecting OFC neurons send extensive axon collaterals throughout the brain, targeting both sensory and non-sensory regions involved in learning, decision-making, and memory. These findings provide a more detailed map of orbitofrontal-auditory connections and shed light on the possible role of the OFC in supporting auditory cognition.
Collapse
Affiliation(s)
- Rose Ying
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, Maryland, 20742
- Department of Biology, University of Maryland, College Park, Maryland, 20742
- Center for Comparative and Evolutionary Biology of Hearing, University of Maryland, College Park, Maryland, 20742
| | - Lashaka Hamlette
- Department of Biology, University of Maryland, College Park, Maryland, 20742
| | - Laudan Nikoobakht
- Department of Biology, University of Maryland, College Park, Maryland, 20742
| | - Rakshita Balaji
- Department of Biology, University of Maryland, College Park, Maryland, 20742
| | - Nicole Miko
- Department of Biology, University of Maryland, College Park, Maryland, 20742
| | - Melissa L. Caras
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, Maryland, 20742
- Department of Biology, University of Maryland, College Park, Maryland, 20742
- Center for Comparative and Evolutionary Biology of Hearing, University of Maryland, College Park, Maryland, 20742
| |
Collapse
|
14
|
Grijseels DM, Prendergast BJ, Gorman JC, Miller CT. The neurobiology of vocal communication in marmosets. Ann N Y Acad Sci 2023; 1528:13-28. [PMID: 37615212 PMCID: PMC10592205 DOI: 10.1111/nyas.15057] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/25/2023]
Abstract
An increasingly popular animal model for studying the neural basis of social behavior, cognition, and communication is the common marmoset (Callithrix jacchus). Interest in this New World primate across neuroscience is now being driven by their proclivity for prosociality across their repertoire, high volubility, and rapid development, as well as their amenability to naturalistic testing paradigms and freely moving neural recording and imaging technologies. The complement of these characteristics set marmosets up to be a powerful model of the primate social brain in the years to come. Here, we focus on vocal communication because it is the area that has both made the most progress and illustrates the prodigious potential of this species. We review the current state of the field with a focus on the various brain areas and networks involved in vocal perception and production, comparing the findings from marmosets to other animals, including humans.
Collapse
Affiliation(s)
- Dori M Grijseels
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
| | - Brendan J Prendergast
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
| | - Julia C Gorman
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, California, USA
| | - Cory T Miller
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, California, USA
| |
Collapse
|
15
|
Raghavan VS, O’Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol 2023; 21:e3002128. [PMID: 37279203 PMCID: PMC10243639 DOI: 10.1371/journal.pbio.3002128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Accepted: 04/19/2023] [Indexed: 06/08/2023] Open
Abstract
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
Collapse
Affiliation(s)
- Vinay S Raghavan
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
| | - James O’Sullivan
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
| | - Stephan Bickel
- The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Department of Neurology, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
| | - Ashesh D. Mehta
- The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
| |
Collapse
|
16
|
Ahmed F, Nidiffer AR, O'Sullivan AE, Zuk NJ, Lalor EC. The integration of continuous audio and visual speech in a cocktail-party environment depends on attention. Neuroimage 2023; 274:120143. [PMID: 37121375 DOI: 10.1016/j.neuroimage.2023.120143] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 03/17/2023] [Accepted: 04/27/2023] [Indexed: 05/02/2023] Open
Abstract
In noisy environments, our ability to understand speech benefits greatly from seeing the speaker's face. This is attributed to the brain's ability to integrate audio and visual information, a process known as multisensory integration. In addition, selective attention plays an enormous role in what we understand, the so-called cocktail-party phenomenon. But how attention and multisensory integration interact remains incompletely understood, particularly in the case of natural, continuous speech. Here, we addressed this issue by analyzing EEG data recorded from participants who undertook a multisensory cocktail-party task using natural speech. To assess multisensory integration, we modeled the EEG responses to the speech in two ways. The first assumed that audiovisual speech processing is simply a linear combination of audio speech processing and visual speech processing (i.e., an A+V model), while the second allows for the possibility of audiovisual interactions (i.e., an AV model). Applying these models to the data revealed that EEG responses to attended audiovisual speech were better explained by an AV model, providing evidence for multisensory integration. In contrast, unattended audiovisual speech responses were best captured using an A+V model, suggesting that multisensory integration is suppressed for unattended speech. Follow up analyses revealed some limited evidence for early multisensory integration of unattended AV speech, with no integration occurring at later levels of processing. We take these findings as evidence that the integration of natural audio and visual speech occurs at multiple levels of processing in the brain, each of which can be differentially affected by attention.
Collapse
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
| | - Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
| | - Aisling E O'Sullivan
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Nathaniel J Zuk
- Edmond & Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
| | - Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland.
| |
Collapse
|
17
|
Manting CL, Gulyas B, Ullén F, Lundqvist D. Steady-state responses to concurrent melodies: source distribution, top-down, and bottom-up attention. Cereb Cortex 2023; 33:3053-3066. [PMID: 35858223 PMCID: PMC10016039 DOI: 10.1093/cercor/bhac260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 06/03/2022] [Accepted: 06/03/2022] [Indexed: 11/13/2022] Open
Abstract
Humans can direct attentional resources to a single sound occurring simultaneously among others to extract the most behaviourally relevant information present. To investigate this cognitive phenomenon in a precise manner, we used frequency-tagging to separate neural auditory steady-state responses (ASSRs) that can be traced back to each auditory stimulus, from the neural mix elicited by multiple simultaneous sounds. Using a mixture of 2 frequency-tagged melody streams, we instructed participants to selectively attend to one stream or the other while following the development of the pitch contour. Bottom-up attention towards either stream was also manipulated with salient changes in pitch. Distributed source analyses of magnetoencephalography measurements showed that the effect of ASSR enhancement from top-down driven attention was strongest at the left frontal cortex, while that of bottom-up driven attention was dominant at the right temporal cortex. Furthermore, the degree of ASSR suppression from simultaneous stimuli varied across cortical lobes and hemisphere. The ASSR source distribution changes from temporal-dominance during single-stream perception, to proportionally more activity in the frontal and centro-parietal cortical regions when listening to simultaneous streams. These findings are a step forward to studying cognition in more complex and naturalistic soundscapes using frequency-tagging.
Collapse
Affiliation(s)
| | - Balazs Gulyas
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
- Cognitive Neuroimaging Centre (CoNiC), Lee Kong Chien School of Medicine, Nanyang Technological University, Singapore 636921, Singapore
| | - Fredrik Ullén
- Department of Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt 60322, Germany
| | - Daniel Lundqvist
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
| |
Collapse
|
18
|
Makov S, Pinto D, Har-Shai Yahav P, Miller LM, Zion Golumbic E. "Unattended, distracting or irrelevant": Theoretical implications of terminological choices in auditory selective attention research. Cognition 2023; 231:105313. [PMID: 36344304 DOI: 10.1016/j.cognition.2022.105313] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 09/30/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
For seventy years, auditory selective attention research has focused on studying the cognitive mechanisms of prioritizing the processing a 'main' task-relevant stimulus, in the presence of 'other' stimuli. However, a closer look at this body of literature reveals deep empirical inconsistencies and theoretical confusion regarding the extent to which this 'other' stimulus is processed. We argue that many key debates regarding attention arise, at least in part, from inappropriate terminological choices for experimental variables that may not accurately map onto the cognitive constructs they are meant to describe. Here we critically review the more common or disruptive terminological ambiguities, differentiate between methodology-based and theory-derived terms, and unpack the theoretical assumptions underlying different terminological choices. Particularly, we offer an in-depth analysis of the terms 'unattended' and 'distractor' and demonstrate how their use can lead to conflicting theoretical inferences. We also offer a framework for thinking about terminology in a more productive and precise way, in hope of fostering more productive debates and promoting more nuanced and accurate cognitive models of selective attention.
Collapse
Affiliation(s)
- Shiri Makov
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
| | - Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
| | - Paz Har-Shai Yahav
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
| | - Lee M Miller
- The Center for Mind and Brain, University of California, Davis, CA, United States of America; Department of Neurobiology, Physiology, & Behavior, University of California, Davis, CA, United States of America; Department of Otolaryngology / Head and Neck Surgery, University of California, Davis, CA, United States of America
| | - Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel.
| |
Collapse
|
19
|
Simon JZ, Commuri V, Kulasingham JP. Time-locked auditory cortical responses in the high-gamma band: A window into primary auditory cortex. Front Neurosci 2022; 16:1075369. [PMID: 36570848 PMCID: PMC9773383 DOI: 10.3389/fnins.2022.1075369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022] Open
Abstract
Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70-150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
Collapse
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States,Department of Biology, University of Maryland, College Park, College Park, MD, United States,Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States,*Correspondence: Jonathan Z. Simon,
| | - Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
| | | |
Collapse
|
20
|
Lee JH, Shim H, Gantz B, Choi I. Strength of Attentional Modulation on Cortical Auditory Evoked Responses Correlates with Speech-in-Noise Performance in Bimodal Cochlear Implant Users. Trends Hear 2022; 26:23312165221141143. [PMID: 36464791 PMCID: PMC9726851 DOI: 10.1177/23312165221141143] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Auditory selective attention is a crucial top-down cognitive mechanism for understanding speech in noise. Cochlear implant (CI) users display great variability in speech-in-noise performance that is not easily explained by peripheral auditory profile or demographic factors. Thus, it is imperative to understand if auditory cognitive processes such as selective attention explain such variability. The presented study directly addressed this question by quantifying attentional modulation of cortical auditory responses during an attention task and comparing its individual differences with speech-in-noise performance. In our attention experiment, participants with CI were given a pre-stimulus visual cue that directed their attention to either of two speech streams and were asked to select a deviant syllable in the target stream. The two speech streams consisted of the female voice saying "Up" five times every 800 ms and the male voice saying "Down" four times every 1 s. The onset of each syllable elicited distinct event-related potentials (ERPs). At each syllable onset, the difference in the amplitudes of ERPs between the two attentional conditions (attended - ignored) was computed. This ERP amplitude difference served as a proxy for attentional modulation strength. Our group-level analysis showed that the amplitude of ERPs was greater when the syllable was attended than ignored, exhibiting that attention modulated cortical auditory responses. Moreover, the strength of attentional modulation showed a significant correlation with speech-in-noise performance. These results suggest that the attentional modulation of cortical auditory responses may provide a neural marker for predicting CI users' success in clinical tests of speech-in-noise listening.
Collapse
Affiliation(s)
- Jae-Hee Lee
- Dept. Communication Sciences and Disorders, University of Iowa, Iowa City, IA, 52242, USA,Dept. Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA
| | - Hwan Shim
- Dept. Electrical and Computer Engineering Technology, Rochester Institute of Technology, Rochester, NY, 14623, USA
| | - Bruce Gantz
- Dept. Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA
| | - Inyong Choi
- Dept. Communication Sciences and Disorders, University of Iowa, Iowa City, IA, 52242, USA,Dept. Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA, 52242, USA,Inyong Choi, 250 Hawkins Dr., Iowa City, IA 52242, USA.
| |
Collapse
|
21
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
22
|
Zhou LF, Zhao D, Cui X, Guo B, Zhu F, Feng C, Wang J, Meng M. Separate neural subsystems support goal-directed speech listening. Neuroimage 2022; 263:119613. [PMID: 36075539 DOI: 10.1016/j.neuroimage.2022.119613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2022] [Revised: 08/30/2022] [Accepted: 09/04/2022] [Indexed: 10/31/2022] Open
Abstract
How do humans excel at tracking the narrative of a particular speaker with a distracting noisy background? This feat places great demands on the collaboration between speech processing and goal-related regulatory functions. Here, we propose that separate subsystems with different cross-task dynamic activity properties and distinct functional purposes support goal-directed speech listening. We adopted a naturalistic dichotic speech listening paradigm in which listeners were instructed to attend to only one narrative from two competing inputs. Using functional magnetic resonance imaging with inter- and intra-subject correlation techniques, we discovered a dissociation in response consistency in temporal, parietal and frontal brain areas as the task demand varied. Specifically, some areas in the bilateral temporal cortex (SomMotB_Aud and TempPar) and lateral prefrontal cortex (DefaultB_PFCl and ContA_PFCl) always showed consistent activation across subjects and across scan runs, regardless of the task demand. In contrast, some areas in the parietal cortex (DefaultA_pCunPCC and ContC_pCun) responded reliably only when the task goal remained the same. These results suggested two dissociated functional neural networks that were independently validated by performing a data-driven clustering analysis of voxelwise functional connectivity patterns. A subsequent meta-analysis revealed distinct functional profiles for these two brain correlation maps. The different-task correlation map was strongly associated with language-related processes (e.g., listening, speech and sentences), whereas the same-task versus different-task correlation map was linked to self-referencing functions (e.g., default mode, theory of mind and autobiographical topics). Altogether, the three-pronged findings revealed two anatomically and functionally dissociated subsystems supporting goal-directed speech listening.
Collapse
Affiliation(s)
- Liu-Fang Zhou
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China; Key Lab of BI-AI Collaborated Information Behavior, School of Business and Management, Shanghai International Studies University, Shanghai, 201620, China
| | - Dan Zhao
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China; Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Xuan Cui
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Bingbing Guo
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China; School of Teacher Education, Nanjing Xiaozhuang University, Nanjing, 211171, China
| | - Fangwei Zhu
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China; Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Chunliang Feng
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Jinhui Wang
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education, China; Center for Studies of Psychological Application, Guangdong Key Laboratory of Mental Health and Cognitive Science, Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou, 510631, China.
| | - Ming Meng
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China.
| |
Collapse
|
23
|
Xie Y, Ma J. How to discern external acoustic waves in a piezoelectric neuron under noise? J Biol Phys 2022; 48:339-353. [PMID: 35948818 PMCID: PMC9411441 DOI: 10.1007/s10867-022-09611-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Accepted: 07/27/2022] [Indexed: 10/15/2022] Open
Abstract
Biological neurons keep sensitive to external stimuli and appropriate firing modes can be triggered to give effective response to external chemical and physical signals. A piezoelectric neural circuit can perceive external voice and nonlinear vibration by generating equivalent piezoelectric voltage, which can generate an equivalent trans-membrane current for inducing a variety of firing modes in the neural activities. Biological neurons can receive external stimuli from more ion channels and synapse synchronously, but the further encoding and priority in mode selection are competitive. In particular, noisy disturbance and electromagnetic radiation make it more difficult in signals identification and mode selection in the firing patterns of neurons driven by multi-channel signals. In this paper, two different periodic signals accompanied by noise are used to excite the piezoelectric neural circuit, and the signal processing in the piezoelectric neuron driven by acoustic waves under noise is reproduced and explained. The physical energy of the piezoelectric neural circuit and Hamilton energy in the neuron driven by mixed signals are calculated to explain the biophysical mechanism of auditory neuron when external stimuli are applied. It is found that the neuron prefers to respond to the external stimulus with higher physical energy and the signal which can increase the Hamilton energy of the neuron. For example, stronger inputs used to inject higher energy and it is detected and responded more sensitively. The involvement of noise is helpful to detect the external signal under stochastic resonance, and the additive noise changes the excitability of neuron as the external stimulus. The results indicate that energy controls the firing patterns and mode selection in neurons, and it provides clues to control the neural activities by injecting appropriate energy into the neurons and network.
Collapse
Affiliation(s)
- Ying Xie
- Department of Physics, Lanzhou University of Technology, Lanzhou, 730050, China
| | - Jun Ma
- Department of Physics, Lanzhou University of Technology, Lanzhou, 730050, China.
- School of Science, Chongqing University of Posts and Telecommunications, Chongqing, 430065, China.
| |
Collapse
|
24
|
Xu Z, Bai Y, Zhao R, Zheng Q, Ni G, Ming D. Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction. Hear Res 2022; 422:108552. [PMID: 35714555 DOI: 10.1016/j.heares.2022.108552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 06/01/2022] [Accepted: 06/08/2022] [Indexed: 11/23/2022]
Abstract
In the cocktail party circumstance, the human auditory system extracts the information from a specific speaker of interest and ignores others. Many studies have focused on auditory attention decoding (AAD), but the stimulation materials were mainly non-tonal languages. We used a tonal language (Mandarin) as the speech stimulus and constructed a Long Short-Term Memory (LSTM) architecture for speech envelope reconstruction based on electroencephalogram (EEG) data. The correlation coefficient between the reconstructed and candidate envelopes was calculated to determine the subject's auditory attention. The proposed LSTM architecture outperformed the linear models. The average decoding accuracy in cross-subject and inter-subject cases varies from 63.02 to 74.29%, with the highest accuracy rate of 89.1% in a decision window of 0.15 s. In addition, the beta-band rhythm was found to play an essential role in identifying the attention and the non-attention state. These results provide a new AAD architecture to help develop neuro-steered hearing devices, especially for tonal languages.
Collapse
Affiliation(s)
- Zihao Xu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
| | - Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
| | - Ran Zhao
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
| | - Qi Zheng
- Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
| | - Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China.
| | - Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China.
| |
Collapse
|
25
|
Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception. Curr Biol 2022; 32:3971-3986.e4. [PMID: 35973430 DOI: 10.1016/j.cub.2022.07.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Revised: 06/08/2022] [Accepted: 07/19/2022] [Indexed: 11/20/2022]
Abstract
How the human auditory cortex represents spatially separated simultaneous talkers and how talkers' locations and voices modulate the neural representations of attended and unattended speech are unclear. Here, we measured the neural responses from electrodes implanted in neurosurgical patients as they performed single-talker and multi-talker speech perception tasks. We found that spatial separation between talkers caused a preferential encoding of the contralateral speech in Heschl's gyrus (HG), planum temporale (PT), and superior temporal gyrus (STG). Location and spectrotemporal features were encoded in different aspects of the neural response. Specifically, the talker's location changed the mean response level, whereas the talker's spectrotemporal features altered the variation of response around response's baseline. These components were differentially modulated by the attended talker's voice or location, which improved the population decoding of attended speech features. Attentional modulation due to the talker's voice only appeared in the auditory areas with longer latencies, but attentional modulation due to location was present throughout. Our results show that spatial multi-talker speech perception relies upon a separable pre-attentive neural representation, which could be further tuned by top-down attention to the location and voice of the talker.
Collapse
|
26
|
Myers JC, Smith EH, Leszczynski M, O'Sullivan J, Yates MJ, McKhann G, Mesgarani N, Schroeder C, Schevon C, Sheth SA. The Spatial Reach of Neuronal Coherence and Spike-Field Coupling across the Human Neocortex. J Neurosci 2022; 42:6285-6294. [PMID: 35790403 PMCID: PMC9374135 DOI: 10.1523/jneurosci.0050-22.2022] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 04/21/2022] [Accepted: 05/25/2022] [Indexed: 11/21/2022] Open
Abstract
Neuronal coherence is thought to be a fundamental mechanism of communication in the brain, where synchronized field potentials coordinate synaptic and spiking events to support plasticity and learning. Although the spread of field potentials has garnered great interest, little is known about the spatial reach of phase synchronization, or neuronal coherence. Functional connectivity between different brain regions is known to occur across long distances, but the locality of synchronization across the neocortex is understudied. Here we used simultaneous recordings from electrocorticography (ECoG) grids and high-density microelectrode arrays to estimate the spatial reach of neuronal coherence and spike-field coherence (SFC) across frontal, temporal, and occipital cortices during cognitive tasks in humans. We observed the strongest coherence within a 2-3 cm distance from the microelectrode arrays, potentially defining an effective range for local communication. This range was relatively consistent across brain regions, spectral frequencies, and cognitive tasks. The magnitude of coherence showed power law decay with increasing distance from the microelectrode arrays, where the highest coherence occurred between ECoG contacts, followed by coherence between ECoG and deep cortical local field potential (LFP), and then SFC (i.e., ECoG > LFP > SFC). The spectral frequency of coherence also affected its magnitude. Alpha coherence (8-14 Hz) was generally higher than other frequencies for signals nearest the microelectrode arrays, whereas delta coherence (1-3 Hz) was higher for signals that were farther away. Action potentials in all brain regions were most coherent with the phase of alpha oscillations, which suggests that alpha waves could play a larger, more spatially local role in spike timing than other frequencies. These findings provide a deeper understanding of the spatial and spectral dynamics of neuronal synchronization, further advancing knowledge about how activity propagates across the human brain.SIGNIFICANCE STATEMENT Coherence is theorized to facilitate information transfer across cerebral space by providing a convenient electrophysiological mechanism to modulate membrane potentials in spatiotemporally complex patterns. Our work uses a multiscale approach to evaluate the spatial reach of phase coherence and spike-field coherence during cognitive tasks in humans. Locally, coherence can reach up to 3 cm around a given area of neocortex. The spectral properties of coherence revealed that alpha phase-field and spike-field coherence were higher within ranges <2 cm, whereas lower-frequency delta coherence was higher for contacts farther away. Spatiotemporally shared information (i.e., coherence) across neocortex seems to reach farther than field potentials alone.
Collapse
Affiliation(s)
- John C Myers
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas 77030
| | - Elliot H Smith
- Department of Neurosurgery, University of Utah, Salt Lake City, Utah 84132
- Department of Neurology, Columbia University, New York, New York 10032
| | | | - James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, New York 10027
| | - Mark J Yates
- Department of Psychiatry, Columbia University, New York, New York 10032
| | - Guy McKhann
- Department of Psychiatry, Columbia University, New York, New York 10032
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York 10027
| | - Charles Schroeder
- Department of Psychiatry, Columbia University, New York, New York 10032
| | - Catherine Schevon
- Department of Neurology, Columbia University, New York, New York 10032
| | - Sameer A Sheth
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas 77030
| |
Collapse
|
27
|
Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci 2022; 16:828546. [PMID: 36003957 PMCID: PMC9393379 DOI: 10.3389/fnins.2022.828546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2021] [Accepted: 07/08/2022] [Indexed: 11/13/2022] Open
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker’s fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker’s speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
Collapse
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- *Correspondence: Christian Brodbeck,
| | - Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
| |
Collapse
|
28
|
Decoding Selective Auditory Attention with EEG using A Transformer Model. Methods 2022; 204:410-417. [DOI: 10.1016/j.ymeth.2022.04.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Revised: 03/20/2022] [Accepted: 04/14/2022] [Indexed: 11/23/2022] Open
|
29
|
Wang L, Liu H, Zhang X, Zhao S, Guo L, Han J, Hu X. Exploring Hierarchical Auditory Representation via a Neural Encoding Model. Front Neurosci 2022; 16:843988. [PMID: 35401085 PMCID: PMC8987159 DOI: 10.3389/fnins.2022.843988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2021] [Accepted: 02/16/2022] [Indexed: 11/13/2022] Open
Abstract
By integrating hierarchical feature modeling of auditory information using deep neural networks (DNNs), recent functional magnetic resonance imaging (fMRI) encoding studies have revealed the hierarchical neural auditory representation in the superior temporal gyrus (STG). Most of these studies adopted supervised DNNs (e.g., for audio classification) to derive the hierarchical feature representation of external auditory stimuli. One possible limitation is that the extracted features could be biased toward discriminative features while ignoring general attributes shared by auditory information in multiple categories. Consequently, the hierarchy of neural acoustic processing revealed by the encoding model might be biased toward classification. In this study, we explored the hierarchical neural auditory representation via an fMRI encoding framework in which an unsupervised deep convolutional auto-encoder (DCAE) model was adopted to derive the hierarchical feature representations of the stimuli (naturalistic auditory excerpts in different categories) in fMRI acquisition. The experimental results showed that the neural representation of hierarchical auditory features is not limited to previously reported STG, but also involves the bilateral insula, ventral visual cortex, and thalamus. The current study may provide complementary evidence to understand the hierarchical auditory processing in the human brain.
Collapse
Affiliation(s)
- Liting Wang
- School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Huan Liu
- School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Xin Zhang
- Institute of Medical Research, Northwestern Polytechnical University, Xi’an, China
| | - Shijie Zhao
- School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Lei Guo
- School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Junwei Han
- School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Xintao Hu
- School of Automation, Northwestern Polytechnical University, Xi’an, China
- *Correspondence: Xintao Hu,
| |
Collapse
|
30
|
Di Liberto GM, Hjortkjær J, Mesgarani N. Editorial: Neural Tracking: Closing the Gap Between Neurophysiology and Translational Medicine. Front Neurosci 2022; 16:872600. [PMID: 35368278 PMCID: PMC8966872 DOI: 10.3389/fnins.2022.872600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Accepted: 02/17/2022] [Indexed: 11/25/2022] Open
Affiliation(s)
- Giovanni M. Di Liberto
- School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
- ADAPT Centre, d-real, Trinity College Institute for Neuroscience, Dublin, Ireland
- *Correspondence: Giovanni M. Di Liberto
| | - Jens Hjortkjær
- Hearing Systems Group, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Ireland
| | - Nima Mesgarani
- Electrical Engineering Department, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
| |
Collapse
|
31
|
Teoh ES, Ahmed F, Lalor EC. Attention Differentially Affects Acoustic and Phonetic Feature Encoding in a Multispeaker Environment. J Neurosci 2022; 42:682-691. [PMID: 34893546 PMCID: PMC8805628 DOI: 10.1523/jneurosci.1455-20.2021] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2020] [Revised: 09/28/2021] [Accepted: 09/29/2021] [Indexed: 11/21/2022] Open
Abstract
Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics.SIGNIFICANCE STATEMENT Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain is not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer in our understanding of how the human brain solves the cocktail party problem.
Collapse
Affiliation(s)
- Emily S Teoh
- School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
| | - Farhin Ahmed
- Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
- Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
| |
Collapse
|
32
|
Ylinen A, Wikman P, Leminen M, Alho K. Task-dependent cortical activations during selective attention to audiovisual speech. Brain Res 2022; 1775:147739. [PMID: 34843702 DOI: 10.1016/j.brainres.2021.147739] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2021] [Revised: 10/21/2021] [Accepted: 11/21/2021] [Indexed: 11/28/2022]
Abstract
Selective listening to speech depends on widespread networks of the brain, but how the involvement of different neural systems in speech processing is affected by factors such as the task performed by a listener and speech intelligibility remains poorly understood. We used functional magnetic resonance imaging to systematically examine the effects that performing different tasks has on neural activations during selective attention to continuous audiovisual speech in the presence of task-irrelevant speech. Participants viewed audiovisual dialogues and attended either to the semantic or the phonological content of speech, or ignored speech altogether and performed a visual control task. The tasks were factorially combined with good and poor auditory and visual speech qualities. Selective attention to speech engaged superior temporal regions and the left inferior frontal gyrus regardless of the task. Frontoparietal regions implicated in selective auditory attention to simple sounds (e.g., tones, syllables) were not engaged by the semantic task, suggesting that this network may not be not as crucial when attending to continuous speech. The medial orbitofrontal cortex, implicated in social cognition, was most activated by the semantic task. Activity levels during the phonological task in the left prefrontal, premotor, and secondary somatosensory regions had a distinct temporal profile as well as the highest overall activity, possibly relating to the role of the dorsal speech processing stream in sub-lexical processing. Our results demonstrate that the task type influences neural activations during selective attention to speech, and emphasize the importance of ecologically valid experimental designs.
Collapse
Affiliation(s)
- Artturi Ylinen
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland.
| | - Patrik Wikman
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland; Department of Neuroscience, Georgetown University, Washington D.C., USA
| | - Miika Leminen
- Analytics and Data Services, HUS Helsinki University Hospital, Helsinki, Finland
| | - Kimmo Alho
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland; Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
| |
Collapse
|
33
|
Petersen EB. Hearing-Aid Directionality Improves Neural Speech Tracking in Older Hearing-Impaired Listeners. Trends Hear 2022; 26:23312165221099894. [PMID: 35730193 PMCID: PMC9228639 DOI: 10.1177/23312165221099894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
In recent years, a growing body of literature has explored the effect of hearing impairment on the neural processing of speech, particularly related to the neural tracking of speech envelopes. However, only limited work has focused on the potential usage of the method for evaluating the effect of hearing aids designed to amplify and process the auditory input provided to hearing-impaired listeners. The current study investigates how directional sound processing in hearing-aids, denoted directionality, affects the neural tracking and encoding of speech in EEG recorded from 11 older hearing-impaired listeners. Behaviorally, the task performance improved when directionality was applied, while subjective ratings of listening effort were not affected. The reconstruction of the to-be-attended speech envelopes improved significantly when applying directionality, as well as when removing the background noise altogether. When inspecting the modelled response of the neural encoding of speech, a faster transition was observed between the early bottom-up response and the later top-down attentional-driven responses when directionality was applied. In summary, hearing-aid directionality affects both the neural speech tracking and neural encoding of to-be-attended speech. This result shows that hearing-aid signal processing impacts the neural processing of sounds and that neural speech tracking is indicative of the benefits associated with applying hearing-aid processing algorithms.
Collapse
|
34
|
Effect of Noise Reduction on Cortical Speech-in-Noise Processing and Its Variance due to Individual Noise Tolerance. Ear Hear 2021; 43:849-861. [PMID: 34751679 PMCID: PMC9010348 DOI: 10.1097/aud.0000000000001144] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Despite the widespread use of noise reduction (NR) in modern digital hearing aids, our neurophysiological understanding of how NR affects speech-in-noise perception and why its effect is variable is limited. The current study aimed to (1) characterize the effect of NR on the neural processing of target speech and (2) seek neural determinants of individual differences in the NR effect on speech-in-noise performance, hypothesizing that an individual's own capability to inhibit background noise would inversely predict NR benefits in speech-in-noise perception. DESIGN Thirty-six adult listeners with normal hearing participated in the study. Behavioral and electroencephalographic responses were simultaneously obtained during a speech-in-noise task in which natural monosyllabic words were presented at three different signal-to-noise ratios, each with NR off and on. A within-subject analysis assessed the effect of NR on cortical evoked responses to target speech in the temporal-frontal speech and language brain regions, including supramarginal gyrus and inferior frontal gyrus in the left hemisphere. In addition, an across-subject analysis related an individual's tolerance to noise, measured as the amplitude ratio of auditory-cortical responses to target speech and background noise, to their speech-in-noise performance. RESULTS At the group level, in the poorest signal-to-noise ratio condition, NR significantly increased early supramarginal gyrus activity and decreased late inferior frontal gyrus activity, indicating a switch to more immediate lexical access and less effortful cognitive processing, although no improvement in behavioral performance was found. The across-subject analysis revealed that the cortical index of individual noise tolerance significantly correlated with NR-driven changes in speech-in-noise performance. CONCLUSIONS NR can facilitate speech-in-noise processing despite no improvement in behavioral performance. Findings from the current study also indicate that people with lower noise tolerance are more likely to get more benefits from NR. Overall, results suggest that future research should take a mechanistic approach to NR outcomes and individual noise tolerance.
Collapse
|
35
|
Decoding Object-Based Auditory Attention from Source-Reconstructed MEG Alpha Oscillations. J Neurosci 2021; 41:8603-8617. [PMID: 34429378 DOI: 10.1523/jneurosci.0583-21.2021] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2021] [Revised: 08/08/2021] [Accepted: 08/11/2021] [Indexed: 11/21/2022] Open
Abstract
How do we attend to relevant auditory information in complex naturalistic scenes? Much research has focused on detecting which information is attended, without regarding underlying top-down control mechanisms. Studies investigating attentional control generally manipulate and cue specific features in simple stimuli. However, in naturalistic scenes it is impossible to dissociate relevant from irrelevant information based on low-level features. Instead, the brain has to parse and select auditory objects of interest. The neural underpinnings of object-based auditory attention remain not well understood. Here we recorded MEG while 15 healthy human subjects (9 female) prepared for the repetition of an auditory object presented in one of two overlapping naturalistic auditory streams. The stream containing the repetition was prospectively cued with 70% validity. Crucially, this task could not be solved by attending low-level features, but only by processing the objects fully. We trained a linear classifier on the cortical distribution of source-reconstructed oscillatory activity to distinguish which auditory stream was attended. We could successfully classify the attended stream from alpha (8-14 Hz) activity in anticipation of repetition onset. Importantly, attention could only be classified from trials in which subjects subsequently detected the repetition, but not from miss trials. Behavioral relevance was further supported by a correlation between classification accuracy and detection performance. Decodability was not sustained throughout stimulus presentation, but peaked shortly before repetition onset, suggesting that attention acted transiently according to temporal expectations. We thus demonstrate anticipatory alpha oscillations to underlie top-down control of object-based auditory attention in complex naturalistic scenes.SIGNIFICANCE STATEMENT In everyday life, we often find ourselves bombarded with auditory information, from which we need to select what is relevant to our current goals. Previous research has highlighted how we attend to specific highly controlled aspects of the auditory input. Although invaluable, it is still unclear how this relates to attentional control in naturalistic auditory scenes. Here we used the high precision of magnetoencephalography in space and time to investigate the brain mechanisms underlying top-down control of object-based attention in ecologically valid sound scenes. We show that rhythmic activity in auditory association cortex at a frequency of ∼10 Hz (alpha waves) controls attention to currently relevant segments within the auditory scene and predicts whether these segments are subsequently detected.
Collapse
|
36
|
Rezaeizadeh M, Shamma S. Binding the Acoustic Features of an Auditory Source through Temporal Coherence. Cereb Cortex Commun 2021; 2:tgab060. [PMID: 34746791 PMCID: PMC8567849 DOI: 10.1093/texcom/tgab060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 11/25/2022] Open
Abstract
Numerous studies have suggested that the perception of a target sound stream (or source) can only be segregated from a complex acoustic background mixture if the acoustic features underlying its perceptual attributes (e.g., pitch, location, and timbre) induce temporally modulated responses that are mutually correlated (or coherent), and that are uncorrelated (incoherent) from those of other sources in the mixture. This "temporal coherence" hypothesis asserts that attentive listening to one acoustic feature of a target enhances brain responses to that feature but would also concomitantly (1) induce mutually excitatory influences with other coherently responding neurons, thus enhancing (or binding) them all as they respond to the attended source; by contrast, (2) suppressive interactions are hypothesized to build up among neurons driven by temporally incoherent sound features, thus relatively reducing their activity. In this study, we report on EEG measurements in human subjects engaged in various sound segregation tasks that demonstrate rapid binding among the temporally coherent features of the attended source regardless of their identity (pure tone components, tone complexes, or noise), harmonic relationship, or frequency separation, thus confirming the key role temporal coherence plays in the analysis and organization of auditory scenes.
Collapse
Affiliation(s)
- Mohsen Rezaeizadeh
- Department of Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park 20742, USA
| | - Shihab Shamma
- Department of Electrical and Computer Engineering and Institute for Systems Research, University of Maryland, College Park 20742, USA
- Laboratoire des systèmes perceptifs, Département d’études cognitive, Ecole Normale Supérieure, PSL, 75005 Paris, France
| |
Collapse
|
37
|
Abstract
Hearing aids continue to acquire increasingly sophisticated sound-processing features beyond basic amplification. On the one hand, these have the potential to add user benefit and allow for personalization. On the other hand, if such features are to benefit according to their potential, they require clinicians to be acquainted with both the underlying technologies and the specific fitting handles made available by the individual hearing aid manufacturers. Ensuring benefit from hearing aids in typical daily listening environments requires that the hearing aids handle sounds that interfere with communication, generically referred to as “noise.” With this aim, considerable efforts from both academia and industry have led to increasingly advanced algorithms that handle noise, typically using the principles of directional processing and postfiltering. This article provides an overview of the techniques used for noise reduction in modern hearing aids. First, classical techniques are covered as they are used in modern hearing aids. The discussion then shifts to how deep learning, a subfield of artificial intelligence, provides a radically different way of solving the noise problem. Finally, the results of several experiments are used to showcase the benefits of recent algorithmic advances in terms of signal-to-noise ratio, speech intelligibility, selective attention, and listening effort.
Collapse
|
38
|
Lowe MX, Mohsenzadeh Y, Lahner B, Charest I, Oliva A, Teng S. Cochlea to categories: The spatiotemporal dynamics of semantic auditory representations. Cogn Neuropsychol 2021; 38:468-489. [PMID: 35729704 PMCID: PMC10589059 DOI: 10.1080/02643294.2022.2085085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Revised: 03/31/2022] [Accepted: 05/25/2022] [Indexed: 10/17/2022]
Abstract
How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.
Collapse
Affiliation(s)
- Matthew X. Lowe
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Unlimited Sciences, Colorado Springs, CO
| | - Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- The Brain and Mind Institute, The University of Western Ontario, London, ON, Canada
- Department of Computer Science, The University of Western Ontario, London, ON, Canada
| | - Benjamin Lahner
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
| | - Ian Charest
- Département de Psychologie, Université de Montréal, Montréal, Québec, Canada
- Center for Human Brain Health, University of Birmingham, UK
| | - Aude Oliva
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
| | - Santani Teng
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Smith-Kettlewell Eye Research Institute (SKERI), San Francisco, CA
| |
Collapse
|
39
|
Kiremitçi I, Yilmaz Ö, Çelik E, Shahdloo M, Huth AG, Çukur T. Attentional Modulation of Hierarchical Speech Representations in a Multitalker Environment. Cereb Cortex 2021; 31:4986-5005. [PMID: 34115102 PMCID: PMC8491717 DOI: 10.1093/cercor/bhab136] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Humans are remarkably adept in listening to a desired speaker in a crowded environment, while filtering out nontarget speakers in the background. Attention is key to solving this difficult cocktail-party task, yet a detailed characterization of attentional effects on speech representations is lacking. It remains unclear across what levels of speech features and how much attentional modulation occurs in each brain area during the cocktail-party task. To address these questions, we recorded whole-brain blood-oxygen-level-dependent (BOLD) responses while subjects either passively listened to single-speaker stories, or selectively attended to a male or a female speaker in temporally overlaid stories in separate experiments. Spectral, articulatory, and semantic models of the natural stories were constructed. Intrinsic selectivity profiles were identified via voxelwise models fit to passive listening responses. Attentional modulations were then quantified based on model predictions for attended and unattended stories in the cocktail-party task. We find that attention causes broad modulations at multiple levels of speech representations while growing stronger toward later stages of processing, and that unattended speech is represented up to the semantic level in parabelt auditory cortex. These results provide insights on attentional mechanisms that underlie the ability to selectively listen to a desired speaker in noisy multispeaker environments.
Collapse
Affiliation(s)
- Ibrahim Kiremitçi
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
| | - Özgür Yilmaz
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
| | - Emin Çelik
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
| | - Mo Shahdloo
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford OX3 9DU, UK
| | - Alexander G Huth
- Department of Neuroscience, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
| | - Tolga Çukur
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
| |
Collapse
|
40
|
Zuk NJ, Murphy JW, Reilly RB, Lalor EC. Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies. PLoS Comput Biol 2021; 17:e1009358. [PMID: 34534211 PMCID: PMC8480853 DOI: 10.1371/journal.pcbi.1009358] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Revised: 09/29/2021] [Accepted: 08/18/2021] [Indexed: 11/19/2022] Open
Abstract
The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
Collapse
Affiliation(s)
- Nathaniel J. Zuk
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
| | - Jeremy W. Murphy
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
| | - Richard B. Reilly
- Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
| | - Edmund C. Lalor
- Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
| |
Collapse
|
41
|
Homma NY, Bajo VM. Lemniscal Corticothalamic Feedback in Auditory Scene Analysis. Front Neurosci 2021; 15:723893. [PMID: 34489635 PMCID: PMC8417129 DOI: 10.3389/fnins.2021.723893] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2021] [Accepted: 07/30/2021] [Indexed: 12/15/2022] Open
Abstract
Sound information is transmitted from the ear to central auditory stations of the brain via several nuclei. In addition to these ascending pathways there exist descending projections that can influence the information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to GABAergic neurons of the thalamic reticular nucleus (TRN) that receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine frequency tuning, firing patterns, and gating of MGBv neurons. Therefore, the thalamus is not merely a relay stage in the chain of auditory nuclei but does participate in complex aspects of sound processing that include top-down modulations. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses in MGBv neurons, and (ii) to explore how the feedback contributes to auditory scene analysis, particularly on frequency and harmonic perception. Finally, we will discuss potential implications of the role of corticothalamic feedback in music and speech perception, where precise spectral and temporal processing is essential.
Collapse
Affiliation(s)
- Natsumi Y. Homma
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, United States
- Coleman Memorial Laboratory, Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, United States
| | - Victoria M. Bajo
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
42
|
Keshavarzi M, Varano E, Reichenbach T. Cortical Tracking of a Background Speaker Modulates the Comprehension of a Foreground Speech Signal. J Neurosci 2021; 41:5093-5101. [PMID: 33926996 PMCID: PMC8197648 DOI: 10.1523/jneurosci.3200-20.2021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 02/23/2021] [Accepted: 04/12/2021] [Indexed: 11/21/2022] Open
Abstract
Understanding speech in background noise is a difficult task. The tracking of speech rhythms such as the rate of syllables and words by cortical activity has emerged as a key neural mechanism for speech-in-noise comprehension. In particular, recent investigations have used transcranial alternating current stimulation (tACS) with the envelope of a speech signal to influence the cortical speech tracking, demonstrating that this type of stimulation modulates comprehension and therefore providing evidence of a functional role of the cortical tracking in speech processing. Cortical activity has been found to track the rhythms of a background speaker as well, but the functional significance of this neural response remains unclear. Here we use a speech-comprehension task with a target speaker in the presence of a distractor voice to show that tACS with the speech envelope of the target voice as well as tACS with the envelope of the distractor speaker both modulate the comprehension of the target speech. Because the envelope of the distractor speech does not carry information about the target speech stream, the modulation of speech comprehension through tACS with this envelope provides evidence that the cortical tracking of the background speaker affects the comprehension of the foreground speech signal. The phase dependency of the resulting modulation of speech comprehension is, however, opposite to that obtained from tACS with the envelope of the target speech signal. This suggests that the cortical tracking of the ignored speech stream and that of the attended speech stream may compete for neural resources.SIGNIFICANCE STATEMENT Loud environments such as busy pubs or restaurants can make conversation difficult. However, they also allow us to eavesdrop into other conversations that occur in the background. In particular, we often notice when somebody else mentions our name, even if we have not been listening to that person. However, the neural mechanisms by which background speech is processed remain poorly understood. Here we use transcranial alternating current stimulation, a technique through which neural activity in the cerebral cortex can be influenced, to show that cortical responses to rhythms in the distractor speech modulate the comprehension of the target speaker. Our results provide evidence that the cortical tracking of background speech rhythms plays a functional role in speech processing.
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
| | - Enrico Varano
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
| | - Tobias Reichenbach
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
| |
Collapse
|
43
|
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. [PMID: 33744458 PMCID: PMC8204264 DOI: 10.1016/j.neuroimage.2021.117958] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Revised: 03/08/2021] [Accepted: 03/09/2021] [Indexed: 11/01/2022] Open
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking') many studies consider the broadband speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that using this broadband envelope can provide a distorted picture on speech encoding. We systematically investigated the encoding of spectrally-limited speech-derived envelopes presented by individual and multiple noise carriers in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2 - 0.83 kHz) and high (2.66 - 8 kHz) frequency speech-derived envelopes. This was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low frequency brain activity. As low and high frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands, and are easily confounded when considering the broadband speech envelope.
Collapse
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany.
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
| |
Collapse
|
44
|
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021; 18. [PMID: 33957606 DOI: 10.1088/1741-2552/abfeba] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]
Abstract
Objective.Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments.Approach.Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals based on corresponding EEG signals recorded when subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions.Main results.Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly with classification accuracies that approximated or exceeded 80% based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder.Significance.This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggests that this proposed computational model may be an effective method for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.
Collapse
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China.,Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
| |
Collapse
|
45
|
Boos M, Lücke J, Rieger JW. Generalizable dimensions of human cortical auditory processing of speech in natural soundscapes: A data-driven ultra high field fMRI approach. Neuroimage 2021; 237:118106. [PMID: 33991696 DOI: 10.1016/j.neuroimage.2021.118106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Accepted: 04/25/2021] [Indexed: 11/27/2022] Open
Abstract
Speech comprehension in natural soundscapes rests on the ability of the auditory system to extract speech information from a complex acoustic signal with overlapping contributions from many sound sources. Here we reveal the canonical processing of speech in natural soundscapes on multiple scales by using data-driven modeling approaches to characterize sounds to analyze ultra high field fMRI recorded while participants listened to the audio soundtrack of a movie. We show that at the functional level the neuronal processing of speech in natural soundscapes can be surprisingly low dimensional in the human cortex, highlighting the functional efficiency of the auditory system for a seemingly complex task. Particularly, we find that a model comprising three functional dimensions of auditory processing in the temporal lobes is shared across participants' fMRI activity. We further demonstrate that the three functional dimensions are implemented in anatomically overlapping networks that process different aspects of speech in natural soundscapes. One is most sensitive to complex auditory features present in speech, another to complex auditory features and fast temporal modulations, that are not specific to speech, and one codes mainly sound level. These results were derived with few a-priori assumptions and provide a detailed and computationally reproducible account of the cortical activity in the temporal lobe elicited by the processing of speech in natural soundscapes.
Collapse
Affiliation(s)
- Moritz Boos
- Applied Neurocognitive Psychology Lab, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", University of Oldenburg, Oldenburg, Germany.
| | - Jörg Lücke
- Machine Learning Division, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", University of Oldenburg, Oldenburg, Germany
| | - Jochem W Rieger
- Applied Neurocognitive Psychology Lab, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
46
|
de Cheveigné A, Slaney M, Fuglsang SA, Hjortkjaer J. Auditory stimulus-response modeling with a match-mismatch task. J Neural Eng 2021; 18. [PMID: 33849003 DOI: 10.1088/1741-2552/abf771] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Accepted: 04/13/2021] [Indexed: 11/12/2022]
Abstract
Objective.An auditory stimulus can be related to the brain response that it evokes by a stimulus-response model fit to the data. This offers insight into perceptual processes within the brain and is also of potential use for devices such as brain computer interfaces (BCIs). The quality of the model can be quantified by measuring the fit with a regression problem, or by applying it to a classification task and measuring its performance.Approach.Here we focus on amatch-mismatch(MM) task that entails deciding whether a segment of brain signal matches, via a model, the auditory stimulus that evoked it.Main results. Using these metrics, we describe a range of models of increasing complexity that we compare to methods in the literature, showing state-of-the-art performance. We document in detail one particular implementation, calibrated on a publicly-available database, that can serve as a robust reference to evaluate future developments.Significance.The MM task allows stimulus-response models to be evaluated in the limit of very high model accuracy, making it an attractive alternative to the more commonly used task of auditory attention detection. The MM task does not require class labels, so it is immune to mislabeling, and it is applicable to data recorded in listening scenarios with only one sound source, thus it is cheap to obtain large quantities of training and testing data. Performance metrics from this task, associated with regression accuracy, provide complementary insights into the relation between stimulus and response, as well as information about discriminatory power directly applicable to BCI applications.
Collapse
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Paris, CNRS UMR 8248, France.,Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, PSL, France.,UCL Ear Institute, London, United Kingdom.,Audition, DEC, ENS, 29 rue d'Ulm, 75230 Paris, France
| | - Malcolm Slaney
- Google Research, Machine Hearing Group, Mountain View, CA, United States of America
| | - Søren A Fuglsang
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
| | - Jens Hjortkjaer
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark.,Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
| |
Collapse
|
47
|
Mesik J, Ray L, Wojtczak M. Effects of Age on Cortical Tracking of Word-Level Features of Continuous Competing Speech. Front Neurosci 2021; 15:635126. [PMID: 33867920 PMCID: PMC8047075 DOI: 10.3389/fnins.2021.635126] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2020] [Accepted: 03/12/2021] [Indexed: 01/17/2023] Open
Abstract
Speech-in-noise comprehension difficulties are common among the elderly population, yet traditional objective measures of speech perception are largely insensitive to this deficit, particularly in the absence of clinical hearing loss. In recent years, a growing body of research in young normal-hearing adults has demonstrated that high-level features related to speech semantics and lexical predictability elicit strong centro-parietal negativity in the EEG signal around 400 ms following the word onset. Here we investigate effects of age on cortical tracking of these word-level features within a two-talker speech mixture, and their relationship with self-reported difficulties with speech-in-noise understanding. While undergoing EEG recordings, younger and older adult participants listened to a continuous narrative story in the presence of a distractor story. We then utilized forward encoding models to estimate cortical tracking of four speech features: (1) word onsets, (2) "semantic" dissimilarity of each word relative to the preceding context, (3) lexical surprisal for each word, and (4) overall word audibility. Our results revealed robust tracking of all features for attended speech, with surprisal and word audibility showing significantly stronger contributions to neural activity than dissimilarity. Additionally, older adults exhibited significantly stronger tracking of word-level features than younger adults, especially over frontal electrode sites, potentially reflecting increased listening effort. Finally, neuro-behavioral analyses revealed trends of a negative relationship between subjective speech-in-noise perception difficulties and the model goodness-of-fit for attended speech, as well as a positive relationship between task performance and the goodness-of-fit, indicating behavioral relevance of these measures. Together, our results demonstrate the utility of modeling cortical responses to multi-talker speech using complex, word-level features and the potential for their use to study changes in speech processing due to aging and hearing loss.
Collapse
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
| | | | | |
Collapse
|
48
|
Roswandowitz C, Swanborough H, Frühholz S. Categorizing human vocal signals depends on an integrated auditory-frontal cortical network. Hum Brain Mapp 2021; 42:1503-1517. [PMID: 33615612 PMCID: PMC7927295 DOI: 10.1002/hbm.25309] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Revised: 11/20/2020] [Accepted: 11/25/2020] [Indexed: 11/30/2022] Open
Abstract
Voice signals are relevant for auditory communication and suggested to be processed in dedicated auditory cortex (AC) regions. While recent reports highlighted an additional role of the inferior frontal cortex (IFC), a detailed description of the integrated functioning of the AC-IFC network and its task relevance for voice processing is missing. Using neuroimaging, we tested sound categorization while human participants either focused on the higher-order vocal-sound dimension (voice task) or feature-based intensity dimension (loudness task) while listening to the same sound material. We found differential involvements of the AC and IFC depending on the task performed and whether the voice dimension was of task relevance or not. First, when comparing neural vocal-sound processing of our task-based with previously reported passive listening designs we observed highly similar cortical activations in the AC and IFC. Second, during task-based vocal-sound processing we observed voice-sensitive responses in the AC and IFC whereas intensity processing was restricted to distinct AC regions. Third, the IFC flexibly adapted to the vocal-sounds' task relevance, being only active when the voice dimension was task relevant. Forth and finally, connectivity modeling revealed that vocal signals independent of their task relevance provided significant input to bilateral AC. However, only when attention was on the voice dimension, we found significant modulations of auditory-frontal connections. Our findings suggest an integrated auditory-frontal network to be essential for behaviorally relevant vocal-sounds processing. The IFC seems to be an important hub of the extended voice network when representing higher-order vocal objects and guiding goal-directed behavior.
Collapse
Affiliation(s)
- Claudia Roswandowitz
- Department of PsychologyUniversity of ZurichZurichSwitzerland
- Neuroscience Center ZurichUniversity of Zurich and ETH ZurichZurichSwitzerland
| | - Huw Swanborough
- Department of PsychologyUniversity of ZurichZurichSwitzerland
- Neuroscience Center ZurichUniversity of Zurich and ETH ZurichZurichSwitzerland
| | - Sascha Frühholz
- Department of PsychologyUniversity of ZurichZurichSwitzerland
- Neuroscience Center ZurichUniversity of Zurich and ETH ZurichZurichSwitzerland
- Center for Integrative Human Physiology (ZIHP)University of ZurichZurichSwitzerland
| |
Collapse
|
49
|
Alickovic E, Ng EHN, Fiedler L, Santurette S, Innes-Brown H, Graversen C. Effects of Hearing Aid Noise Reduction on Early and Late Cortical Representations of Competing Talkers in Noise. Front Neurosci 2021; 15:636060. [PMID: 33841081 PMCID: PMC8032942 DOI: 10.3389/fnins.2021.636060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Accepted: 02/26/2021] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVES Previous research using non-invasive (magnetoencephalography, MEG) and invasive (electrocorticography, ECoG) neural recordings has demonstrated the progressive and hierarchical representation and processing of complex multi-talker auditory scenes in the auditory cortex. Early responses (<85 ms) in primary-like areas appear to represent the individual talkers with almost equal fidelity and are independent of attention in normal-hearing (NH) listeners. However, late responses (>85 ms) in higher-order non-primary areas selectively represent the attended talker with significantly higher fidelity than unattended talkers in NH and hearing-impaired (HI) listeners. Motivated by these findings, the objective of this study was to investigate the effect of a noise reduction scheme (NR) in a commercial hearing aid (HA) on the representation of complex multi-talker auditory scenes in distinct hierarchical stages of the auditory cortex by using high-density electroencephalography (EEG). DESIGN We addressed this issue by investigating early (<85 ms) and late (>85 ms) EEG responses recorded in 34 HI subjects fitted with HAs. The HA noise reduction (NR) was either on or off while the participants listened to a complex auditory scene. Participants were instructed to attend to one of two simultaneous talkers in the foreground while multi-talker babble noise played in the background (+3 dB SNR). After each trial, a two-choice question about the content of the attended speech was presented. RESULTS Using a stimulus reconstruction approach, our results suggest that the attention-related enhancement of neural representations of target and masker talkers located in the foreground, as well as suppression of the background noise in distinct hierarchical stages is significantly affected by the NR scheme. We found that the NR scheme contributed to the enhancement of the foreground and of the entire acoustic scene in the early responses, and that this enhancement was driven by better representation of the target speech. We found that the target talker in HI listeners was selectively represented in late responses. We found that use of the NR scheme resulted in enhanced representations of the target and masker speech in the foreground and a suppressed representation of the noise in the background in late responses. We found a significant effect of EEG time window on the strengths of the cortical representation of the target and masker. CONCLUSION Together, our analyses of the early and late responses obtained from HI listeners support the existing view of hierarchical processing in the auditory cortex. Our findings demonstrate the benefits of a NR scheme on the representation of complex multi-talker auditory scenes in different areas of the auditory cortex in HI listeners.
Collapse
Affiliation(s)
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| | - Elaine Hoi Ning Ng
- Centre for Applied Audiology Research, Oticon A/S, Smørum, Denmark
- Department of Behavioral Sciences and Learning, Linkoping University, Linkoping, Sweden
| | - Lorenz Fiedler
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
| | - Sébastien Santurette
- Centre for Applied Audiology Research, Oticon A/S, Smørum, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| | | | | |
Collapse
|
50
|
Kim S, Schwalje AT, Liu AS, Gander PE, McMurray B, Griffiths TD, Choi I. Pre- and post-target cortical processes predict speech-in-noise performance. Neuroimage 2021; 228:117699. [PMID: 33387631 PMCID: PMC8291856 DOI: 10.1016/j.neuroimage.2020.117699] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2020] [Revised: 11/06/2020] [Accepted: 12/23/2020] [Indexed: 12/19/2022] Open
Abstract
Understanding speech in noise (SiN) is a complex task that recruits multiple cortical subsystems. There is a variance in individuals' ability to understand SiN that cannot be explained by simple hearing profiles, which suggests that central factors may underlie the variance in SiN ability. Here, we elucidated a few cortical functions involved during a SiN task and their contributions to individual variance using both within- and across-subject approaches. Through our within-subject analysis of source-localized electroencephalography, we investigated how acoustic signal-to-noise ratio (SNR) alters cortical evoked responses to a target word across the speech recognition areas, finding stronger responses in left supramarginal gyrus (SMG, BA40 the dorsal lexicon area) with quieter noise. Through an individual differences approach, we found that listeners show different neural sensitivity to the background noise and target speech, reflected in the amplitude ratio of earlier auditory-cortical responses to speech and noise, named as an internal SNR. Listeners with better internal SNR showed better SiN performance. Further, we found that the post-speech time SMG activity explains a further amount of variance in SiN performance that is not accounted for by internal SNR. This result demonstrates that at least two cortical processes contribute to SiN performance independently: pre-target time processing to attenuate neural representation of background noise and post-target time processing to extract information from speech sounds.
Collapse
Affiliation(s)
- Subong Kim
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
| | - Adam T Schwalje
- Department of Otolaryngology - Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA
| | - Andrew S Liu
- Department of Otolaryngology - Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA
| | - Phillip E Gander
- Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA
| | - Bob McMurray
- Department of Otolaryngology - Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA; Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA; Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA 52242, USA
| | - Timothy D Griffiths
- Biosciences Institute, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | - Inyong Choi
- Department of Otolaryngology - Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA; Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA 52242, USA.
| |
Collapse
|