1. Liu H, Bai Y, Zheng Q, Liu J, Zhu J, Ni G. Electrophysiological correlation of auditory selective spatial attention in the "cocktail party" situation. Hum Brain Mapp 2024; 45:e26793. PMID: 39037186; PMCID: PMC11261592; DOI: 10.1002/hbm.26793
Abstract
The auditory system can selectively attend to a target source in complex environments, a phenomenon known as the "cocktail party" effect. However, the spatiotemporal dynamics of electrophysiological activity associated with auditory selective spatial attention (ASSA) remain largely unexplored. In this study, single-source and multiple-source paradigms were designed to simulate different auditory environments, and microstate analysis was introduced to reveal the electrophysiological correlates of ASSA. Furthermore, cortical source analysis was employed to reveal the neural activity regions of these microstates. The results showed that five microstates, labeled MS1 to MS5, could explain the spatiotemporal dynamics of ASSA. Notably, MS2 and MS3 showed significantly lower partial properties in multiple-source situations than in single-source situations, whereas MS4 had shorter and MS5 longer durations in multiple-source situations than in single-source situations. MS1 showed no significant differences between the two situations. Cortical source analysis showed that the activation regions of these microstates transferred initially from the right temporal cortex to the temporal-parietal cortex, and subsequently to the dorsofrontal cortex. Moreover, neural activity in the single-source situations was greater than in the multiple-source situations for MS2 and MS3, correlating with the N1 and P2 components, with the greatest differences observed in the superior temporal gyrus and inferior parietal lobule. These findings suggest that these specific microstates and their associated activation regions may serve as promising substrates for decoding ASSA in complex environments.
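For readers unfamiliar with the technique, the following minimal sketch illustrates the core of EEG microstate analysis as described in this abstract: topographies at global field power (GFP) peaks are clustered into a small set of maps, which are then backfit to label every sample. The function names are illustrative, and standard k-means is used here in place of the polarity-invariant modified k-means that microstate pipelines typically employ.

```python
import numpy as np
from sklearn.cluster import KMeans

def microstate_maps(eeg, n_states=5):
    """Cluster EEG topographies at GFP peaks into microstate maps.

    eeg: (n_channels, n_samples) average-referenced EEG.
    Simplified sketch: plain k-means instead of polarity-invariant
    modified k-means common in microstate work.
    """
    gfp = eeg.std(axis=0)  # global field power per sample
    peaks = np.where((gfp[1:-1] > gfp[:-2]) & (gfp[1:-1] > gfp[2:]))[0] + 1
    maps = KMeans(n_clusters=n_states, n_init=10).fit(eeg[:, peaks].T).cluster_centers_

    # Backfit: label each sample by the map with highest |spatial correlation|
    def zscore(x):
        return (x - x.mean(axis=0)) / x.std(axis=0)

    corr = zscore(maps.T).T @ zscore(eeg) / eeg.shape[0]
    labels = np.abs(corr).argmax(axis=0)
    return maps, labels

# Toy usage on random "EEG" (32 channels, 2000 samples)
rng = np.random.default_rng(0)
maps, labels = microstate_maps(rng.normal(size=(32, 2000)), n_states=5)
```

Microstate properties such as duration and coverage then follow directly from run lengths in `labels`.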
Affiliation(s)
- Hongxing Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Haihe Laboratory of Brain-computer Interaction and Human-machine Integration, Tianjin, China
- Qi Zheng
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Jihan Liu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Jianing Zhu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- State Key Laboratory of Advanced Medical Materials and Devices, Tianjin University, Tianjin, China
- Haihe Laboratory of Brain-computer Interaction and Human-machine Integration, Tianjin, China
- Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin, China
2. Sun W, Zou J, Zhu T, Sun Z, Ding N. Linguistic feedback supports rapid adaptation to acoustically degraded speech. iScience 2024; 27:110055. PMID: 38868204; PMCID: PMC11167482; DOI: 10.1016/j.isci.2024.110055
Abstract
Humans can quickly adapt to recognize acoustically degraded speech, and here we hypothesize that this quick adaptation is enabled by internal linguistic feedback: listeners use partially recognized sentences to adapt the mapping between acoustic features and phonetic labels. We test this hypothesis by quantifying how quickly humans adapt to degraded speech and by analyzing whether the adaptation process can be simulated by adapting an automatic speech recognition (ASR) system based on its own speech recognition results. We consider three types of acoustic degradation: noise vocoding, time compression, and local time-reversal. The human speech recognition rate can increase by >20% after exposure to just a few acoustically degraded sentences. Critically, the ASR system with internal linguistic feedback can adapt to degraded speech with human-level speed and accuracy. These results suggest that self-supervised learning based on linguistic feedback is a plausible strategy for human adaptation to acoustically degraded speech.
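The adaptation mechanism proposed here, a recognizer re-trained on its own outputs, is an instance of self-training. The toy sketch below is not the authors' ASR system; a simple Gaussian-feature classifier stands in for the acoustic-to-phonetic mapping, and it shows how confident self-generated labels can recover accuracy after a "degradation" shifts the feature distribution.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Clean "speech": two phonetic classes separated along feature dimension 0
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)); X[:, 0] += 2.0 * y
clf = LogisticRegression().fit(X, y)

# "Degraded speech": same classes, but the feature distribution is shifted
Xd = rng.normal(size=(n, 2)); Xd[:, 0] += 2.0 * y + 1.0
print("accuracy before adaptation:", clf.score(Xd, y))

# Self-training: confidently recognized degraded samples become new labels
for _ in range(5):
    p = clf.predict_proba(Xd)[:, 1]
    conf = (p > 0.8) | (p < 0.2)          # keep only confident samples
    clf = LogisticRegression().fit(Xd[conf], (p > 0.5)[conf].astype(int))
print("accuracy after adaptation:", clf.score(Xd, y))
```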
Affiliation(s)
- Wenhui Sun
- Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou 311121, China
- Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Tianyi Zhu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Zhoujian Sun
- Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou 311121, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
3. Roebben A, Heintz N, Geirnaert S, Francart T, Bertrand A. 'Are you even listening?' - EEG-based decoding of absolute auditory attention to natural speech. J Neural Eng 2024; 21:036046. PMID: 38834062; DOI: 10.1088/1741-2552/ad5403
Abstract
Objective. In this study, we use electroencephalography (EEG) recordings to determine whether a subject is actively listening to a presented speech stimulus. More precisely, we aim to discriminate between an active listening condition and a distractor condition in which subjects focus on an unrelated distractor task while being exposed to a speech stimulus. We refer to this task as absolute auditory attention decoding. Approach. We re-use an existing EEG dataset in which the subjects watch a silent movie as a distractor condition, and introduce a new dataset with two distractor conditions (silently reading a text and performing arithmetic exercises). We focus on two EEG features, namely neural envelope tracking (NET) and spectral entropy (SE). Additionally, we investigate whether the detection of such an active listening condition can be combined with a selective auditory attention decoding (sAAD) task, where the goal is to decide to which of multiple competing speakers the subject is attending. The latter is a key task in so-called neuro-steered hearing devices that aim to suppress unattended audio while preserving the attended speaker. Main results. Contrary to a previous hypothesis that higher SE is related to active rather than passive listening (without any distractors), we find significantly lower SE in the active listening condition than in the distractor conditions. Nevertheless, the NET is consistently and significantly higher when actively listening. Similarly, we show that the accuracy of a sAAD task improves when it is evaluated only on the highest-NET segments, whereas the reverse is observed when it is evaluated only on the lowest-SE segments. Significance. We conclude that the NET is more reliable for decoding absolute auditory attention, as it is consistently higher when actively listening, whereas the relation of the SE between active and passive listening appears to depend on the nature of the distractor.
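As a concrete illustration of the two features compared in this paper, the sketch below computes a normalized spectral entropy from a Welch power spectrum and a simple neural-envelope-tracking score as the correlation between a decoder-reconstructed envelope and the true speech envelope. This is a minimal stand-in: the study's actual NET and SE pipelines (filtering, windowing, decoder training) involve choices not reproduced here.

```python
import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs, fmax=40.0):
    """Normalized Shannon entropy of the Welch PSD of one EEG segment."""
    f, psd = welch(x, fs=fs, nperseg=int(2 * fs))
    psd = psd[(f > 0) & (f <= fmax)]
    p = np.clip(psd / psd.sum(), 1e-12, None)
    return float(-(p * np.log2(p)).sum() / np.log2(p.size))  # in [0, 1]

def net_score(reconstructed_env, speech_env):
    """NET proxy: Pearson correlation of reconstructed vs. true envelope."""
    return float(np.corrcoef(reconstructed_env, speech_env)[0, 1])

# Example: white noise has near-maximal spectral entropy
rng = np.random.default_rng(0)
print(spectral_entropy(rng.normal(size=4 * 128), fs=128))
```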
Affiliation(s)
- Arnout Roebben
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Nicolas Heintz
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- KU Leuven, Department of Neurosciences, Experimental Oto-Rhino-Laryngology (ExpORL), Leuven, Belgium
- Leuven.AI - KU Leuven Institute for AI, Leuven, Belgium
- Simon Geirnaert
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- KU Leuven, Department of Neurosciences, Experimental Oto-Rhino-Laryngology (ExpORL), Leuven, Belgium
- Leuven.AI - KU Leuven Institute for AI, Leuven, Belgium
- Tom Francart
- KU Leuven, Department of Neurosciences, Experimental Oto-Rhino-Laryngology (ExpORL), Leuven, Belgium
- Leuven.AI - KU Leuven Institute for AI, Leuven, Belgium
- Alexander Bertrand
- KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
- Leuven.AI - KU Leuven Institute for AI, Leuven, Belgium
4. Dai B, Zhai Y, Long Y, Lu C. How the Listener's Attention Dynamically Switches Between Different Speakers During a Natural Conversation. Psychol Sci 2024; 35:635-652. PMID: 38657276; DOI: 10.1177/09567976241243367
Abstract
The neural mechanisms underpinning the dynamic switching of a listener's attention between speakers are not well understood. Here we addressed this issue in a natural conversation involving 21 triadic adult groups. Results showed that when the listener's attention dynamically switched between speakers, neural synchronization with the to-be-attended speaker was significantly enhanced, whereas that with the to-be-ignored speaker was significantly suppressed. Along with attention switching, semantic distances between sentences significantly increased in the to-be-ignored speech. Moreover, neural synchronization negatively correlated with the increase in semantic distance but not with acoustic change of the to-be-ignored speech. However, no difference in neural synchronization was found between the listener and the two speakers during the phase of sustained attention. These findings support the attenuation model of attention, indicating that both speech signals are processed beyond the basic physical level. Additionally, shifting attention imposes a cognitive burden, as demonstrated by the opposite fluctuations of interpersonal neural synchronization.
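The "semantic distance between sentences" used here is, in general form, a distance between sentence representations. A minimal sketch, assuming some sentence encoder that returns fixed-length vectors (the paper's specific embedding model is not reproduced here):

```python
import numpy as np

def semantic_distance(emb_a, emb_b):
    """Cosine distance between two sentence embeddings (1 - cosine similarity)."""
    cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return 1.0 - float(cos)

# Toy usage with random stand-in embeddings for two consecutive sentences
rng = np.random.default_rng(0)
sent_prev, sent_next = rng.normal(size=300), rng.normal(size=300)
print(semantic_distance(sent_prev, sent_next))
```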
Affiliation(s)
- Bohan Dai
- Max Planck Institute for Psycholinguistics
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
- Yu Zhai
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University
- Yuhang Long
- Institute of Developmental Psychology, Faculty of Psychology, Beijing Normal University
- Chunming Lu
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University
5. Zeng X, Cai S, Xie L. Attention-guided graph structure learning network for EEG-enabled auditory attention detection. J Neural Eng 2024; 21:036025. PMID: 38776893; DOI: 10.1088/1741-2552/ad4f1a
Abstract
Objective: Decoding auditory attention from brain signals is essential for the development of neuro-steered hearing aids. This study aims to overcome the challenges of extracting discriminative feature representations from electroencephalography (EEG) signals for auditory attention detection (AAD) tasks, particularly focusing on the intrinsic relationships between different EEG channels. Approach: We propose a novel attention-guided graph structure learning network, AGSLnet, which leverages potential relationships between EEG channels to improve AAD performance. Specifically, AGSLnet is designed to dynamically capture latent relationships between channels and construct a graph structure of EEG signals. Main results: We evaluated AGSLnet on two publicly available AAD datasets and demonstrated its superiority and robustness over state-of-the-art models. Visualization of the graph structure trained by AGSLnet supports previous neuroscience findings, enhancing our understanding of the underlying neural mechanisms. Significance: This study presents a novel approach for examining brain functional connections, improving AAD performance in low-latency settings, and supporting the development of neuro-steered hearing aids.
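The paper's central idea, learning a channel-by-channel graph from the EEG itself, can be sketched as scaled dot-product attention over channel time courses. The layer below is an illustrative guess at the mechanism, not the published AGSLnet architecture; dimensions and design choices are assumptions.

```python
import torch
import torch.nn as nn

class AttentionGraphLearner(nn.Module):
    """Learn a row-normalized channel adjacency from EEG via dot-product attention."""
    def __init__(self, n_times: int, d: int = 32):
        super().__init__()
        self.query = nn.Linear(n_times, d)
        self.key = nn.Linear(n_times, d)

    def forward(self, x):                      # x: (batch, channels, time)
        q, k = self.query(x), self.key(x)      # (batch, channels, d)
        scores = q @ k.transpose(1, 2) / (q.shape[-1] ** 0.5)
        adj = torch.softmax(scores, dim=-1)    # learned adjacency per batch item
        return adj, adj @ x                    # graph-smoothed channel signals

adj, filtered = AttentionGraphLearner(n_times=128)(torch.randn(4, 64, 128))
print(adj.shape, filtered.shape)               # (4, 64, 64) (4, 64, 128)
```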
Affiliation(s)
- Xianzhang Zeng
- School of Intelligent Engineering, South China University of Technology, Guangzhou, People's Republic of China
- Siqi Cai
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Longhan Xie
- School of Intelligent Engineering, South China University of Technology, Guangzhou, People's Republic of China
6. Levy O, Hackmon SL, Zvilichovsky Y, Korisky A, Bidet-Caulet A, Schweitzer JB, Golumbic EZ. Neurophysiological Patterns of Attention and Distraction during Realistic Virtual-Reality Classroom Learning in Adults with and without ADHD. bioRxiv 2024:2024.04.17.590012. PMID: 38659916; PMCID: PMC11042341; DOI: 10.1101/2024.04.17.590012
Abstract
Many people, particularly those diagnosed with ADHD, report difficulty maintaining attention and proneness to distraction during classroom learning. However, the behavioral, neural, and physiological basis of attention in realistic learning contexts is not well understood, since the clinical and scientific tools currently used for evaluating and quantifying the constructs of "distractibility" and "inattention" are removed from the real-life experience of organic classrooms. Here we introduce a novel virtual reality (VR) platform for studying students' brain activity and physiological responses as they are immersed in realistic frontal classroom learning. Using this approach, we studied whether adults with and without ADHD (N=49) exhibit differences in neurophysiological metrics associated with sustained attention, such as speech-tracking of the teacher's voice, the power of alpha oscillations, and levels of arousal, as well as in responses to potential disturbances by background sound-events in the classroom. Under these ecological conditions, we find that adults with ADHD exhibit higher auditory neural responses to background sounds than their control peers, which, together with higher alpha-oscillation power and more frequent gaze-shifts around the classroom, also contributed to explaining variance in the severity of ADHD symptoms. These results are in line with higher sensitivity to irrelevant stimuli in the environment and increased mind-wandering/boredom. At the same time, both groups exhibited similar learning outcomes and similar neural tracking of the teacher's speech. This suggests that in this context attention may not operate as a zero-sum game, and that allocating some resources to irrelevant stimuli does not always detract from performing the task at hand. Given the dire need for more objective, dimensional, and ecologically valid measures of attention and its real-life deficits, this work provides new insights into the neurophysiological manifestations of attention and distraction in real-life contexts, while challenging some prevalent notions regarding the nature of the attentional challenges experienced by those with ADHD.
Affiliation(s)
- Orel Levy
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Yair Zvilichovsky
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Adi Korisky
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan, Israel
- Julie B. Schweitzer
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Sacramento, CA, USA
7. Puschmann S, Regev M, Fakhar K, Zatorre RJ, Thiel CM. Attention-Driven Modulation of Auditory Cortex Activity during Selective Listening in a Multispeaker Setting. J Neurosci 2024; 44:e1157232023. PMID: 38388426; PMCID: PMC11007309; DOI: 10.1523/jneurosci.1157-23.2023
Abstract
Real-world listening settings often consist of multiple concurrent sound streams. To limit perceptual interference during selective listening, the auditory system segregates and filters the relevant sensory input. Previous work provided evidence that the auditory cortex is critically involved in this process and selectively gates attended input toward subsequent processing stages. We studied at which level of auditory cortex processing this filtering of attended information occurs using functional magnetic resonance imaging (fMRI) and a naturalistic selective listening task. Forty-five human listeners (of either sex) attended to one of two continuous speech streams, presented either concurrently or in isolation. Functional data were analyzed using an inter-subject analysis to assess stimulus-specific components of ongoing auditory cortex activity. Our results suggest that stimulus-related activity in the primary auditory cortex and the adjacent planum temporale are hardly affected by attention, whereas brain responses at higher stages of the auditory cortex processing hierarchy become progressively more selective for the attended input. Consistent with these findings, a complementary analysis of stimulus-driven functional connectivity further demonstrated that information on the to-be-ignored speech stream is shared between the primary auditory cortex and the planum temporale but largely fails to reach higher processing stages. Our findings suggest that the neural processing of ignored speech cannot be effectively suppressed at the level of early cortical processing of acoustic features but is gradually attenuated once the competing speech streams are fully segregated.
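The inter-subject analysis referenced here compares each listener's response time course with those of the other listeners, so that only stimulus-driven activity survives. A minimal leave-one-out version (details of the study's implementation are not reproduced):

```python
import numpy as np

def intersubject_correlation(data):
    """Leave-one-out ISC sketch: correlate each subject's regional time course
    with the mean time course of all other subjects.

    data: (n_subjects, n_timepoints) array for one region.
    """
    n = data.shape[0]
    isc = []
    for s in range(n):
        others = data[np.arange(n) != s].mean(axis=0)
        isc.append(np.corrcoef(data[s], others)[0, 1])
    return np.array(isc)
```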
Affiliation(s)
- Sebastian Puschmann
- Biological Psychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
- Mor Regev
- Montreal Neurological Institute, McGill University, Montreal, Quebec H3A 2B4, Canada
- Kayson Fakhar
- Institute of Computational Neuroscience, University Medical Center Eppendorf, Hamburg University, Hamburg Center of Neuroscience, Hamburg 20246, Germany
- Robert J Zatorre
- Montreal Neurological Institute, McGill University, Montreal, Quebec H3A 2B4, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Quebec H2V 2S9, Canada
- Christiane M Thiel
- Biological Psychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
- Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
8. Viswanathan V, Rupp KM, Hect JL, Harford EE, Holt LL, Abel TJ. Intracranial Mapping of Response Latencies and Task Effects for Spoken Syllable Processing in the Human Brain. bioRxiv 2024:2024.04.05.588349. PMID: 38617227; PMCID: PMC11014624; DOI: 10.1101/2024.04.05.588349
Abstract
Prior lesion, noninvasive-imaging, and intracranial-electroencephalography (iEEG) studies have documented hierarchical, parallel, and distributed characteristics of human speech processing. Yet there have not been direct, intracranial observations of the latency with which regions outside the temporal lobe respond to speech, or of how these responses are impacted by task demands. We leveraged human intracranial recordings via stereo-EEG to measure responses from diverse forebrain sites during (i) passive listening to /bi/ and /pi/ syllables, and (ii) active listening requiring /bi/-versus-/pi/ categorization. We find that neural response latency increases from a few tens of ms in Heschl's gyrus (HG) to several tens of ms in superior temporal gyrus (STG), superior temporal sulcus (STS), and early parietal areas, and to hundreds of ms in later parietal areas, insula, frontal cortex, hippocampus, and amygdala. These data also suggest parallel flow of speech information dorsally and ventrally, from HG to parietal areas and from HG to STG and STS, respectively. The latency data also reveal areas in parietal cortex, frontal cortex, hippocampus, and amygdala that are not responsive to the stimuli during passive listening but are responsive during categorization. Furthermore, multiple regions, spanning auditory, parietal, frontal, and insular cortices as well as the hippocampus and amygdala, show greater neural response amplitudes during active versus passive listening (a task-related effect). Overall, these results are consistent with hierarchical processing of speech at a macro level and with parallel streams of information flow in temporal and parietal regions. These data also reveal regions where the speech code is stimulus-faithful and those that encode task-relevant representations.
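Response latency in such data is often operationalized as the first post-stimulus time at which the evoked response leaves its baseline distribution. A minimal threshold-crossing sketch (the authors' exact latency criterion may differ):

```python
import numpy as np

def response_latency(evoked, times, k=3.0):
    """First post-stimulus time where |response - baseline mean| > k * baseline SD.

    evoked: 1-D response time course; times: matching time axis in seconds,
    with t <= 0 treated as baseline. The threshold rule is an assumption.
    """
    base = evoked[times <= 0]
    mu, sd = base.mean(), base.std()
    crossed = (times > 0) & (np.abs(evoked - mu) > k * sd)
    idx = np.flatnonzero(crossed)
    return times[idx[0]] if idx.size else np.nan
```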
Affiliation(s)
- Vibha Viswanathan
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
- Kyle M. Rupp
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
- Jasmine L. Hect
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, Austin, TX 78712
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15260
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15238
9. Li Z, Zhang D. How does the human brain process noisy speech in real life? Insights from the second-person neuroscience perspective. Cogn Neurodyn 2024; 18:371-382. PMID: 38699619; PMCID: PMC11061069; DOI: 10.1007/s11571-022-09924-w
Abstract
Comprehending speech in the presence of background noise is of great importance in human life. Over the past decades, a large body of psychological, cognitive, and neuroscientific research has explored the neurocognitive mechanisms of speech-in-noise comprehension. However, limited by the low ecological validity of the speech stimuli and experimental paradigms, as well as by inadequate attention to higher-order linguistic and extralinguistic processes, much remains unknown about how the brain processes noisy speech in real-life scenarios. A recently emerging approach, the second-person neuroscience approach, provides a novel conceptual framework. It measures the neural activities of both the speaker and the listener and estimates speaker-listener neural coupling, taking the speaker's production-related neural activity as a standardized reference. The second-person approach not only promotes the use of naturalistic speech but also allows free communication between speaker and listener, as in a close-to-life context. In this review, we first briefly review previous findings on how the brain processes speech in noise; we then introduce the principles and advantages of the second-person neuroscience approach and discuss its implications for unraveling the linguistic and extralinguistic processes during speech-in-noise comprehension; finally, we conclude by proposing some critical issues and calling for more research interest in the second-person approach, which would further extend present knowledge about how people comprehend speech in noise.
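At its core, the speaker-listener neural coupling described here is a lagged similarity measure between the speaker's production-related activity and the listener's activity. A minimal lagged-correlation sketch (actual studies typically use band-limited signals and statistical thresholding not shown here):

```python
import numpy as np

def neural_coupling(speaker, listener, fs, max_lag_s=5.0):
    """Pearson correlation between speaker and listener time courses at a
    range of lags (positive lag: listener follows the speaker)."""
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = speaker[: speaker.size - lag], listener[lag:]
        else:
            a, b = speaker[-lag:], listener[: listener.size + lag]
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags / fs, r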
Affiliation(s)
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
10. Tune S, Obleser J. Neural attentional filters and behavioural outcome follow independent individual trajectories over the adult lifespan. eLife 2024; 12:RP92079. PMID: 38470243; DOI: 10.7554/elife.92079
Abstract
Preserved communication abilities promote healthy ageing. To this end, the age-typical loss of sensory acuity might in part be compensated for by an individual's preserved attentional neural filtering. Is such a compensatory brain-behaviour link longitudinally stable? Can it predict individual change in listening behaviour? Here we show that individual listening behaviour and neural filtering ability follow largely independent developmental trajectories, by modelling electroencephalographic and behavioural data of N = 105 ageing individuals (39-82 y). First, despite the expected decline in hearing-threshold-derived sensory acuity, listening-task performance proved stable over 2 y. Second, neural filtering and behaviour were correlated only within each separate measurement timepoint (T1, T2). Longitudinally, however, our results raise caution on attention-guided neural filtering metrics as predictors of individual trajectories in listening behaviour: neither neural filtering at T1 nor its 2-year change could predict individual 2-year behavioural change, under a combination of modelling strategies.
Affiliation(s)
- Sarah Tune
- Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Jonas Obleser
- Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Department of Psychology, University of Lübeck, Lübeck, Germany
11. Wikman P, Salmela V, Sjöblom E, Leminen M, Laine M, Alho K. Attention to audiovisual speech shapes neural processing through feedback-feedforward loops between different nodes of the speech network. PLoS Biol 2024; 22:e3002534. PMID: 38466713; PMCID: PMC10957087; DOI: 10.1371/journal.pbio.3002534
Abstract
Selective attention-related top-down modulation plays a significant role in separating relevant speech from irrelevant background speech when the vocal attributes separating concurrent speakers are small and continuously evolving. Electrophysiological studies have shown that such top-down modulation enhances neural tracking of attended speech. Yet the specific cortical regions involved remain unclear due to the limited spatial resolution of most electrophysiological techniques. To overcome these limitations, we collected both electroencephalography (EEG; high temporal resolution) and functional magnetic resonance imaging (fMRI; high spatial resolution) while human participants selectively attended to speakers in audiovisual scenes containing overlapping cocktail party speech. To exploit the advantages of the respective techniques, we analysed neural tracking of speech using the EEG data and performed representational-dissimilarity-based EEG-fMRI fusion. We observed that attention enhanced neural tracking and modulated EEG correlates throughout the latencies studied. Further, attention-related enhancement of neural tracking fluctuated in predictable temporal profiles. We discuss how such temporal dynamics could arise from a combination of interactions between attention and prediction as well as from plastic properties of the auditory cortex. EEG-fMRI fusion revealed attention-related iterative feedforward-feedback loops between hierarchically organised nodes of the ventral auditory object-processing stream. Our findings support models in which attention facilitates dynamic neural changes in the auditory cortex, ultimately aiding discrimination of relevant sounds from irrelevant ones while conserving neural resources.
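Representational-dissimilarity-based EEG-fMRI fusion correlates, at each EEG time point, the EEG condition-by-condition dissimilarity structure with the corresponding fMRI dissimilarity structure per region. A bare-bones sketch under that description (the paper's preprocessing and statistics are not reproduced):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def eeg_fmri_fusion(eeg, fmri_rdms):
    """RSA fusion sketch.

    eeg: (n_conditions, n_sensors, n_times) evoked patterns;
    fmri_rdms: dict mapping region name -> condensed fMRI RDM.
    Returns, per region, a time course of EEG-fMRI RDM correlations.
    """
    n_times = eeg.shape[-1]
    out = {reg: np.empty(n_times) for reg in fmri_rdms}
    for t in range(n_times):
        eeg_rdm = pdist(eeg[:, :, t], metric="correlation")
        for reg, rdm in fmri_rdms.items():
            out[reg][t] = spearmanr(eeg_rdm, rdm).correlation
    return out
```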
Affiliation(s)
- Patrik Wikman
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
- Viljami Salmela
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
- Eetu Sjöblom
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Miika Leminen
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- AI and Analytics Unit, Helsinki University Hospital, Helsinki, Finland
- Matti Laine
- Department of Psychology, Åbo Akademi University, Turku, Finland
- Kimmo Alho
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland
- Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
12. Corsini A, Tomassini A, Pastore A, Delis I, Fadiga L, D'Ausilio A. Speech perception difficulty modulates theta-band encoding of articulatory synergies. J Neurophysiol 2024; 131:480-491. PMID: 38323331; DOI: 10.1152/jn.00388.2023
Abstract
The human brain tracks available speech acoustics and extrapolates missing information such as the speaker's articulatory patterns. However, the extent to which articulatory reconstruction supports speech perception remains unclear. This study explores the relationship between articulatory reconstruction and task difficulty. Participants listened to sentences and performed a speech-rhyming task. Real kinematic data of the speaker's vocal tract were recorded via electromagnetic articulography (EMA) and aligned to the corresponding acoustic outputs. We extracted articulatory synergies from the EMA data with principal component analysis (PCA) and employed partial information decomposition (PID) to separate the electroencephalographic (EEG) encoding of acoustic and articulatory features into unique, redundant, and synergistic atoms of information. We median-split sentences into easy (ES) and hard (HS) based on participants' performance and found that greater task difficulty involved greater encoding of unique articulatory information in the theta band. We conclude that fine-grained articulatory reconstruction plays a complementary role in the encoding of speech acoustics, lending further support to the claim that motor processes support speech perception.

NEW & NOTEWORTHY: Top-down processes originating from the motor system contribute to speech perception through the reconstruction of the speaker's articulatory movements. This study investigates the role of such articulatory simulation under variable task difficulty. We show that more challenging listening tasks lead to increased encoding of articulatory kinematics in the theta band and suggest that, in such situations, fine-grained articulatory reconstruction complements acoustic encoding.
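Extracting "articulatory synergies" with PCA, as described here, amounts to finding a few components that jointly explain the covariation of the EMA sensor trajectories. A minimal sketch with simulated kinematics (the numbers of sensors and components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated EMA trajectories: (n_samples, n_kinematic_dims), e.g. x/y
# positions of tongue, lip, and jaw coils over time.
rng = np.random.default_rng(0)
ema = rng.normal(size=(5000, 12))

pca = PCA(n_components=5).fit(ema)
synergies = pca.components_        # (5, 12): weights of each synergy
scores = pca.transform(ema)        # time courses of each synergy
print(pca.explained_variance_ratio_)
```

The synergy time courses in `scores` are what would then enter an encoding analysis against the EEG.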
Affiliation(s)
- Alessandro Corsini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Alice Tomassini
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Aldo Pastore
- Laboratorio NEST, Scuola Normale Superiore, Pisa, Italy
- Ioannis Delis
- School of Biomedical Sciences, University of Leeds, Leeds, United Kingdom
- Luciano Fadiga
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Alessandro D'Ausilio
- Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
- Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
13. Fink L, Simola J, Tavano A, Lange E, Wallot S, Laeng B. From pre-processing to advanced dynamic modeling of pupil data. Behav Res Methods 2024; 56:1376-1412. PMID: 37351785; PMCID: PMC10991010; DOI: 10.3758/s13428-023-02098-1
Abstract
The pupil of the eye provides a rich source of information for cognitive scientists, as it can index a variety of bodily states (e.g., arousal, fatigue) and cognitive processes (e.g., attention, decision-making). As pupillometry becomes a more accessible and popular methodology, researchers have proposed a variety of techniques for analyzing pupil data. Here, we focus on time series-based, signal-to-signal approaches that enable one to relate dynamic changes in pupil size over time with dynamic changes in a stimulus time series, continuous behavioral outcome measures, or other participants' pupil traces. We first introduce pupillometry, its neural underpinnings, and the relation between pupil measurements and other oculomotor behaviors (e.g., blinks, saccades), to stress the importance of understanding what is being measured and what can be inferred from changes in pupillary activity. Next, we discuss possible pre-processing steps, and the contexts in which they may be necessary. Finally, we turn to signal-to-signal analytic techniques, including regression-based approaches, dynamic time-warping, phase clustering, detrended fluctuation analysis, and recurrence quantification analysis. Assumptions of these techniques, and examples of the scientific questions each can address, are outlined, with references to key papers and software packages. Additionally, we provide a detailed code tutorial that steps through the key examples and figures in this paper. Ultimately, we contend that the insights gained from pupillometry are constrained by the analysis techniques used, and that signal-to-signal approaches offer a means to generate novel scientific insights by taking into account understudied spectro-temporal relationships between the pupil signal and other signals of interest.
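Among the pre-processing steps this paper surveys, blink handling is the most universal: blink samples, plus a short pad around them (the pupil signal is distorted on blink entry and exit), are masked and interpolated. A minimal linear-interpolation sketch; the pad length and blink criterion are choices, not prescriptions from the paper:

```python
import numpy as np

def interpolate_blinks(pupil, fs, pad_s=0.05):
    """Replace blink samples (coded as NaN or <= 0) and a pad around them
    with linear interpolation. pupil: 1-D pupil-size trace; fs: sampling rate."""
    bad = ~np.isfinite(pupil) | (pupil <= 0)
    pad = int(pad_s * fs)
    for i in np.flatnonzero(bad):
        bad[max(0, i - pad): i + pad + 1] = True   # dilate the blink mask
    good = ~bad
    out = pupil.copy()
    out[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(good), pupil[good])
    return out
```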
Affiliation(s)
- Lauren Fink
- Department of Music, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, 60322 Frankfurt am Main, Germany
- Department of Psychology, Neuroscience & Behavior, McMaster University, 1280 Main St. West, Hamilton, Ontario, L8S 4L8, Canada
- Jaana Simola
- Helsinki Collegium for Advanced Studies, University of Helsinki, Helsinki, Finland
- Department of Education, University of Helsinki, Helsinki, Finland
- Alessandro Tavano
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Elke Lange
- Department of Music, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, 60322 Frankfurt am Main, Germany
- Sebastian Wallot
- Department of Literature, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Institute for Sustainability Education and Psychology, Leuphana University, Lüneburg, Germany
- Bruno Laeng
- Department of Psychology, University of Oslo, Oslo, Norway
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion, University of Oslo, Oslo, Norway
14. Yao Y, Stebner A, Tuytelaars T, Geirnaert S, Bertrand A. Identifying temporal correlations between natural single-shot videos and EEG signals. J Neural Eng 2024; 21:016018. PMID: 38277701; DOI: 10.1088/1741-2552/ad2333
Abstract
Objective. Electroencephalography (EEG) is a widely used technology for recording brain activity in brain-computer interface (BCI) research, where understanding the encoding-decoding relationship between stimuli and neural responses is a fundamental challenge. Recently, there is a growing interest in encoding-decoding natural stimuli in a single-trial setting, as opposed to traditional BCI literature where multi-trial presentations of synthetic stimuli are commonplace. While EEG responses to natural speech have been extensively studied, such stimulus-following EEG responses to natural video footage remain underexplored. Approach. We collect a new EEG dataset with subjects passively viewing a film clip and extract a few video features that have been found to be temporally correlated with EEG signals. However, our analysis reveals that these correlations are mainly driven by shot cuts in the video. To avoid the confounds related to shot cuts, we construct another EEG dataset with natural single-shot videos as stimuli and propose a new set of object-based features. Main results. We demonstrate that previous video features lack robustness in capturing the coupling with EEG signals in the absence of shot cuts, and that the proposed object-based features exhibit significantly higher correlations. Furthermore, we show that the correlations obtained with these proposed features are not dominantly driven by eye movements. Additionally, we quantitatively verify the superiority of the proposed features in a match-mismatch task. Finally, we evaluate to what extent these proposed features explain the variance in coherent stimulus responses across subjects. Significance. This work provides valuable insights into feature design for video-EEG analysis and paves the way for applications such as visual attention decoding.
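The match-mismatch task mentioned here can be stated compactly: a model is scored correct whenever its measure of EEG-stimulus correspondence is higher for the time-aligned video feature than for a mismatched segment. A minimal sketch with a toy correspondence function (the paper's actual decoder is more elaborate):

```python
import numpy as np

def correspondence(eeg_seg, feat):
    """Toy EEG-feature correspondence: correlate channel average with the feature."""
    return np.corrcoef(eeg_seg.mean(axis=0), feat)[0, 1]

def match_mismatch_accuracy(eeg_segs, matched, mismatched):
    """Fraction of segments where the matched feature wins over a mismatched one."""
    hits = [correspondence(e, m) > correspondence(e, mm)
            for e, m, mm in zip(eeg_segs, matched, mismatched)]
    return float(np.mean(hits))
```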
Affiliation(s)
- Yuanyuan Yao
- Department of Electrical Engineering, STADIUS, KU Leuven, Leuven, Belgium
- Axel Stebner
- Department of Electrical Engineering, PSI, KU Leuven, Leuven, Belgium
- Tinne Tuytelaars
- Department of Electrical Engineering, PSI, KU Leuven, Leuven, Belgium
- Simon Geirnaert
- Department of Electrical Engineering, STADIUS, and Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Alexander Bertrand
- Department of Electrical Engineering, STADIUS, KU Leuven, Leuven, Belgium
15. Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. bioRxiv 2024:2024.02.02.578603. PMID: 38352332; PMCID: PMC10862830; DOI: 10.1101/2024.02.02.578603
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous, speech-like passages: speech-envelope-modulated noise, English-like non-words, scrambled words, and a narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of progressively higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role for predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic features is bilateral or right-lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
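TRF analysis models the neural response as a linear convolution of a stimulus feature with an unknown response kernel. A minimal ridge-regression estimator over time-lagged copies of the stimulus is sketched below; published analyses use comparable lagged-regression or boosting implementations, and the wrap-around edges from np.roll are a simplification of this sketch, not of those methods.

```python
import numpy as np

def estimate_trf(stimulus, eeg, fs, tmin=-0.1, tmax=0.4, alpha=1.0):
    """Ridge TRF: regress a neural channel (n_samples,) on lagged copies
    of a stimulus feature (n_samples,). Returns latencies and the kernel."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.stack([np.roll(stimulus, lag) for lag in lags], axis=1)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(lags.size), X.T @ eeg)
    return lags / fs, w
```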
Affiliation(s)
| | - Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
| | - Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
| | - Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
| | - Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA
| |
16. Panela RA, Copelli F, Herrmann B. Reliability and generalizability of neural speech tracking in younger and older adults. Neurobiol Aging 2024; 134:165-180. PMID: 38103477; DOI: 10.1016/j.neurobiolaging.2023.11.007
Abstract
Neural tracking of spoken speech is considered a potential clinical biomarker for speech-processing difficulties, but the reliability of neural speech tracking is unclear. Here, younger and older adults listened to stories in two sessions while electroencephalography was recorded to investigate the reliability and generalizability of neural speech tracking. Speech tracking amplitude was larger for older than younger adults, consistent with an age-related loss of inhibition. The reliability of neural speech tracking was moderate (ICC ∼0.5-0.75) and tended to be higher for older adults. However, reliability was lower for speech tracking than for neural responses to noise bursts (ICC >0.8), which we used as a benchmark for maximum reliability. Neural speech tracking generalized moderately across different stories (ICC ∼0.5-0.6), which appeared greatest for audiobook-like stories spoken by the same person. Hence, a variety of stories could possibly be used for clinical assessments. Overall, the current data are important for developing a biomarker of speech processing but suggest that further work is needed to increase the reliability to meet clinical standards.
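The reliability values reported here are intraclass correlations across two sessions. For concreteness, a compact two-way consistency ICC(3,1) from the classic ANOVA decomposition is sketched below; whether this exact ICC variant matches the paper's choice is not assumed.

```python
import numpy as np

def icc_consistency(x1, x2):
    """Two-session ICC(3,1) via the two-way ANOVA decomposition.

    x1, x2: per-subject scores at session 1 and session 2.
    """
    data = np.column_stack([x1, x2])
    n, k = data.shape
    grand = data.mean()
    ss_total = ((data - grand) ** 2).sum()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()   # subjects
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()   # sessions
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Simulated test-retest data with true reliability around 0.8
rng = np.random.default_rng(0)
trait = rng.normal(size=30)
print(icc_consistency(trait + rng.normal(0, 0.5, 30),
                      trait + rng.normal(0, 0.5, 30)))
```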
Affiliation(s)
- Ryan A Panela
- Rotman Research Institute, Baycrest Academy for Research and Education, M6A 2E1 North York, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1 Toronto, ON, Canada
- Francesca Copelli
- Rotman Research Institute, Baycrest Academy for Research and Education, M6A 2E1 North York, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1 Toronto, ON, Canada
- Björn Herrmann
- Rotman Research Institute, Baycrest Academy for Research and Education, M6A 2E1 North York, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1 Toronto, ON, Canada
17. Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. PMID: 38358954; PMCID: PMC10868781; DOI: 10.1371/journal.pbio.3002498
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech were better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but disappeared when segregation cues, i.e., the speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
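Why would an echo eliminate crucial temporal modulations? Adding a copy of the envelope delayed by D seconds acts as a comb filter that cancels modulation frequencies near 1/(2D) Hz. The toy sketch below demonstrates the cancellation on a synthetic envelope; the 250 ms delay is an illustrative value, not one taken from the paper.

```python
import numpy as np

fs = 100                                  # envelope sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)

# A toy "speech envelope" with strong slow modulations
env = np.abs(np.convolve(rng.normal(size=t.size), np.hanning(25), "same"))

delay = int(0.25 * fs)                    # 250 ms echo
echoic = env + np.roll(env, delay)

# The echo attenuates modulations near 1/(2 * 0.25 s) = 2 Hz in the mixture
spec_clean = np.abs(np.fft.rfft(env - env.mean()))
spec_echo = np.abs(np.fft.rfft(echoic - echoic.mean()))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
ratio = spec_echo[:50] / (spec_clean[:50] + 1e-12)
print("deepest notch near (Hz):", freqs[np.argmin(ratio)])
```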
Affiliation(s)
- Jiaxin Gao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
- The State Key Lab of Brain-Machine Intelligence and the MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China
18. Smith TM, Shen Y, Williams CN, Kidd GR, McAuley JD. Contribution of speech rhythm to understanding speech in noisy conditions: Further test of a selective entrainment hypothesis. Atten Percept Psychophys 2024; 86:627-642. PMID: 38012475; DOI: 10.3758/s13414-023-02815-0
Abstract
Previous work by McAuley et al. (Attention, Perception, & Psychophysics, 82, 3222-3233, 2020; 83, 2229-2240, 2021) showed that disruption of the natural rhythm of target (attended) speech worsens speech recognition in the presence of competing background speech or noise (a target-rhythm effect), whereas disruption of the background speech rhythm improves target recognition (a background-rhythm effect). While these results were interpreted as support for the role of rhythmic regularities in facilitating target-speech recognition amidst competing backgrounds (in line with a selective entrainment hypothesis), questions remain about the factors that contribute to the target-rhythm effect. Experiment 1 ruled out the possibility that the target-rhythm effect relies on a decrease in intelligibility of the rhythm-altered keywords. Sentences from the Coordinate Response Measure (CRM) paradigm were presented against a background of speech-shaped noise, and the rhythm of the initial portion of these target sentences (the target rhythmic context) was altered while critically leaving the target Color and Number keywords intact. Results showed a target-rhythm effect, evidenced by poorer keyword recognition when the target rhythmic context was altered, despite the absence of rhythmic manipulation of the keywords. Experiment 2 examined the influence of the relative onset asynchrony between target and background keywords. Results showed a significant target-rhythm effect that was independent of the effect of target-background keyword onset asynchrony. Experiment 3 provided additional support for the selective entrainment hypothesis by replicating the target-rhythm effect with a set of speech materials that were less rhythmically constrained than the CRM sentences.
Affiliation(s)
- Toni M Smith
- Department of Psychology, Michigan State University, East Lansing, MI, USA
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
- Christina N Williams
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
- Gary R Kidd
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
- J Devin McAuley
- Department of Psychology, Michigan State University, East Lansing, MI, USA
19. Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. PMID: 38142293; DOI: 10.1093/cercor/bhad475
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic-listening "cocktail party" paradigm. We recorded magnetoencephalography (MEG) from n = 33 participants, who were presented with concurrent narratives in two different voices and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech-tracking was also affected by voice familiarity: the response to target speech was enhanced and the response to non-target speech reduced in the contralateral hemisphere when these were spoken in a familiar rather than an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interacts with goal-driven attention and facilitates perceptual organization and speech processing in noisy environments.
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
20. Guerra G, Tierney A, Tijms J, Vaessen A, Bonte M, Dick F. Attentional modulation of neural sound tracking in children with and without dyslexia. Dev Sci 2024; 27:e13420. PMID: 37350014; DOI: 10.1111/desc.13420
Abstract
Auditory selective attention forms an important foundation of children's learning by enabling the prioritisation and encoding of relevant stimuli. It may also influence reading development, which relies on metalinguistic skills including awareness of the sound structure of spoken language. Reports of attentional impairments and of speech perception difficulties in noisy environments in dyslexic readers are also suggestive of a putative contribution of auditory attention to reading development. To date, it is unclear whether non-speech selective attention and its underlying neural mechanisms are impaired in children with dyslexia, and to what extent such deficits relate to individual reading and speech perception abilities in suboptimal listening conditions. In this EEG study, we assessed non-speech sustained auditory selective attention in 106 7-to-12-year-old children with and without dyslexia. Children attended to one of two tone streams, detecting occasional sequence repeats in the attended stream, and performed a speech-in-speech perception task. Results show that when children directed their attention to one stream, inter-trial phase coherence at the attended rate increased at fronto-central sites; this, in turn, was associated with better target detection. Behavioural and neural indices of attention did not systematically differ as a function of dyslexia diagnosis. However, behavioural indices of attention did explain individual differences in reading fluency and speech-in-speech perception abilities, both of which were impaired in dyslexic readers. Taken together, our results show that children with dyslexia do not show group-level auditory attention deficits, but attentional deficits may nevertheless represent a risk for developing reading impairments and problems with speech perception in complex acoustic environments.

RESEARCH HIGHLIGHTS:
- Non-speech sustained auditory selective attention modulates EEG phase coherence in children with/without dyslexia
- Children with dyslexia show difficulties in speech-in-speech perception
- Attention relates to dyslexic readers' speech-in-speech perception and reading skills
- Dyslexia diagnosis is not linked to behavioural/EEG indices of auditory attention
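The inter-trial phase coherence reported above measures how consistently neural phase at the stimulation rate aligns across trials: each trial contributes a unit phasor, and ITPC is the length of their mean (0 = random phase, 1 = perfect alignment). A single-frequency Fourier sketch; the study's exact time-frequency decomposition may differ:

```python
import numpy as np

def itpc(trials, fs, freq):
    """Inter-trial phase coherence at one frequency.

    trials: (n_trials, n_samples) array of single-trial EEG from one channel.
    """
    n = trials.shape[1]
    t = np.arange(n) / fs
    basis = np.exp(-2j * np.pi * freq * t)
    coeffs = trials @ basis                  # one complex coefficient per trial
    phases = coeffs / np.abs(coeffs)         # unit phasors
    return float(np.abs(phases.mean()))
```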
Affiliation(s)
- Giada Guerra, Centre for Brain and Cognitive Development, Birkbeck College, University of London, London, UK; Maastricht Brain Imaging Center and Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
- Adam Tierney, Centre for Brain and Cognitive Development, Birkbeck College, University of London, London, UK
- Jurgen Tijms, RID, Amsterdam, Netherlands; Rudolf Berlin Center, Department of Psychology, University of Amsterdam, Amsterdam, Netherlands
- Milene Bonte, Maastricht Brain Imaging Center and Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
- Frederic Dick, Division of Psychology & Language Sciences, UCL, London, UK

21
Ha J, Baek SC, Lim Y, Chung JH. Validation of cost-efficient EEG experimental setup for neural tracking in an auditory attention task. Sci Rep 2023; 13:22682. [PMID: 38114579 PMCID: PMC10730561 DOI: 10.1038/s41598-023-49990-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Accepted: 12/14/2023] [Indexed: 12/21/2023] Open
Abstract
When individuals listen to speech, their neural activity phase-locks to the slow temporal rhythm of the signal, a phenomenon commonly referred to as "neural tracking". Neural tracking allows the attended sound source in a multi-talker situation to be identified by decoding neural signals obtained by electroencephalography (EEG), an approach known as auditory attention decoding (AAD). Neural tracking with AAD can be utilized as an objective measurement tool in diverse clinical contexts, and it has potential applications in neuro-steered hearing devices. To make effective use of this technology, it is essential to improve the accessibility of the EEG experimental setup and analysis. The aim of this study was to develop a cost-efficient neural tracking system and to validate the feasibility of neural tracking measurement by conducting an AAD task, using offline and real-time decoder models, outside a soundproof environment. We devised a neural tracking system capable of conducting AAD experiments using an OpenBCI and an Arduino board. Nine participants were recruited to assess the performance of AAD using the developed system, which involved presenting competing speech signals in an experimental setting without soundproofing. As a result, the offline decoder model demonstrated an average performance of 90%, and the real-time decoder model achieved a performance of 78%. The present study demonstrates the feasibility of implementing neural tracking and AAD using cost-effective devices in a practical environment.
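The AAD pipeline validated here rests on a backward (decoding) model; below is a minimal sketch of the idea, with time lags omitted and all names assumed for illustration (real decoders, including OpenBCI-based ones, add lagged copies of each channel and cross-validate).

```python
import numpy as np

def train_decoder(eeg, envelope, alpha=1.0):
    """Ridge-regression backward model mapping EEG channels to the
    attended speech envelope. eeg: (n_times, n_channels)."""
    XtX = eeg.T @ eeg + alpha * np.eye(eeg.shape[1])
    return np.linalg.solve(XtX, eeg.T @ envelope)

def decode_attention(eeg, env_a, env_b, decoder):
    """Reconstruct the envelope from EEG and correlate it with both
    candidate talkers; the higher correlation marks the attended one."""
    recon = eeg @ decoder
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "A" if r_a > r_b else "B"
```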
Affiliation(s)
- Jiyeon Ha, Department of HY-KIST Bio-Convergence, Hanyang University, Seoul 04763, Korea; Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
- Seung-Cheol Baek, Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
- Yoonseob Lim, Department of HY-KIST Bio-Convergence, Hanyang University, Seoul 04763, Korea; Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea
- Jae Ho Chung, Department of HY-KIST Bio-Convergence, Hanyang University, Seoul 04763, Korea; Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, Korea; Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul 04763, Korea; Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Hanyang University, 222-Wangshimni-ro, Seongdong-gu, Seoul 133-792, Korea

22
Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. Front Hum Neurosci 2023; 17:1283206. [PMID: 38162285 PMCID: PMC10754997 DOI: 10.3389/fnhum.2023.1283206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Accepted: 11/20/2023] [Indexed: 01/03/2024] Open
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers, an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory integration (AV) and another that assumed two independent unisensory auditory and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the participants' peripheral vision. Overall, our findings suggest a strong influence of attention on multisensory integration when high-fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and adaptable based on the specific task and environment.
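The AV versus A+V comparison can be framed as comparing cross-validated prediction accuracies of encoding models; the sketch below assumes precomputed lagged feature matrices (X_audio, X_visual) and is illustrative rather than the authors' exact analysis.

```python
import numpy as np

def cv_prediction_r(X, y, alpha=1e2, n_folds=5):
    """Cross-validated Pearson r of a ridge encoding model.

    X: (n_times, n_features) lagged stimulus features; y: one EEG channel.
    """
    rs = []
    for test in np.array_split(np.arange(len(y)), n_folds):
        train = np.setdiff1d(np.arange(len(y)), test)
        w = np.linalg.solve(
            X[train].T @ X[train] + alpha * np.eye(X.shape[1]),
            X[train].T @ y[train],
        )
        rs.append(np.corrcoef(X[test] @ w, y[test])[0, 1])
    return float(np.mean(rs))

# AV model: joint fit on audio and visual features together.
# r_av = cv_prediction_r(np.hstack([X_audio, X_visual]), eeg_channel)
# A+V benchmark: fit audio-only and visual-only models, sum their
# predictions, then correlate with the EEG; integration is indicated
# when the joint AV model reliably outperforms the additive benchmark.
```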
Affiliation(s)
- Edmund C. Lalor, Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY, United States

23
Commuri V, Kulasingham JP, Simon JZ. Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention. Front Neurosci 2023; 17:1264453. [PMID: 38156264 PMCID: PMC10752935 DOI: 10.3389/fnins.2023.1264453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Accepted: 11/21/2023] [Indexed: 12/30/2023] Open
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking of the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male's speech, with a peak latency of ~40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
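A sketch of the preprocessing this analysis implies: restrict both the MEG and an f0-related stimulus feature to the high-gamma band before TRF fitting (filter parameters are illustrative assumptions).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def high_gamma(x, fs, lo=70.0, hi=200.0, order=4):
    """Zero-phase band-pass filter into the high-gamma band (70-200 Hz)."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

# Band-pass both signals, fit a TRF (see the ridge sketch after entry 19's
# abstract), and read out the response peak near 40 ms, comparing attended
# versus ignored conditions of acoustically identical stimuli.
```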
Affiliation(s)
- Vrishab Commuri, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Jonathan Z. Simon, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States; Department of Biology, University of Maryland, College Park, MD, United States; Institute for Systems Research, University of Maryland, College Park, MD, United States

24
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. [PMID: 38032934 PMCID: PMC10710032 DOI: 10.1073/pnas.2309166120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 10/21/2023] [Indexed: 12/02/2023] Open
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
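The multivariate TRF (mTRF) analysis jointly regresses several time-aligned predictors onto the neural response, so acoustic and linguistic contributions can be separated; this sketch assumes an equal lag window per predictor and illustrative regularization.

```python
import numpy as np

def lagged(feature, n_lags):
    """Time-lagged copies of one predictor (lags 0..n_lags-1 samples)."""
    n = len(feature)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = feature[:n - k]
    return X

def fit_mtrf(features, meg, n_lags=100, alpha=1e3):
    """Multivariate TRF: e.g., features = [envelope, envelope_onsets,
    word_onsets]. Returns one TRF (n_lags weights) per predictor."""
    X = np.hstack([lagged(f, n_lags) for f in features])
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ meg)
    return w.reshape(len(features), n_lags)
```

Comparing the word-onset TRF across primed (intelligible) and unprimed presentations of the identical stimulus isolates the intelligibility effect at the ~400 ms word-processing stage described above.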
Affiliation(s)
- Jonathan Z. Simon, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742; Department of Biology, University of Maryland, College Park, MD 20742; Institute for Systems Research, University of Maryland, College Park, MD 20742

25
Çetinçelik M, Rowland CF, Snijders TM. Ten-month-old infants' neural tracking of naturalistic speech is not facilitated by the speaker's eye gaze. Dev Cogn Neurosci 2023; 64:101297. [PMID: 37778275 PMCID: PMC10543766 DOI: 10.1016/j.dcn.2023.101297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Revised: 08/21/2023] [Accepted: 09/08/2023] [Indexed: 10/03/2023] Open
Abstract
Eye gaze is a powerful ostensive cue in infant-caregiver interactions, with demonstrable effects on language acquisition. While the link between gaze following and later vocabulary is well-established, the effects of eye gaze on other aspects of language, such as speech processing, are less clear. In this EEG study, we examined the effects of the speaker's eye gaze on ten-month-old infants' neural tracking of naturalistic audiovisual speech, a marker for successful speech processing. Infants watched videos of a speaker telling stories, addressing the infant with direct or averted eye gaze. We assessed infants' speech-brain coherence at stress (1-1.75 Hz) and syllable (2.5-3.5 Hz) rates, tested for differences in attention by comparing looking times and EEG theta power in the two conditions, and investigated whether neural tracking predicts later vocabulary. Our results showed that infants' brains tracked the speech rhythm both at the stress and syllable rates, and that infants' neural tracking at the syllable rate predicted later vocabulary. However, speech-brain coherence did not significantly differ between the direct and averted gaze conditions, and infants did not show greater attention to direct gaze. Overall, our results suggest significant neural tracking at ten months, related to vocabulary development, but not modulated by the speaker's gaze.
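Speech-brain coherence of the kind measured here can be estimated with standard spectral tools; the band limits below mirror the stress and syllable rates named above, while the segment length is an assumption.

```python
import numpy as np
from scipy.signal import coherence

def speech_brain_coherence(eeg, envelope, fs, band):
    """Mean magnitude-squared coherence between EEG and the speech
    envelope within a frequency band, e.g., band=(1.0, 1.75) for the
    stress rate or band=(2.5, 3.5) for the syllable rate."""
    f, cxy = coherence(eeg, envelope, fs=fs, nperseg=int(fs * 4))
    mask = (f >= band[0]) & (f <= band[1])
    return float(cxy[mask].mean())
```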
Affiliation(s)
- Melis Çetinçelik, Department of Experimental Psychology, Utrecht University, Utrecht, the Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Caroline F Rowland, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Tineke M Snijders, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands; Cognitive Neuropsychology Department, Tilburg University, Tilburg, the Netherlands

26
Wilroth J, Bernhardsson B, Heskebeck F, Skoglund MA, Bergeling C, Alickovic E. Improving EEG-based decoding of the locus of auditory attention through domain adaptation. J Neural Eng 2023; 20:066022. [PMID: 37988748 DOI: 10.1088/1741-2552/ad0e7b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Accepted: 11/21/2023] [Indexed: 11/23/2023]
Abstract
Objective. This paper presents a novel domain adaptation (DA) framework to enhance the accuracy of electroencephalography (EEG)-based auditory attention classification, specifically for classifying the direction (left or right) of attended speech. The framework aims to improve performance for subjects with initially low classification accuracy, overcoming challenges posed by instrumental and human factors. Limited dataset size, variations in EEG data quality due to factors such as noise, electrode misplacement, or differences between subjects, and the need for generalization across different trials, conditions and subjects necessitate the use of DA methods. By leveraging DA methods, the framework can learn from one EEG dataset and adapt to another, potentially resulting in more reliable and robust classification models. Approach. This paper focuses on investigating a DA method, based on parallel transport, for addressing the auditory attention classification problem. The EEG data utilized in this study originate from an experiment where subjects were instructed to selectively attend to one of two spatially separated voices presented simultaneously. Main results. Significant improvement in classification accuracy was observed when poor data from one subject was transported to the domain of good data from different subjects, as compared to the baseline. The mean classification accuracy for subjects with poor data increased from 45.84% to 67.92%. Specifically, the highest classification accuracy achieved for one subject reached 83.33%, a substantial increase from the baseline accuracy of 43.33%. Significance. The findings of our study demonstrate the improved classification performance achieved through the implementation of DA methods. This brings us a step closer to leveraging EEG in neuro-steered hearing devices.
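The paper's DA method is based on parallel transport; a closely related and simpler re-centering step is sketched below. Whitening each domain's trial covariances by that domain's mean transports both datasets toward a common reference; the arithmetic mean stands in for the Riemannian (geometric) mean here for brevity.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def recenter(covs, ref=None):
    """Re-center SPD trial covariance matrices to a common reference.

    covs: (n_trials, n_ch, n_ch) covariance matrices from one domain.
    After re-centering both source and target domains, a classifier
    trained on one domain can be applied to the other.
    """
    if ref is None:
        ref = covs.mean(axis=0)              # simplification of the geometric mean
    w = fractional_matrix_power(ref, -0.5)   # whitening transform ref^(-1/2)
    return np.array([w @ c @ w.T for c in covs])
```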
Affiliation(s)
- Johanna Wilroth, Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Bo Bernhardsson, Department of Automatic Control, Lund University, Lund, Sweden
- Frida Heskebeck, Department of Automatic Control, Lund University, Lund, Sweden
- Martin A Skoglund, Department of Electrical Engineering, Linkoping University, Linkoping, Sweden; Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Carolina Bergeling, Department of Mathematics and Natural Sciences, Blekinge Institute of Technology, Karlskrona, Sweden
- Emina Alickovic, Department of Electrical Engineering, Linkoping University, Linkoping, Sweden; Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark

27
Li Y, Anumanchipalli GK, Mohamed A, Chen P, Carney LH, Lu J, Wu J, Chang EF. Dissecting neural computations in the human auditory pathway using deep neural networks for speech. Nat Neurosci 2023; 26:2213-2225. [PMID: 37904043 PMCID: PMC10689246 DOI: 10.1038/s41593-023-01468-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Accepted: 09/13/2023] [Indexed: 11/01/2023]
Abstract
The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex.
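The layer-wise model-brain comparison reported here amounts to regressing each DNN layer's activations onto the neural response and scoring the fit; the in-sample scoring below is a simplification (held-out data would be used in practice), and all names are illustrative.

```python
import numpy as np

def layer_brain_alignment(layer_acts, neural, alpha=1e2):
    """Ridge-regress each layer's activations onto one recording site.

    layer_acts: list of (n_times, n_units) matrices, one per DNN layer.
    neural: (n_times,) response. Returns prediction r per layer; deeper
    layers are expected to fit best at higher-order auditory sites.
    """
    scores = []
    for X in layer_acts:
        w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ neural)
        scores.append(np.corrcoef(X @ w, neural)[0, 1])
    return np.array(scores)
```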
Affiliation(s)
- Yuanning Li, Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Gopala K Anumanchipalli, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA; Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Peili Chen, School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Laurel H Carney, Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Junfeng Lu, Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
- Jinsong Wu, Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
- Edward F Chang, Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA

28
Di Liberto GM, Attaheri A, Cantisani G, Reilly RB, Ní Choisdealbha Á, Rocha S, Brusini P, Goswami U. Emergence of the cortical encoding of phonetic features in the first year of life. Nat Commun 2023; 14:7789. [PMID: 38040720 PMCID: PMC10692113 DOI: 10.1038/s41467-023-43490-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Accepted: 11/10/2023] [Indexed: 12/03/2023] Open
Abstract
Even prior to producing their first words, infants are developing a sophisticated speech processing system, with robust word recognition present by 4-6 months of age. These emergent linguistic skills, observed with behavioural investigations, are likely to rely on increasingly sophisticated neural underpinnings. The infant brain is known to robustly track the speech envelope; however, previous cortical tracking studies were unable to demonstrate the presence of phonetic feature encoding. Here we utilise temporal response functions computed from electrophysiological responses to nursery rhymes to investigate the cortical encoding of phonetic features in a longitudinal cohort of infants when aged 4, 7 and 11 months, as well as in adults. The analyses reveal an increasingly detailed and acoustically invariant phonetic encoding emerging over the first year of life, providing neurophysiological evidence that the pre-verbal human cortex learns phonetic categories. By contrast, we found no credible evidence for age-related increases in cortical tracking of the acoustic spectrogram.
Affiliation(s)
- Giovanni M Di Liberto, ADAPT Centre, School of Computer Science and Statistics, Trinity College, The University of Dublin, Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland; Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Adam Attaheri, Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Giorgia Cantisani, ADAPT Centre, School of Computer Science and Statistics, Trinity College, The University of Dublin, Dublin, Ireland; Laboratoire des Systèmes Perceptifs, Département d'études Cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- Richard B Reilly, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland; School of Engineering, Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland; School of Medicine, Trinity College, The University of Dublin, Dublin, Ireland
- Áine Ní Choisdealbha, Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Sinead Rocha, Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Perrine Brusini, Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Usha Goswami, Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom

29
Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, Resnik P, Simon JZ. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 2023; 12:e85012. [PMID: 38018501 PMCID: PMC10783870 DOI: 10.7554/elife.85012] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Accepted: 11/24/2023] [Indexed: 11/30/2023] Open
Abstract
Even though human experience unfolds continuously in time, it is not strictly linear; instead, it entails cascading processes building hierarchical cognitive structures. For instance, during speech perception, humans transform a continuously varying acoustic signal into phonemes, words, and meaning, and these levels all have distinct but interdependent temporal structures. Time-lagged regression using temporal response functions (TRFs) has recently emerged as a promising tool for disentangling electrophysiological brain responses related to such complex models of perception. Here, we introduce the Eelbrain Python toolkit, which makes this kind of analysis easy and accessible. We demonstrate its use, using continuous speech as a sample paradigm, with a freely available EEG dataset of audiobook listening. A companion GitHub repository provides the complete source code for the analysis, from raw data to group-level statistics. More generally, we advocate a hypothesis-driven approach in which the experimenter specifies a hierarchy of time-continuous representations that are hypothesized to have contributed to brain responses, and uses those as predictor variables for the electrophysiological signal. This is analogous to a multiple regression problem, but with the addition of a time dimension. TRF analysis decomposes the brain signal into distinct responses associated with the different predictor variables by estimating a multivariate TRF (mTRF), quantifying the influence of each predictor on brain responses as a function of time(-lags). This allows asking two questions about the predictor variables: (1) Is there a significant neural representation corresponding to this predictor variable? And if so, (2) what are the temporal characteristics of the neural response associated with it? Thus, different predictor variables can be systematically combined and evaluated to jointly model neural processing at multiple hierarchical levels. We discuss applications of this approach, including the potential for linking algorithmic/representational theories at different cognitive levels to brain responses through computational models with appropriate linking hypotheses.
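A minimal usage sketch of the toolkit described here, assuming its documented NDVar/UTS/boosting interface; the data are random placeholders and the parameter values are illustrative (consult the Eelbrain documentation for current signatures).

```python
# pip install eelbrain
import numpy as np
from eelbrain import NDVar, UTS, boosting

fs = 100                                    # sampling rate in Hz
time = UTS(0, 1 / fs, 60 * fs)              # 60 s time axis
eeg = NDVar(np.random.randn(60 * fs), (time,), name='eeg')
env = NDVar(np.random.rand(60 * fs), (time,), name='envelope')

# Estimate a TRF over 0-500 ms lags with the boosting algorithm;
# res.h holds the TRF and res.r the cross-validated prediction accuracy.
res = boosting(eeg, env, tstart=0, tstop=0.5, basis=0.05, partitions=4)
print(res.h, res.r)
```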
Affiliation(s)
- Proloy Das, Stanford University, Stanford, United States
- Philip Resnik, University of Maryland, College Park, College Park, United States

30
Wang B, Xu X, Niu Y, Wu C, Wu X, Chen J. EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners. Cereb Cortex 2023; 33:10972-10983. [PMID: 37750333 DOI: 10.1093/cercor/bhad325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 08/21/2023] [Accepted: 08/22/2023] [Indexed: 09/27/2023] Open
Abstract
Auditory attention decoding (AAD) can be used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remain unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend to one of two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e., audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR significantly enhanced the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended-speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended-speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.
Affiliation(s)
- Bo Wang, Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Xiran Xu, Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yadong Niu, Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Chao Wu, School of Nursing, Peking University, Beijing 100191, China
- Xihong Wu, Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China; National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China
- Jing Chen, Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Intelligence Science and Technology, Peking University, Beijing 100871, China; National Biomedical Imaging Center, College of Future Technology, Beijing 100871, China

31
Li J, Hong B, Nolte G, Engel AK, Zhang D. EEG-based speaker-listener neural coupling reflects speech-selective attentional mechanisms beyond the speech stimulus. Cereb Cortex 2023; 33:11080-11091. [PMID: 37814353 DOI: 10.1093/cercor/bhad347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 09/01/2023] [Accepted: 09/04/2023] [Indexed: 10/11/2023] Open
Abstract
When we pay attention to someone, do we focus only on the sounds they make and the words they use, or do we form a mental space shared with the speaker we want to attend to? Some would argue that human language is nothing more than a simple signal, while others claim that human beings understand each other because they form a shared mental ground between speaker and listener. Our study aimed to explore the neural mechanisms of speech-selective attention by investigating electroencephalogram-based neural coupling between the speaker and the listener in a cocktail party paradigm. The temporal response function method was employed to reveal how the listener was coupled to the speaker at the neural level. The results showed that neural coupling between the listener and the attended speaker peaked 5 s before speech onset in the delta band over the left frontal region, and was correlated with speech comprehension performance. In contrast, the attentional processing of speech acoustics and semantics occurred primarily at a later stage after speech onset and was not significantly correlated with comprehension performance. These findings suggest a predictive mechanism to achieve speaker-listener neural coupling for successful speech comprehension.
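Speaker-listener coupling at a range of temporal offsets can be sketched as a lagged correlation between the two band-limited EEG signals; this illustrative stand-in omits the TRF machinery the study actually used.

```python
import numpy as np

def lagged_coupling(speaker_sig, listener_sig, fs, max_lag_s=6.0):
    """Correlate listener EEG with speaker EEG at temporal offsets.

    Negative lags index listener activity preceding the speaker's,
    consistent with the predictive coupling described above.
    Returns (lags in seconds, correlation at each lag).
    """
    n = len(speaker_sig)
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    rs = []
    for lag in lags:
        if lag < 0:                      # listener leads the speaker
            a, b = speaker_sig[-lag:], listener_sig[:n + lag]
        else:                            # listener follows the speaker
            a, b = speaker_sig[:n - lag], listener_sig[lag:]
        rs.append(np.corrcoef(a, b)[0, 1])
    return lags / fs, np.array(rs)
```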
Affiliation(s)
- Jiawei Li, Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee, Berlin 14195, Germany
- Bo Hong, Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte, Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Andreas K Engel, Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Dan Zhang, Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China

32
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. [PMID: 37677057 DOI: 10.1162/jocn_a_02044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Affiliation(s)
- Sok Hui Jessica Tan, The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia; Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova, The Basque Center on Cognition, Brain and Language; IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto, ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse, SEGOTIA, Galway, Ireland; Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham, The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia

33
Van Hirtum T, Somers B, Dieudonné B, Verschueren E, Wouters J, Francart T. Neural envelope tracking predicts speech intelligibility and hearing aid benefit in children with hearing loss. Hear Res 2023; 439:108893. [PMID: 37806102 DOI: 10.1016/j.heares.2023.108893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 09/01/2023] [Accepted: 09/27/2023] [Indexed: 10/10/2023]
Abstract
Early assessment of hearing aid benefit is crucial, as the extent to which hearing aids provide audible speech information predicts speech and language outcomes. A growing body of research has proposed neural envelope tracking as an objective measure of speech intelligibility, particularly for individuals unable to provide reliable behavioral feedback. However, its potential for evaluating speech intelligibility and hearing aid benefit in children with hearing loss remains unexplored. In this study, we investigated neural envelope tracking in children with permanent hearing loss through two separate experiments. EEG data were recorded while children listened to age-appropriate stories (Experiment 1) or an animated movie (Experiment 2) under aided and unaided conditions (using personal hearing aids) at multiple stimulus intensities. Neural envelope tracking was evaluated using a linear decoder reconstructing the speech envelope from the EEG in the delta band (0.5-4 Hz). Additionally, we calculated temporal response functions (TRFs) to investigate the spatio-temporal dynamics of the response. In both experiments, neural tracking increased with increasing stimulus intensity, but only in the unaided condition. In the aided condition, neural tracking remained stable across a wide range of intensities, as long as speech intelligibility was maintained. Similarly, TRF amplitudes increased with increasing stimulus intensity in the unaided condition, while in the aided condition significant differences were found in TRF latency rather than TRF amplitude. This suggests that decreasing stimulus intensity does not necessarily impact neural tracking. Furthermore, the use of personal hearing aids significantly enhanced neural envelope tracking, particularly in challenging speech conditions that would be inaudible when unaided. Finally, we found a strong correlation between neural envelope tracking and behaviorally measured speech intelligibility for both narrated stories (Experiment 1) and movie stimuli (Experiment 2). Altogether, these findings indicate that neural envelope tracking could be a valuable tool for predicting speech intelligibility benefits derived from personal hearing aids in hearing-impaired children. Incorporating narrated stories or engaging movies expands the accessibility of these methods even in clinical settings, offering new avenues for using objective speech measures to guide pediatric audiology decision-making.
Affiliation(s)
- Tilde Van Hirtum, KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Ben Somers, KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Benjamin Dieudonné, KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Eline Verschueren, KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Jan Wouters, KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
- Tom Francart, KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium

34
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech. J Neurosci 2023; 43:7429-7440. [PMID: 37793908 PMCID: PMC10621774 DOI: 10.1523/jneurosci.1247-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 09/07/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open
Abstract
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. Moreover, it has been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are also modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed robust attentional modulation of the cortical contribution to the speech-FFR: the neural responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger cortical contribution to the speech-FFR than a voice with a higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well. SIGNIFICANCE STATEMENT: Understanding speech in noise requires attention to a target speaker. One of the speech features that a listener can use to identify a target voice among others and attend to it is the fundamental frequency, together with its higher harmonics. The fundamental frequency arises from the opening and closing of the vocal folds and is tracked by high-frequency neural activity in the auditory brainstem and in the cortex. Previous investigations showed that subcortical neural tracking is modulated by selective attention. Here we show that attention affects the cortical tracking of the fundamental frequency as well: it is stronger when a particular voice is attended than when it is ignored.
Affiliation(s)
- Alina Schüller, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Achim Schilling, Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Patrick Krauss, Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany; Pattern Recognition Lab, Department Computer Science, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Stefan Rampp, Department of Neurosurgery, University Hospital Erlangen, 91058 Erlangen, Germany; Department of Neurosurgery, University Hospital Halle (Saale), 06120 Halle (Saale), Germany; Department of Neuroradiology, University Hospital Erlangen, 91058 Erlangen, Germany
- Tobias Reichenbach, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany

35
Brown JA, Bidelman GM. Attention, Musicality, and Familiarity Shape Cortical Speech Tracking at the Musical Cocktail Party. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.28.562773. [PMID: 37961204 PMCID: PMC10634879 DOI: 10.1101/2023.10.28.562773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
The "cocktail party problem" challenges our ability to understand speech in noisy environments, which often include background music. Here, we explored the role of background music in speech-in-noise listening. Participants listened to an audiobook in familiar and unfamiliar music while tracking keywords in either speech or song lyrics. We used EEG to measure neural tracking of the audiobook. When speech was masked by music, the modeled peak latency at 50 ms (P1TRF) was prolonged compared to unmasked. Additionally, P1TRF amplitude was larger in unfamiliar background music, suggesting improved speech tracking. We observed prolonged latencies at 100 ms (N1TRF) when speech was not the attended stimulus, though only in less musical listeners. Our results suggest early neural representations of speech are enhanced with both attention and concurrent unfamiliar music, indicating familiar music is more distracting. One's ability to perceptually filter "musical noise" at the cocktail party depends on objective musical abilities.
Affiliation(s)
- Jane A. Brown, School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
- Gavin M. Bidelman, Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA

36
Commuri V, Kulasingham JP, Simon JZ. Cortical Responses Time-Locked to Continuous Speech in the High-Gamma Band Depend on Selective Attention. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.20.549567. [PMID: 37546895 PMCID: PMC10401961 DOI: 10.1101/2023.07.20.549567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking of the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male's speech, with a peak latency of approximately 40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Affiliation(s)
- Vrishab Commuri, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Jonathan Z. Simon, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States; Department of Biology, University of Maryland, College Park, MD, United States; Institute for Systems Research, University of Maryland, College Park, MD, United States

37
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon, Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA; Department of Biology, University of Maryland, College Park, MD 20742, USA; Institute for Systems Research, University of Maryland, College Park, MD 20742, USA

38
Cervantes Constantino F, Sánchez-Costa T, Cipriani GA, Carboni A. Visuospatial attention revamps cortical processing of sound amid audiovisual uncertainty. Psychophysiology 2023; 60:e14329. [PMID: 37166096 DOI: 10.1111/psyp.14329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Revised: 04/13/2023] [Accepted: 04/25/2023] [Indexed: 05/12/2023]
Abstract
Selective attentional biases arising from one sensory modality manifest in others. The effects of visuospatial attention, important in visual object perception, are unclear in the auditory domain during audiovisual (AV) scene processing. We investigate temporal and spatial factors that underlie such transfer neurally. Auditory encoding of random tone pips in AV scenes was addressed via a temporal response function model (TRF) of participants' electroencephalogram (N = 30). The spatially uninformative pips were associated with spatially distributed visual contrast reversals ("flips"), through asynchronous probabilistic AV temporal onset distributions. Participants deployed visuospatial selection on these AV stimuli to perform a task. A late (~300 ms) cross-modal influence over the neural representation of pips was found in the original and a replication study (N = 21). Transfer depended on selected visual input being (i) presented during or shortly after a related sound, in relatively limited temporal distributions (<165 ms); (ii) positioned across limited (1:4) visual foreground to background ratios. Neural encoding of auditory input, as a function of visual input, was largest at visual foreground quadrant sectors and lowest at locations opposite to the target. The results indicate that ongoing neural representations of sounds incorporate visuospatial attributes for auditory stream segregation, as cross-modal transfer conveys information that specifies the identity of multisensory signals. A potential mechanism is by enhancing or recalibrating the tuning properties of the auditory populations that represent them as objects. The results account for the dynamic evolution under visual attention of multisensory integration, specifying critical latencies at which relevant cortical networks operate.
Affiliation(s)
- Francisco Cervantes Constantino, Centro de Investigación Básica en Psicología, Facultad de Psicología, Universidad de la República, Montevideo, Uruguay; Instituto de Fundamentos y Métodos en Psicología, Facultad de Psicología, Universidad de la República, Montevideo, Uruguay; Instituto de Investigaciones Biológicas "Clemente Estable", Montevideo, Uruguay
- Thaiz Sánchez-Costa, Centro de Investigación Básica en Psicología, Facultad de Psicología, Universidad de la República, Montevideo, Uruguay
- Germán A Cipriani, Centro de Investigación Básica en Psicología, Facultad de Psicología, Universidad de la República, Montevideo, Uruguay
- Alejandra Carboni, Centro de Investigación Básica en Psicología, Facultad de Psicología, Universidad de la República, Montevideo, Uruguay; Instituto de Fundamentos y Métodos en Psicología, Facultad de Psicología, Universidad de la República, Montevideo, Uruguay

39
Grijseels DM, Prendergast BJ, Gorman JC, Miller CT. The neurobiology of vocal communication in marmosets. Ann N Y Acad Sci 2023; 1528:13-28. [PMID: 37615212 PMCID: PMC10592205 DOI: 10.1111/nyas.15057] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/25/2023]
Abstract
An increasingly popular animal model for studying the neural basis of social behavior, cognition, and communication is the common marmoset (Callithrix jacchus). Interest in this New World primate across neuroscience is now being driven by their proclivity for prosociality across their repertoire, high volubility, and rapid development, as well as their amenability to naturalistic testing paradigms and freely moving neural recording and imaging technologies. The complement of these characteristics sets marmosets up to be a powerful model of the primate social brain in the years to come. Here, we focus on vocal communication because it is the area that has both made the most progress and illustrates the prodigious potential of this species. We review the current state of the field with a focus on the various brain areas and networks involved in vocal perception and production, comparing the findings from marmosets to those from other animals, including humans.
Collapse
Affiliation(s)
- Dori M Grijseels
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Brendan J Prendergast
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Julia C Gorman
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, California, USA
- Cory T Miller
- Cortical Systems and Behavior Laboratory, University of California, San Diego, La Jolla, California, USA
- Neurosciences Graduate Program, University of California, San Diego, La Jolla, California, USA
40
Kovács P, Szalárdy O, Winkler I, Tóth B. Two effects of perceived speaker similarity in resolving the cocktail party situation - ERPs and functional connectivity. Biol Psychol 2023; 182:108651. [PMID: 37517603 DOI: 10.1016/j.biopsycho.2023.108651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2023] [Revised: 07/15/2023] [Accepted: 07/24/2023] [Indexed: 08/01/2023]
Abstract
Following a speaker in multi-talker environments requires the listener to separate the speakers' voices and continuously focus attention on one speech stream. While the dissimilarity of voices may make speaker separation easier, it may also affect maintaining the focus of attention. To assess these effects, electrophysiological (EEG) and behavioral data were collected from healthy young adults while they listened to two concurrent speech streams and performed an online lexical detection task and an offline recognition memory task. Perceptual speaker similarity was manipulated on four levels: identical, similar, dissimilar, and opposite-gender speakers. Behavioral and electrophysiological data suggested that, while speaker similarity hinders auditory stream segregation, dissimilarity hinders maintaining the focus of attention by making the to-be-ignored speech stream more distracting. Thus, resolving the cocktail party situation poses different problems at different levels of perceived speaker similarity, resulting in different listening strategies.
Affiliation(s)
- Petra Kovács
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary; Department of Cognitive Science, Budapest University of Technology and Economics, Budapest, Hungary
- Orsolya Szalárdy
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary; Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Brigitta Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
41
Puffay C, Vanthornhout J, Gillis M, Accou B, Van Hamme H, Francart T. Robust neural tracking of linguistic speech representations using a convolutional neural network. J Neural Eng 2023; 20:046040. [PMID: 37595606 DOI: 10.1088/1741-2552/acf1ce] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Accepted: 08/18/2023] [Indexed: 08/20/2023]
Abstract
Objective. When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalogram (EEG) to the speech signal. Recent studies using linear models have shown a significant contribution of linguistic features over and above acoustic neural tracking. However, linear models cannot capture the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features, using phoneme or word onsets as a control, and has the capacity to model nonlinear relations. Approach. We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS), and word frequency (WF)) into our nonlinear CNN model and investigate whether they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN. Main results. For the nonlinear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the nonlinear CNN outperformed the linear baselines. Significance. Measuring the coding of linguistic features in the brain is important for auditory neuroscience research and for applications that involve objectively measuring speech understanding. With linear models this is measurable, but the effects are very small. The proposed nonlinear CNN model yields larger differences between linguistic and lexical models and could therefore reveal effects that would otherwise be unmeasurable, and may, in the future, lead to improved within-subject measures and shorter recordings.
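To make the modeling contrast concrete, below is a minimal PyTorch sketch of a nonlinear CNN that maps multichannel EEG to a linguistic feature trace; the layer sizes, kernel length, and sampling rate are invented for illustration and are not the architecture evaluated in the paper.

```python
# Sketch: a small nonlinear CNN relating EEG (batch, channels, time) to a
# word-level linguistic feature sampled on the same time axis.
import torch
import torch.nn as nn

class EEGToFeatureCNN(nn.Module):
    def __init__(self, n_channels=64, hidden=32, kernel=9):
        super().__init__()
        pad = kernel // 2  # 'same' padding keeps input and output aligned in time
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel, padding=pad),
            nn.ReLU(),  # the nonlinearities are what a linear encoder lacks
            nn.Conv1d(hidden, hidden, kernel, padding=pad),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel, padding=pad),
        )

    def forward(self, eeg):
        return self.net(eeg).squeeze(1)  # (batch, time) predicted feature

model = EEGToFeatureCNN()
eeg = torch.randn(4, 64, 640)        # 4 trials, 64 channels, 10 s at 64 Hz
predicted = model(eeg)               # e.g., a word-surprisal trace estimate
print(predicted.shape)               # torch.Size([4, 640])
```

Replacing the two ReLU layers with identities would yield a purely linear variant, analogous to the kind of linearized-CNN baseline the comparison calls for.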
Affiliation(s)
- Corentin Puffay
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Marlies Gillis
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Bernd Accou
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Hugo Van Hamme
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Tom Francart
- Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
42
Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. bioRxiv 2023:2023.08.23.554451. [PMID: 37662393 PMCID: PMC10473711 DOI: 10.1101/2023.08.23.554451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/05/2023]
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers, an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies with a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory integration (AV) and another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the participants' peripheral vision. Overall, our findings suggest a strong influence of attention on multisensory integration when high-fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and adapts to the specific task and environment.
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
- Aaron R. Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
- Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
43
Jia Z, Xu C, Li J, Gao J, Ding N, Luo B, Zou J. Phase Property of Envelope-Tracking EEG Response Is Preserved in Patients with Disorders of Consciousness. eNeuro 2023; 10:ENEURO.0130-23.2023. [PMID: 37500493 PMCID: PMC10420405 DOI: 10.1523/eneuro.0130-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 07/16/2023] [Accepted: 07/20/2023] [Indexed: 07/29/2023] Open
Abstract
When listening to speech, the low-frequency cortical response below 10 Hz can track the speech envelope. Previous studies have demonstrated that the phase lag between the speech envelope and the cortical response can reflect the mechanism by which the envelope-tracking response is generated. Here, we analyze whether this mechanism is modulated by the level of consciousness, by studying how the stimulus-response phase lag is affected by disorders of consciousness (DoC). We observed that DoC patients in general show less reliable neural tracking of speech. Nevertheless, for DoC patients who show reliable cortical tracking of speech, the stimulus-response phase lag changes linearly with frequency between 3.5 and 8 Hz, regardless of the consciousness state. The mean phase lag is also consistent across these DoC patients. These results suggest that the envelope-tracking response to speech can be generated by an automatic process that is barely modulated by the consciousness state.
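A linear phase-frequency relation is the signature of a fixed processing latency: a pure delay L imposes a phase lag of 2*pi*f*L at frequency f, so the latency can be read off the slope of phase against frequency. A minimal synthetic sketch of that readout follows; the sampling rate, analysis band, and simulated 125 ms delay are all assumed for illustration.

```python
# Sketch: recover a response latency from the slope of the stimulus-response
# cross-spectral phase, as in phase-lag analyses of envelope tracking.
import numpy as np

fs = 64
rng = np.random.default_rng(1)
envelope = rng.standard_normal(30 * fs)              # 30 s "speech envelope"
delay = 8                                            # 8 samples = 125 ms lag
response = np.roll(envelope, delay) + 0.5 * rng.standard_normal(30 * fs)

freqs = np.fft.rfftfreq(envelope.size, 1 / fs)
cross = np.fft.rfft(response) * np.conj(np.fft.rfft(envelope))
band = (freqs >= 3.5) & (freqs <= 8.0)               # band analyzed in the paper
phase = np.unwrap(np.angle(cross[band]))

slope = np.polyfit(freqs[band], phase, 1)[0]         # radians per Hz
latency_ms = -slope / (2 * np.pi) * 1000
print(f"estimated latency: {latency_ms:.0f} ms")     # close to 125 ms
```

A consistent slope (and hence latency) across patients, as reported here, is what one would expect from an automatic, delay-like generator of the tracking response.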
Affiliation(s)
- Ziting Jia
- The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250033, China
- Chuan Xu
- Department of Neurology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou 310019, China
- Jingqi Li
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
- Jian Gao
- Department of Rehabilitation, Hangzhou Mingzhou Brain Rehabilitation Hospital, Hangzhou 311215, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Benyan Luo
- Department of Neurology, First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310003, China
- Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
44
Yasmin S, Irsik VC, Johnsrude IS, Herrmann B. The effects of speech masking on neural tracking of acoustic and semantic features of natural speech. Neuropsychologia 2023; 186:108584. [PMID: 37169066 DOI: 10.1016/j.neuropsychologia.2023.108584] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Revised: 04/30/2023] [Accepted: 05/08/2023] [Indexed: 05/13/2023]
Abstract
Listening environments contain background sounds that mask speech and lead to communication challenges. Sensitivity to slow acoustic fluctuations in speech can help segregate speech from background noise. Semantic context can also facilitate speech perception in noise, for example, by enabling prediction of upcoming words. However, not much is known about how different degrees of background masking affect the neural processing of acoustic and semantic features during naturalistic speech listening. In the current electroencephalography (EEG) study, participants listened to engaging, spoken stories masked at different levels of multi-talker babble to investigate how neural activity in response to acoustic and semantic features changes with acoustic challenges, and how such effects relate to speech intelligibility. The pattern of neural response amplitudes associated with both acoustic and semantic speech features across masking levels was U-shaped, such that amplitudes were largest for moderate masking levels. This U-shape may be due to increased attentional focus when speech comprehension is challenging, but manageable. The latency of the neural responses increased linearly with increasing background masking, and neural latency change associated with acoustic processing most closely mirrored the changes in speech intelligibility. Finally, tracking responses related to semantic dissimilarity remained robust until severe speech masking (-3 dB SNR). The current study reveals that neural responses to acoustic features are highly sensitive to background masking and decreasing speech intelligibility, whereas neural responses to semantic features are relatively robust, suggesting that individuals track the meaning of the story well even in moderate background sound.
Affiliation(s)
- Sonia Yasmin
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Vanessa C Irsik
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Ingrid S Johnsrude
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada; School of Communication and Speech Disorders, The University of Western Ontario, London, ON, N6A 5B7, Canada
- Björn Herrmann
- Rotman Research Institute, Baycrest, M6A 2E1, Toronto, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1, Toronto, ON, Canada
45
Xia Y, Geng M, Chen Y, Sun S, Liao C, Zhu Z, Li Z, Ochieng WY, Angeloudis P, Elhajj M, Zhang L, Zeng Z, Zhang B, Gao Z, Chen X(M). Understanding common human driving semantics for autonomous vehicles. Patterns (N Y) 2023; 4:100730. [PMID: 37521046 PMCID: PMC10382946 DOI: 10.1016/j.patter.2023.100730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 12/05/2022] [Accepted: 03/20/2023] [Indexed: 08/01/2023]
Abstract
Autonomous vehicles will share roads with human-driven vehicles until the transition to fully autonomous transport systems is complete. The critical challenge of improving mutual understanding between the two vehicle types cannot be addressed solely by feeding extensive driving data into data-driven models; it also requires enabling autonomous vehicles to understand and apply common driving behaviors analogously to human drivers. We therefore designed and conducted two electroencephalography experiments comparing the cerebral activities underlying linguistic understanding and driving understanding. The results showed that driving activates hierarchical neural functions in the auditory cortex, analogous to abstraction in linguistic understanding. We then proposed a neural-informed, semantics-driven framework for understanding common human driving behavior in a brain-inspired manner. This study highlights a pathway for fusing neuroscience into complex human-behavior understanding tasks and provides a computational neural model of human driving behaviors, which will enable autonomous vehicles to perceive and think like human drivers.
Affiliation(s)
- Yingji Xia
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Maosi Geng
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310015, China
- Yong Chen
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Sudan Sun
- School of Medicine, Zhejiang University, Hangzhou 310058, China
- Chenlei Liao
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Zheng Zhu
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou 310027, China
- Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Zhihui Li
- School of Transportation, Jilin University, Changchun 130022, China
- Washington Yotto Ochieng
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Panagiotis Angeloudis
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Mireille Elhajj
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Lei Zhang
- Alibaba Group, Hangzhou 310052, China
- Ziyou Gao
- School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
- Xiqun (Michael) Chen
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou 310027, China
- Zhejiang University/University of Illinois Urbana-Champaign (ZJU-UIUC) Institute, Zhejiang University, Haining 314400, China
- Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
46
Wang Z, Shi N, Zhang Y, Zheng N, Li H, Jiao Y, Cheng J, Wang Y, Zhang X, Chen Y, Chen Y, Wang H, Xie T, Wang Y, Ma Y, Gao X, Feng X. Conformal in-ear bioelectronics for visual and auditory brain-computer interfaces. Nat Commun 2023; 14:4213. [PMID: 37452047 PMCID: PMC10349124 DOI: 10.1038/s41467-023-39814-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Accepted: 06/28/2023] [Indexed: 07/18/2023] Open
Abstract
Brain-computer interfaces (BCIs) have attracted considerable attention in motor and language rehabilitation. Most devices rely on non-invasive EEG caps, headband-based commercial products, or invasive microneedle-based approaches, which are constrained by inconvenience, limited applications, inflammation risks, and even irreversible damage to soft tissues. Here, we propose in-ear visual and auditory BCIs based on in-ear bioelectronics, named SpiralE, which can adaptively expand and spiral along the auditory meatus under electrothermal actuation to ensure conformal contact. Participants achieved offline accuracies of 95% in 9-target steady-state visual evoked potential (SSVEP) BCI classification and successfully typed target phrases in a calibration-free 40-target online SSVEP speller experiment. Interestingly, in-ear SSVEPs exhibit significant 2nd-harmonic tendencies, indicating that in-ear sensing may be complementary for studying harmonic spatial distributions in SSVEP studies. Moreover, natural-speech auditory classification accuracy reached 84% in cocktail party experiments. SpiralE provides innovative concepts for designing 3D flexible bioelectronics and supports the development of biomedical engineering and neural monitoring.
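For context on the SSVEP decoding results, the classic recipe for classifying SSVEP frequency is canonical correlation analysis (CCA) between an EEG epoch and sine/cosine reference signals; the abstract does not state that SpiralE's speller uses this exact scheme, so the sketch below is a generic illustration under that assumption. Note that the references include a 2nd harmonic, which matters if in-ear SSVEPs are harmonic-dominant as reported.

```python
# Sketch: generic CCA-based SSVEP frequency classification.
import numpy as np

def cca_corr(X, Y):
    """Largest canonical correlation between two centered data blocks."""
    Qx, _ = np.linalg.qr(X - X.mean(0))
    Qy, _ = np.linalg.qr(Y - Y.mean(0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def references(freq, fs, n_samples, n_harmonics=2):
    """Sine/cosine templates at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)

def classify(eeg, fs, candidate_freqs):
    """eeg: (n_samples, n_channels); returns the best-matching frequency."""
    scores = [cca_corr(eeg, references(f, fs, len(eeg))) for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(scores))]

fs, n = 250, 1000                                   # one 4 s epoch at 250 Hz
rng = np.random.default_rng(2)
t = np.arange(n) / fs
eeg = rng.standard_normal((n, 4))
eeg[:, 0] += np.sin(2 * np.pi * 10 * t)             # embed a 10 Hz SSVEP
print(classify(eeg, fs, [8.0, 10.0, 12.0]))         # -> 10.0
```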
Affiliation(s)
- Zhouheng Wang
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Nanlin Shi
- Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China
- Yingchao Zhang
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Ning Zheng
- State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Haicheng Li
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Yang Jiao
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Jiahui Cheng
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Yutong Wang
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Xiaoqing Zhang
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, 100730, China
- Ying Chen
- Institute of Flexible Electronics Technology of THU, Zhejiang, Jiaxing, 314000, China
- Yihao Chen
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Heling Wang
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Tao Xie
- State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Yijun Wang
- Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Yinji Ma
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Xiaorong Gao
- Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China
- Xue Feng
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
47
Liu W, Vicario DS. Dynamic encoding of phonetic categories in zebra finch auditory forebrain. Sci Rep 2023; 13:11172. [PMID: 37430030 DOI: 10.1038/s41598-023-37982-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2023] [Accepted: 06/30/2023] [Indexed: 07/12/2023] Open
Abstract
Vocal communication requires the formation of acoustic categories to enable invariant representations of sounds despite superficial variations. Humans form acoustic categories for speech phonemes, enabling the listener to recognize words independent of speakers; animals can also discriminate speech phonemes. We investigated the neural mechanisms of this process using electrophysiological recordings from the zebra finch secondary auditory area, caudomedial nidopallium (NCM), during passive exposure to human speech stimuli consisting of two naturally spoken words produced by multiple speakers. Analysis of neural distance and decoding accuracy showed improvements in neural discrimination between word categories over the course of exposure, and this improved representation transferred to the same words by novel speakers. We conclude that NCM neurons formed generalized representations of word categories independent of speaker-specific variations that became more refined over the course of passive exposure. The discovery of this dynamic encoding process in NCM suggests a general processing mechanism for forming categorical representations of complex acoustic signals that humans share with other animals.
Affiliation(s)
- Wanyi Liu
- Department of Psychology, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- David S Vicario
- Department of Psychology, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
48
Han C, Choudhari V, Li YA, Mesgarani N. Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-5. [PMID: 38083559 DOI: 10.1109/embc40787.2023.10340191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2023]
Abstract
Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment. This is done by comparing the listener's brainwaves to a representation of each sound source to find the closest match. The representation is typically the waveform or spectrogram of the sounds, but how effective these representations are for AAD is uncertain. In this study, we examined whether self-supervised learned speech representations improve the accuracy and speed of AAD. We recorded the brain activity of three subjects using invasive electrocorticography (ECoG) as they listened to two conversations and focused on one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to intermediate representations of speech. During evaluation, the reconstructed representation was compared to each speaker's representation to determine the target speaker. Our results indicate that speech representations from WavLM provide better decoding accuracy and speed than the speech envelope and spectrogram. Our findings demonstrate the advantages of self-supervised learned speech representations for auditory attention decoding and pave the way for developing brain-controlled hearable technologies.
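The stimulus-reconstruction logic described here can be stated compactly: train a linear map from lagged brain activity to the attended talker's speech representation, then, at test time, label as attended whichever talker's representation best correlates with the reconstruction. The sketch below uses a generic 1-D feature (an envelope or a single WavLM latent dimension would both fit this slot); the lag count, ridge penalty, and synthetic data are assumptions, and proper train/test separation is omitted for brevity.

```python
# Sketch: stimulus-reconstruction auditory attention decoding (AAD).
import numpy as np

def lag_eeg(eeg, n_lags):
    """(time, ch) -> (time, ch*n_lags); block l holds eeg[t + l], since the
    neural response FOLLOWS the stimulus (a backward/decoding model)."""
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for lag in range(n_lags):
        X[:n - lag, lag * c:(lag + 1) * c] = eeg[lag:]
    return X

def train_decoder(eeg, attended, n_lags=16, ridge=1e2):
    X = lag_eeg(eeg, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ attended)

def decode(eeg, decoder, talker_feats, n_lags=16):
    recon = lag_eeg(eeg, n_lags) @ decoder
    return int(np.argmax([np.corrcoef(recon, f)[0, 1] for f in talker_feats]))

rng = np.random.default_rng(3)
fs, dur, ch = 64, 60, 16
talkerA = rng.standard_normal(fs * dur)                  # attended feature
talkerB = rng.standard_normal(fs * dur)                  # competing feature
eeg = 0.3 * np.column_stack([np.roll(talkerA, 5)] * ch)  # EEG "tracks" A
eeg += rng.standard_normal(eeg.shape)
decoder = train_decoder(eeg, talkerA)
print("attended:", "A" if decode(eeg, decoder, [talkerA, talkerB]) == 0 else "B")
```

Swapping the 1-D feature for a multidimensional learned representation, as the paper does with WavLM, changes only the target of the regression, not the decoding logic.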
49
Gillis M, Vanthornhout J, Francart T. Heard or Understood? Neural Tracking of Language Features in a Comprehensible Story, an Incomprehensible Story and a Word List. eNeuro 2023; 10:ENEURO.0075-23.2023. [PMID: 37451862 DOI: 10.1523/eneuro.0075-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/21/2023] [Accepted: 06/25/2023] [Indexed: 07/18/2023] Open
Abstract
Speech comprehension is a complex neural process that relies on the activation and integration of multiple brain regions. In the current study, we evaluated whether speech comprehension can be investigated by neural tracking. Neural tracking is the phenomenon in which brain responses time-lock to the rhythm of specific features in continuous speech. These features can be acoustic, i.e., acoustic tracking, or derived from the content of the speech using language properties, i.e., language tracking. We evaluated whether neural tracking of speech differs between a comprehensible story, an incomprehensible story, and a word list. We evaluated the neural responses to speech of 19 participants (six men). No significant difference in acoustic tracking was found. However, significant language tracking was found only for the comprehensible story. The most prominent effect was seen for word surprisal, a language feature at the word level. The neural response to word surprisal showed a prominent negativity between 300 and 400 ms, similar to the N400 in evoked-response paradigms. This N400 was significantly more negative when the story was comprehended, i.e., when words could be integrated into the context of previous words. These results show that language tracking can capture the effect of speech comprehension.
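Word surprisal, the feature driving the N400-like effect reported here, is defined as -log2 P(word | preceding context). A toy bigram model with add-alpha smoothing is enough to illustrate the computation; the miniature corpus and smoothing constant are invented, and a study like this one would use a far richer language model.

```python
# Sketch: word surprisal = -log2 P(word | previous word), toy bigram model.
import math
from collections import Counter

corpus = ("the cat sat on the mat the cat ate the fish "
          "the dog sat on the rug").split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word, alpha=0.1):
    """Add-alpha smoothed bigram surprisal in bits."""
    v = len(unigrams)                  # vocabulary size
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * v)
    return -math.log2(p)

print(f"{surprisal('the', 'cat'):.2f} bits")  # frequent continuation: lower
print(f"{surprisal('the', 'rug'):.2f} bits")  # rarer continuation: higher
```

In a tracking analysis, each word onset is tagged with its surprisal value, and the regressor formed from these impulses is related to the EEG.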
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
50
Ahmed F, Nidiffer AR, O'Sullivan AE, Zuk NJ, Lalor EC. The integration of continuous audio and visual speech in a cocktail-party environment depends on attention. Neuroimage 2023; 274:120143. [PMID: 37121375 DOI: 10.1016/j.neuroimage.2023.120143] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 03/17/2023] [Accepted: 04/27/2023] [Indexed: 05/02/2023] Open
Abstract
In noisy environments, our ability to understand speech benefits greatly from seeing the speaker's face. This is attributed to the brain's ability to integrate audio and visual information, a process known as multisensory integration. In addition, selective attention plays an enormous role in what we understand, the so-called cocktail-party phenomenon. But how attention and multisensory integration interact remains incompletely understood, particularly in the case of natural, continuous speech. Here, we addressed this issue by analyzing EEG data recorded from participants who undertook a multisensory cocktail-party task using natural speech. To assess multisensory integration, we modeled the EEG responses to the speech in two ways. The first assumed that audiovisual speech processing is simply a linear combination of audio speech processing and visual speech processing (i.e., an A+V model), while the second allowed for the possibility of audiovisual interactions (i.e., an AV model). Applying these models to the data revealed that EEG responses to attended audiovisual speech were better explained by an AV model, providing evidence for multisensory integration. In contrast, unattended audiovisual speech responses were best captured by an A+V model, suggesting that multisensory integration is suppressed for unattended speech. Follow-up analyses revealed some limited evidence for early multisensory integration of unattended AV speech, with no integration occurring at later levels of processing. We take these findings as evidence that the integration of natural audio and visual speech occurs at multiple levels of processing in the brain, each of which can be differentially affected by attention.
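The A+V versus AV contrast is, at heart, a nested-model comparison: does adding an audiovisual interaction term improve prediction of the EEG beyond the sum of unisensory contributions? A minimal sketch of that logic follows; the product-term interaction, feature choices, and synthetic data are illustrative stand-ins, not the paper's exact formulation.

```python
# Sketch: compare an additive A+V encoding model against an AV model that
# includes an audiovisual interaction regressor.
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1]), X_tr.T @ y_tr)
    return X_te @ w

rng = np.random.default_rng(4)
n = 4000
audio = rng.standard_normal(n)                 # e.g., acoustic envelope
visual = rng.standard_normal(n)                # e.g., lip-aperture trace
interaction = audio * visual                   # stand-in for AV integration
eeg = audio + visual + 0.8 * interaction + rng.standard_normal(n)

half = n // 2
for name, X in [("A+V", np.column_stack([audio, visual])),
                ("AV", np.column_stack([audio, visual, interaction]))]:
    pred = ridge_fit_predict(X[:half], eeg[:half], X[half:])
    r = np.corrcoef(pred, eeg[half:])[0, 1]
    print(f"{name} model: r = {r:.2f}")
# On the logic above, attended AV speech should favor the AV model and
# unattended speech the additive A+V model.
```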
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
- Aisling E O'Sullivan
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Nathaniel J Zuk
- Edmond & Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland