51. Xia Y, Geng M, Chen Y, Sun S, Liao C, Zhu Z, Li Z, Ochieng WY, Angeloudis P, Elhajj M, Zhang L, Zeng Z, Zhang B, Gao Z, Chen X(M). Understanding common human driving semantics for autonomous vehicles. Patterns (N Y) 2023; 4:100730. [PMID: 37521046] [PMCID: PMC10382946] [DOI: 10.1016/j.patter.2023.100730]
Abstract
Autonomous vehicles will share roads with human-driven vehicles until the transition to fully autonomous transport systems is complete. The critical challenge of improving mutual understanding between both vehicle types cannot be addressed by feeding extensive driving data into data-driven models alone; it also requires enabling autonomous vehicles to understand and apply common driving behaviors in the way human drivers do. Therefore, we designed and conducted two electroencephalography experiments to compare the cerebral activities underlying human linguistic and driving understanding. The results showed that driving activates hierarchical neural functions in the auditory cortex, analogous to abstraction in linguistic understanding. Subsequently, we proposed a neural-informed, semantics-driven framework to understand common human driving behavior in a brain-inspired manner. This study highlights a pathway for fusing neuroscience into complex human behavior understanding tasks and provides a computational neural model of human driving behaviors, which will enable autonomous vehicles to perceive and think like human drivers.
Affiliation(s)
- Yingji Xia
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Maosi Geng
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310015, China
- Yong Chen
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Sudan Sun
- School of Medicine, Zhejiang University, Hangzhou 310058, China
- Chenlei Liao
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Zheng Zhu
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou 310027, China
- Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Zhihui Li
- School of Transportation, Jilin University, Changchun 130022, China
- Washington Yotto Ochieng
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Panagiotis Angeloudis
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Mireille Elhajj
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Lei Zhang
- Alibaba Group, Hangzhou 310052, China
- Ziyou Gao
- School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
- Xiqun (Michael) Chen
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou 310027, China
- Zhejiang University/University of Illinois Urbana-Champaign (ZJU-UIUC) Institute, Zhejiang University, Haining 314400, China
- Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
52. Wang Z, Shi N, Zhang Y, Zheng N, Li H, Jiao Y, Cheng J, Wang Y, Zhang X, Chen Y, Chen Y, Wang H, Xie T, Wang Y, Ma Y, Gao X, Feng X. Conformal in-ear bioelectronics for visual and auditory brain-computer interfaces. Nat Commun 2023; 14:4213. [PMID: 37452047] [PMCID: PMC10349124] [DOI: 10.1038/s41467-023-39814-6]
Abstract
Brain-computer interfaces (BCIs) have attracted considerable attention in motor and language rehabilitation. Most devices use non-invasive caps, commercial headbands, or invasive microneedle approaches, which are constrained by inconvenience, limited applications, inflammation risks, and even irreversible damage to soft tissues. Here, we propose in-ear visual and auditory BCIs based on in-ear bioelectronics, named SpiralE, which can adaptively expand and spiral along the auditory meatus under electrothermal actuation to ensure conformal contact. Participants achieve offline accuracies of 95% in 9-target steady-state visual evoked potential (SSVEP) BCI classification and successfully type target phrases in a calibration-free 40-target online SSVEP speller experiment. Interestingly, in-ear SSVEPs exhibit significant 2nd harmonic tendencies, indicating that in-ear sensing may be complementary for studying harmonic spatial distributions in SSVEP studies. Moreover, natural speech auditory classification accuracy can reach 84% in cocktail party experiments. The SpiralE provides innovative concepts for designing 3D flexible bioelectronics and assists the development of biomedical engineering and neural monitoring.
Affiliation(s)
- Zhouheng Wang
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Nanlin Shi
- Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China
- Yingchao Zhang
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Ning Zheng
- State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Haicheng Li
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Yang Jiao
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Jiahui Cheng
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Yutong Wang
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Xiaoqing Zhang
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, 100730, China
- Ying Chen
- Institute of Flexible Electronics Technology of THU, Zhejiang, Jiaxing, 314000, China
- Yihao Chen
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Heling Wang
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Tao Xie
- State Key Laboratory of Chemical Engineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China
- Yijun Wang
- Institute of Semiconductors, Chinese Academy of Sciences, Beijing, 100083, China
- Yinji Ma
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
- Xiaorong Gao
- Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China
- Xue Feng
- Laboratory of Flexible Electronics Technology, Tsinghua University, Beijing, 100084, China
- AML, Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China
53. Liu W, Vicario DS. Dynamic encoding of phonetic categories in zebra finch auditory forebrain. Sci Rep 2023; 13:11172. [PMID: 37430030] [DOI: 10.1038/s41598-023-37982-5]
Abstract
Vocal communication requires the formation of acoustic categories to enable invariant representations of sounds despite superficial variations. Humans form acoustic categories for speech phonemes, enabling the listener to recognize words independent of speakers; animals can also discriminate speech phonemes. We investigated the neural mechanisms of this process using electrophysiological recordings from the zebra finch secondary auditory area, caudomedial nidopallium (NCM), during passive exposure to human speech stimuli consisting of two naturally spoken words produced by multiple speakers. Analysis of neural distance and decoding accuracy showed improvements in neural discrimination between word categories over the course of exposure, and this improved representation transferred to the same words by novel speakers. We conclude that NCM neurons formed generalized representations of word categories independent of speaker-specific variations that became more refined over the course of passive exposure. The discovery of this dynamic encoding process in NCM suggests a general processing mechanism for forming categorical representations of complex acoustic signals that humans share with other animals.
Affiliation(s)
- Wanyi Liu
- Department of Psychology, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- David S Vicario
- Department of Psychology, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
54. Han C, Choudhari V, Li YA, Mesgarani N. Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-5. [PMID: 38083559] [DOI: 10.1109/embc40787.2023.10340191]
Abstract
Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment. This is done by comparing the listener's brainwaves to a representation of each sound source to find the closest match. The representation is typically the waveform or spectrogram of the sounds, but the effectiveness of these representations for AAD is uncertain. In this study, we examined whether self-supervised learned speech representations improve the accuracy and speed of AAD. We recorded the brain activity of three subjects using invasive electrocorticography (ECoG) as they listened to two conversations and focused on one. We used WavLM to extract a latent representation of each talker and trained a spatiotemporal filter to map brain activity to intermediate representations of speech. During evaluation, the reconstructed representation was compared to each speaker's representation to determine the target speaker. Our results indicate that speech representations from WavLM provide better decoding accuracy and speed than the speech envelope and spectrogram. Our findings demonstrate the advantages of self-supervised learned speech representations for auditory attention decoding and pave the way for developing brain-controlled hearable technologies.
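For orientation, the correlation-based decoding scheme this abstract describes can be sketched in a few lines of Python. The sketch below assumes preprocessed multichannel neural data and a one-dimensional feature time series per talker (an envelope, or a WavLM-derived feature reduced to one dimension); the function names and the plain ridge solver are illustrative, not the authors' implementation.

```python
import numpy as np

def lagged(eeg, lags):
    """Stack time-lagged copies of multichannel neural data (T x C) into a design matrix."""
    T, C = eeg.shape
    X = np.zeros((T, C * len(lags)))
    for i, L in enumerate(lags):
        s = np.roll(eeg, L, axis=0)
        if L > 0:
            s[:L] = 0
        elif L < 0:
            s[L:] = 0
        X[:, i * C:(i + 1) * C] = s
    return X

def train_decoder(eeg, attended_feat, lags, lam=1e3):
    """Backward (decoding) model: ridge regression from lagged neural data to the attended feature."""
    X = lagged(eeg, lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ attended_feat)

def decode(eeg, feat_a, feat_b, w, lags):
    """Attention decision: reconstruct the feature, correlate with each talker, pick the best match."""
    recon = lagged(eeg, lags) @ w
    r = [np.corrcoef(recon, f)[0, 1] for f in (feat_a, feat_b)]
    return ("talker A" if r[0] > r[1] else "talker B"), r
```

In this framing, the paper's contribution amounts to swapping the envelope or spectrogram for a learned latent representation as the reconstruction target.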
55. Ahmed F, Nidiffer AR, O'Sullivan AE, Zuk NJ, Lalor EC. The integration of continuous audio and visual speech in a cocktail-party environment depends on attention. Neuroimage 2023; 274:120143. [PMID: 37121375] [DOI: 10.1016/j.neuroimage.2023.120143]
Abstract
In noisy environments, our ability to understand speech benefits greatly from seeing the speaker's face. This is attributed to the brain's ability to integrate audio and visual information, a process known as multisensory integration. In addition, selective attention plays an enormous role in what we understand, the so-called cocktail-party phenomenon. But how attention and multisensory integration interact remains incompletely understood, particularly in the case of natural, continuous speech. Here, we addressed this issue by analyzing EEG data recorded from participants who undertook a multisensory cocktail-party task using natural speech. To assess multisensory integration, we modeled the EEG responses to the speech in two ways. The first assumed that audiovisual speech processing is simply a linear combination of audio speech processing and visual speech processing (i.e., an A + V model), while the second allowed for the possibility of audiovisual interactions (i.e., an AV model). Applying these models to the data revealed that EEG responses to attended audiovisual speech were better explained by an AV model, providing evidence for multisensory integration. In contrast, unattended audiovisual speech responses were best captured using an A + V model, suggesting that multisensory integration is suppressed for unattended speech. Follow-up analyses revealed some limited evidence for early multisensory integration of unattended AV speech, with no integration occurring at later levels of processing. We take these findings as evidence that the integration of natural audio and visual speech occurs at multiple levels of processing in the brain, each of which can be differentially affected by attention.
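A minimal sketch of the model contrast described above, under simplifying assumptions: lagged audio and visual feature matrices, a single EEG channel, and held-out test data. In the study, the A + V model combines responses estimated from unimodal conditions; here, for brevity, two unimodal models are fit separately and their predictions summed. All names are illustrative.

```python
import numpy as np

def fit_forward(X, eeg, lam=1e2):
    """Ridge forward model: lagged stimulus features -> one EEG channel."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def compare_models(aud_tr, vis_tr, eeg_tr, aud_te, vis_te, eeg_te):
    """Contrast additive (A + V) and joint (AV) predictions on held-out data."""
    w_a = fit_forward(aud_tr, eeg_tr)
    w_v = fit_forward(vis_tr, eeg_tr)
    w_av = fit_forward(np.hstack([aud_tr, vis_tr]), eeg_tr)
    r_sum = corr(aud_te @ w_a + vis_te @ w_v, eeg_te)        # no interaction terms
    r_av = corr(np.hstack([aud_te, vis_te]) @ w_av, eeg_te)  # jointly fit AV weights
    return r_sum, r_av  # r_av > r_sum is the signature of audiovisual interaction
```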
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
- Aisling E O'Sullivan
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Nathaniel J Zuk
- Edmond & Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
56. Gillis M, Vanthornhout J, Francart T. Heard or Understood? Neural Tracking of Language Features in a Comprehensible Story, an Incomprehensible Story and a Word List. eNeuro 2023; 10:ENEURO.0075-23.2023. [PMID: 37451862] [DOI: 10.1523/eneuro.0075-23.2023]
Abstract
Speech comprehension is a complex neural process that relies on the activation and integration of multiple brain regions. In the current study, we evaluated whether speech comprehension can be investigated by neural tracking. Neural tracking is the phenomenon in which brain responses time-lock to the rhythm of specific features in continuous speech. These features can be acoustic, i.e., acoustic tracking, or derived from the content of the speech using language properties, i.e., language tracking. We evaluated whether neural tracking of speech differs between a comprehensible story, an incomprehensible story, and a word list. We evaluated the neural responses to speech of 19 participants (six men). No significant difference regarding acoustic tracking was found. However, significant language tracking was only found for the comprehensible story. The most prominent effect was visible for word surprisal, a language feature at the word level. The neural response to word surprisal showed a prominent negativity between 300 and 400 ms, similar to the N400 in evoked response paradigms. This N400 was significantly more negative when the story was comprehended, i.e., when words could be integrated into the context of previous words. These results show that language tracking can capture the effect of speech comprehension.
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
57. Viswanathan V, Bharadwaj HM, Heinz MG, Shinn-Cunningham BG. Induced alpha and beta electroencephalographic rhythms covary with single-trial speech intelligibility in competition. Sci Rep 2023; 13:10216. [PMID: 37353552] [PMCID: PMC10290148] [DOI: 10.1038/s41598-023-37173-2]
Abstract
Neurophysiological studies suggest that intrinsic brain oscillations influence sensory processing, especially of rhythmic stimuli like speech. Prior work suggests that brain rhythms may mediate perceptual grouping and selective attention to speech amidst competing sound, as well as more linguistic aspects of speech processing like predictive coding. However, we know of no prior studies that have directly tested, at the single-trial level, whether brain oscillations relate to speech-in-noise outcomes. Here, we recorded electroencephalography while simultaneously measuring intelligibility of spoken sentences amidst two different interfering sounds: multi-talker babble or speech-shaped noise. We find that induced parieto-occipital alpha (7-15 Hz; thought to modulate attentional focus) and frontal beta (13-30 Hz; associated with maintenance of the current sensorimotor state and predictive coding) oscillations covary with trial-wise percent-correct scores; importantly, alpha and beta power provide significant independent contributions to predicting single-trial behavioral outcomes. These results can inform models of speech processing and guide noninvasive measures to index different neural processes that together support complex listening.
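The "independent contributions" claim maps onto a standard reduced-model comparison: each band's unique contribution is the cross-trial variance in accuracy lost when that band is dropped from the regression. A minimal sketch, assuming trial-wise alpha power, beta power, and percent-correct vectors (hypothetical variable names, not the authors' code):

```python
import numpy as np

def r2(X, y):
    """Ordinary least-squares R^2 for a design matrix X (intercept included as a column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - (y - X @ beta).var() / y.var()

def unique_band_contributions(alpha_pow, beta_pow, pct_correct):
    """Unique variance explained by each band: R^2(full) - R^2(model without that band)."""
    one = np.ones_like(alpha_pow)
    r2_full = r2(np.column_stack([alpha_pow, beta_pow, one]), pct_correct)
    return {
        "alpha_unique": r2_full - r2(np.column_stack([beta_pow, one]), pct_correct),
        "beta_unique": r2_full - r2(np.column_stack([alpha_pow, one]), pct_correct),
        "full": r2_full,
    }
```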
Affiliation(s)
- Vibha Viswanathan
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Hari M Bharadwaj
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, 47907, USA
58. Orf M, Wöstmann M, Hannemann R, Obleser J. Target enhancement but not distractor suppression in auditory neural tracking during continuous speech. iScience 2023; 26:106849. [PMID: 37305701] [PMCID: PMC10251127] [DOI: 10.1016/j.isci.2023.106849]
Abstract
Selective attention modulates the neural tracking of speech in auditory cortical regions. It is unclear whether this attentional modulation is dominated by enhanced target tracking, or suppression of distraction. To settle this long-standing debate, we employed an augmented electroencephalography (EEG) speech-tracking paradigm with target, distractor, and neutral streams. Concurrent target speech and distractor (i.e., sometimes relevant) speech were juxtaposed with a third, never task-relevant speech stream serving as neutral baseline. Listeners had to detect short target repeats and committed more false alarms originating from the distractor than from the neutral stream. Speech tracking revealed target enhancement but no distractor suppression below the neutral baseline. Speech tracking of the target (not distractor or neutral speech) explained single-trial accuracy in repeat detection. In sum, the enhanced neural representation of target speech is specific to processes of attentional gain for behaviorally relevant target speech rather than neural suppression of distraction.
Affiliation(s)
- Martin Orf
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Malte Wöstmann
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany
- Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
59. Raghavan VS, O'Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol 2023; 21:e3002128. [PMID: 37279203] [PMCID: PMC10243639] [DOI: 10.1371/journal.pbio.3002128]
Abstract
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
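The glimpsing construct at the heart of this study has a simple computational form: a time-frequency bin is a "glimpse" when the target's local signal-to-noise ratio exceeds a threshold. A sketch assuming magnitude spectrograms of the premixed target and masker; the threshold and the feature split are illustrative, not the paper's exact pipeline:

```python
import numpy as np

def glimpse_mask(target_spec, masker_spec, threshold_db=0.0):
    """Bins where the target exceeds the background by threshold_db (local SNR criterion)."""
    snr_db = 20 * (np.log10(target_spec + 1e-12) - np.log10(masker_spec + 1e-12))
    return snr_db > threshold_db

def split_glimpsed_masked(target_spec, masker_spec):
    """Split the target spectrogram into glimpsed and masked features, which can
    then feed separate encoding models (e.g., temporal response functions)."""
    m = glimpse_mask(target_spec, masker_spec)
    return np.where(m, target_spec, 0.0), np.where(~m, target_spec, 0.0)
```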
Affiliation(s)
- Vinay S Raghavan
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- James O'Sullivan
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- Stephan Bickel
- The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Department of Neurology, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Ashesh D Mehta
- The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York, United States of America
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
60. Karunathilake IMD, Dunlap JL, Perera J, Presacco A, Decruy L, Anderson S, Kuchinsky SE, Simon JZ. Effects of aging on cortical representations of continuous speech. J Neurophysiol 2023; 129:1359-1377. [PMID: 37096924] [PMCID: PMC10202479] [DOI: 10.1152/jn.00356.2022]
Abstract
Understanding speech in a noisy environment is crucial in day-to-day interactions and yet becomes more challenging with age, even for healthy aging. Age-related changes in the neural mechanisms that enable speech-in-noise listening have been investigated previously; however, the extent to which age affects the timing and fidelity of encoding of target and interfering speech streams is not well understood. Using magnetoencephalography (MEG), we investigated how continuous speech is represented in auditory cortex in the presence of interfering speech in younger and older adults. Cortical representations were obtained from neural responses that time-locked to the speech envelopes, using speech envelope reconstruction and temporal response functions (TRFs). TRFs showed three prominent peaks corresponding to auditory cortical processing stages: early (∼50 ms), middle (∼100 ms), and late (∼200 ms). Older adults showed exaggerated speech envelope representations compared with younger adults. Temporal analysis revealed both that the age-related exaggeration starts as early as ∼50 ms and that older adults needed a substantially longer integration time window to achieve their better reconstruction of the speech envelope. As expected, with increased speech masking, envelope reconstruction for the attended talker decreased and all three TRF peaks were delayed, with aging contributing additionally to the reduction. Interestingly, for older adults the late peak was delayed, suggesting that this late peak may receive contributions from multiple sources. Together, these results suggest that several compensatory mechanisms counteract age-related temporal processing deficits at multiple stages, but that they are not able to fully reestablish unimpaired speech perception. NEW & NOTEWORTHY: We observed age-related changes in cortical temporal processing of continuous speech that may be related to older adults' difficulty in understanding speech in noise. These changes occur in both timing and strength of the speech representations at different cortical processing stages and depend on both noise condition and selective attention. Critically, their dependence on noise condition changes dramatically among the early, middle, and late cortical processing stages, underscoring how aging differentially affects these stages.
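As background for the TRF analyses in this and the surrounding entries: a temporal response function is a regularized regression from the time-lagged stimulus envelope to the recorded signal, and the fitted weights across lags form the response waveform whose ∼50, ∼100, and ∼200 ms peaks are analyzed. A minimal single-channel sketch with a plain ridge solver (names and regularization are illustrative):

```python
import numpy as np

def estimate_trf(envelope, response, fs, tmin=-0.05, tmax=0.4, lam=1e2):
    """Forward TRF: ridge regression from the lagged speech envelope to one
    MEG/EEG channel. Returns lag times (s) and the TRF weight at each lag."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.zeros((len(envelope), len(lags)))
    for i, L in enumerate(lags):
        s = np.roll(envelope, L)        # column i holds envelope(t - L)
        if L > 0:
            s[:L] = 0
        elif L < 0:
            s[L:] = 0
        X[:, i] = s
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ response)
    return lags / fs, w
```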
Affiliation(s)
- I M Dushyanthi Karunathilake
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Jason L Dunlap
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Janani Perera
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Alessandro Presacco
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland, United States
- Stefanie E Kuchinsky
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland, United States
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States
- Department of Biology, University of Maryland, College Park, Maryland, United States
61. Viswanathan V, Bharadwaj HM, Heinz MG, Shinn-Cunningham BG. Induced alpha and beta electroencephalographic rhythms covary with single-trial speech intelligibility in competition. bioRxiv [Preprint] 2023:2022.12.31.522365. [PMID: 36712081] [PMCID: PMC9884507] [DOI: 10.1101/2022.12.31.522365]
Abstract
Neurophysiological studies suggest that intrinsic brain oscillations influence sensory processing, especially of rhythmic stimuli like speech. Prior work suggests that brain rhythms may mediate perceptual grouping and selective attention to speech amidst competing sound, as well as more linguistic aspects of speech processing like predictive coding. However, we know of no prior studies that have directly tested, at the single-trial level, whether brain oscillations relate to speech-in-noise outcomes. Here, we recorded electroencephalography while simultaneously measuring intelligibility of spoken sentences amidst two different interfering sounds: multi-talker babble or speech-shaped noise. We find that induced parieto-occipital alpha (7-15 Hz; thought to modulate attentional focus) and frontal beta (13-30 Hz; associated with maintenance of the current sensorimotor state and predictive coding) oscillations covary with trial-wise percent-correct scores; importantly, alpha and beta power provide significant independent contributions to predicting single-trial behavioral outcomes. These results can inform models of speech processing and guide noninvasive measures to index different neural processes that together support complex listening.
Affiliation(s)
- Vibha Viswanathan
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Hari M Bharadwaj
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA 15260
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907
62. Van Hirtum T, Somers B, Verschueren E, Dieudonné B, Francart T. Delta-band neural envelope tracking predicts speech intelligibility in noise in preschoolers. Hear Res 2023; 434:108785. [PMID: 37172414] [DOI: 10.1016/j.heares.2023.108785]
Abstract
Behavioral tests are currently the gold standard for measuring speech intelligibility. However, these tests can be difficult to administer in young children due to factors such as motivation, linguistic knowledge, and cognitive skills. It has been shown that measures of neural envelope tracking can be used to predict speech intelligibility and overcome these issues. However, its potential as an objective measure of speech intelligibility in noise remains to be investigated in preschool children. Here, we evaluated neural envelope tracking as a function of signal-to-noise ratio (SNR) in 14 five-year-old children. We examined EEG responses to natural, continuous speech presented at different SNRs ranging from -8 dB (very difficult) to 8 dB SNR (very easy). As expected, delta-band (0.5-4 Hz) tracking increased with increasing stimulus SNR. However, this increase was not strictly monotonic, as neural tracking reached a plateau between 0 and 4 dB SNR, similar to the behavioral speech intelligibility outcomes. These findings indicate that neural tracking in the delta band remains stable as long as the acoustical degradation of the speech signal does not reflect significant changes in speech intelligibility. Theta-band tracking (4-8 Hz), on the other hand, was drastically reduced and more easily affected by noise in children, making it less reliable as a measure of speech intelligibility. By contrast, neural envelope tracking in the delta band was directly associated with behavioral measures of speech intelligibility. This suggests that delta-band envelope tracking is a valuable tool for evaluating speech-in-noise intelligibility in preschoolers, highlighting its potential as an objective measure of speech intelligibility in difficult-to-test populations.
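A simplified illustration of band-specific envelope tracking: filter both the EEG and the speech envelope into the band of interest and correlate. The study itself used more elaborate decoder-based measures, so this correlation is only a conceptual stand-in, and all names and rates are assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_tracking(eeg, speech_env, fs, band):
    """Correlation between band-filtered EEG and the band-filtered speech
    envelope; assumes both are 1-D and sampled at the same rate fs (Hz)."""
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    return np.corrcoef(sosfiltfilt(sos, eeg), sosfiltfilt(sos, speech_env))[0, 1]

# Hypothetical per-condition comparison of delta vs theta tracking across SNRs:
# delta_r = {snr: band_tracking(eeg[snr], env[snr], 64, (0.5, 4)) for snr in snrs}
# theta_r = {snr: band_tracking(eeg[snr], env[snr], 64, (4, 8)) for snr in snrs}
```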
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
- Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
63. Pennington JR, David SV. A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex. PLoS Comput Biol 2023; 19:e1011110. [PMID: 37146065] [DOI: 10.1371/journal.pcbi.1011110]
Abstract
Convolutional neural networks (CNNs) can provide powerful and flexible models of neural sensory processing. However, the utility of CNNs in studying the auditory system has been limited by their requirement for large datasets and the complex response properties of single auditory neurons. To address these limitations, we developed a population encoding model: a CNN that simultaneously predicts activity of several hundred neurons recorded during presentation of a large set of natural sounds. This approach defines a shared spectro-temporal space and pools statistical power across neurons. Population models of varying architecture performed consistently and substantially better than traditional linear-nonlinear models on data from primary and non-primary auditory cortex. Moreover, population models were highly generalizable. The output layer of a model pre-trained on one population of neurons could be fit to data from novel single units, achieving performance equivalent to that of neurons in the original fit data. This ability to generalize suggests that population encoding models capture a complete representational space across neurons in an auditory cortical field.
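The architectural idea, a convolutional front end shared across neurons plus a per-neuron linear readout that can be refit for novel units, can be sketched compactly. This toy PyTorch module is our illustration of that idea under assumed dimensions, not the authors' published model:

```python
import torch
import torch.nn as nn

class PopulationCNN(nn.Module):
    """Toy population encoding model: shared convolutional spectro-temporal
    front end, plus a 1x1-convolution readout with one output per neuron."""
    def __init__(self, n_neurons=300):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # collapse frequency, keep time
        )
        self.readout = nn.Conv1d(32, n_neurons, kernel_size=1)  # per-neuron weights

    def forward(self, spec):                      # spec: (batch, 1, n_mels, time)
        h = self.backbone(spec).squeeze(2)        # (batch, 32, time)
        return self.readout(h)                    # (batch, n_neurons, time)

# The generalization result corresponds to freezing `backbone` and refitting
# only `readout` for a population of novel single units.
```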
Affiliation(s)
- Jacob R Pennington
- Washington State University, Vancouver, Washington, United States of America
- Stephen V David
- Oregon Hearing Research Center, Oregon Health and Science University, Oregon, United States of America
64. Takai S, Kanno A, Kawase T, Shirakura M, Suzuki J, Nakasato N, Kawashima R, Katori Y. Possibility of additive effects by the presentation of visual information related to distractor sounds on the contra-sound effects of the N100m responses. Hear Res 2023; 434:108778. [PMID: 37105052] [DOI: 10.1016/j.heares.2023.108778]
Abstract
Auditory-evoked responses can be affected by different types of contralateral sounds or by attention modulation. The present study examined the additive effects of presenting visual information related to contralateral distractor sounds during dichotic listening tasks on the contralateral effects of N100m responses in the auditory cortex in 16 subjects (12 males and 4 females). In magnetoencephalography, a tone-burst of 500 ms duration at a frequency of 1000 Hz was presented to the left ear at a level of 70 dB as the stimulus to elicit the N100m response, and a movie clip was used as a distractor stimulus under audio-only, visual-only, and audio-visual conditions. Subjects were instructed to pay attention to the left ear and press the response button each time they heard a tone-burst stimulus in their left ear. The results suggest that presenting visual information related to the contralateral sound, which served as a distractor, significantly suppressed the amplitude of the N100m response compared with the contralateral sound alone. In contrast, the presentation of visual information related to the contralateral sound did not affect the latency of the N100m response. These results suggest that the integration of contralateral sounds and related movies may have resulted in a more perceptually loaded stimulus and reduced the intensity of attention to the tone-bursts. Our findings suggest that selective attention and saliency mechanisms may have cross-modal effects on other modes of perception.
Affiliation(s)
- Shunsuke Takai
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Akitake Kanno
- Department of Advanced Spintronics Medical Engineering, Graduate School of Engineering, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan; Department of Epileptology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Tetsuaki Kawase
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan; Laboratory of Rehabilitative Auditory Science, Tohoku University Graduate School of Biomedical Engineering, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan; Department of Audiology, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Masayuki Shirakura
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Jun Suzuki
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
- Nobukatsu Nakasato
- Department of Advanced Spintronics Medical Engineering, Graduate School of Engineering, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan; Department of Epileptology, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Ryuta Kawashima
- Institute of Development, Aging and Cancer, Tohoku University, 4-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Yukio Katori
- Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, 1-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8574, Japan
65. Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990] [DOI: 10.1016/j.heares.2023.108770]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Apart from the individual features that form a static listening condition, dynamic and irregular changes of auditory scenes (volatile listening environments) have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity has little influence on the neural index of speech tracking, at least when bottom-up information is sufficient. However, when the listening environment was volatile and listeners had to re-engage with new speech whenever the auditory scene changed, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the responses to left and right attention were significantly attenuated around 100-200 ms after speech onset. These findings suggest that volatile listening environments can adversely affect the modulatory effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
- Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
- Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
- Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea
66. Kaufman M, Zion Golumbic E. Listening to two speakers: Capacity and tradeoffs in neural speech tracking during Selective and Distributed Attention. Neuroimage 2023; 270:119984. [PMID: 36854352] [DOI: 10.1016/j.neuroimage.2023.119984]
Abstract
Speech comprehension is severely compromised when several people talk at once, due to limited perceptual and cognitive resources. In such circumstances, top-down attention mechanisms can actively prioritize processing of task-relevant speech. However, behavioral and neural evidence suggest that this selection is not exclusive, and the system may have sufficient capacity to process additional speech input as well. Here we used a data-driven approach to contrast two opposing hypotheses regarding the system's capacity to co-represent competing speech: Can the brain represent two speakers equally or is the system fundamentally limited, resulting in tradeoffs between them? Neural activity was measured using magnetoencephalography (MEG) as human participants heard concurrent speech narratives and engaged in two tasks: Selective Attention, where only one speaker was task-relevant and Distributed Attention, where both speakers were equally relevant. Analysis of neural speech-tracking revealed that both tasks engaged a similar network of brain regions involved in auditory processing, attentional control and speech processing. Interestingly, during both Selective and Distributed Attention the neural representation of competing speech showed a bias towards one speaker. This is in line with proposed 'bottlenecks' for co-representation of concurrent speech and suggests that good performance on distributed attention tasks may be achieved by toggling attention between speakers over time.
Affiliation(s)
- Maya Kaufman
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan, Israel
67. Kovács P, Tóth B, Honbolygó F, Szalárdy O, Kohári A, Mády K, Magyari L, Winkler I. Speech prosody supports speaker selection and auditory stream segregation in a multi-talker situation. Brain Res 2023; 1805:148246. [PMID: 36657631] [DOI: 10.1016/j.brainres.2023.148246]
Abstract
To process speech in a multi-talker environment, listeners need to segregate the mixture of incoming speech streams and focus their attention on one of them. Potentially, speech prosody could aid the segregation of different speakers, the selection of the desired speech stream, and the detection of targets within the attended stream. To test these issues, we recorded behavioral responses and extracted event-related potentials and functional brain networks from electroencephalographic signals while participants listened to two concurrent speech streams, performing a lexical detection and a recognition memory task in parallel. Prosody manipulation was applied to the attended speech stream in one group of participants and to the ignored speech stream in another group. Naturally recorded speech stimuli were either intact, synthetically F0-flattened, or prosodically suppressed by the speaker. Results show that prosody - especially the parsing cues mediated by speech rate - facilitates stream selection, while playing a smaller role in auditory stream segmentation and target detection.
Affiliation(s)
- Petra Kovács
- Department of Cognitive Science, Budapest University of Technology and Economics, Hungary
- Brigitta Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Hungary
- Ferenc Honbolygó
- Brain Imaging Center, Research Center for Natural Sciences, Hungary
- Orsolya Szalárdy
- Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Hungary; Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Anna Kohári
- Research Group of Phonetics, Institute for General and Hungarian Linguistics, Hungarian Research Centre for Linguistics, Hungary
- Katalin Mády
- Research Group of Phonetics, Institute for General and Hungarian Linguistics, Hungarian Research Centre for Linguistics, Hungary
- Lilla Magyari
- Department of Social Studies, Faculty of Social Sciences, University of Stavanger, Stavanger, Norway; Norwegian Centre for Reading Education and Research, Faculty of Arts and Education, University of Stavanger, Stavanger, Norway
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Hungary
68. Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385] [DOI: 10.1016/j.neubiorev.2023.105111]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the syllable rate only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, rather than the modulation spectrum, are a more reliable acoustic correlate of syllables.
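Both candidate correlates are easy to compute. A sketch assuming a mono audio array and syllable-onset indices at the envelope sampling rate; the cutoff, rates, and window are illustrative choices:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_spectrum(audio, fs, env_fs=100):
    """Power spectrum of the temporal envelope (the modulation spectrum);
    its peak is the envelope's center frequency. Assumes fs % env_fs == 0."""
    env = np.abs(hilbert(audio))                          # amplitude envelope
    sos = butter(3, env_fs / 2.5, btype="lowpass", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)[:: fs // env_fs]          # smooth, then downsample
    spec = np.abs(np.fft.rfft(env - env.mean())) ** 2
    return np.fft.rfftfreq(len(env), d=1 / env_fs), spec

def onset_locked_envelope(env, onset_idx, env_fs, win=0.5):
    """Average the envelope in windows aligned to syllable onsets: the
    time-domain, phase-locked correlate the abstract argues for."""
    n = int(win * env_fs)
    segs = [env[i:i + n] for i in onset_idx if i + n <= len(env)]
    return np.mean(segs, axis=0)
```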
69. Manting CL, Gulyas B, Ullén F, Lundqvist D. Steady-state responses to concurrent melodies: source distribution, top-down, and bottom-up attention. Cereb Cortex 2023; 33:3053-3066. [PMID: 35858223] [PMCID: PMC10016039] [DOI: 10.1093/cercor/bhac260]
Abstract
Humans can direct attentional resources to a single sound occurring simultaneously among others to extract the most behaviourally relevant information present. To investigate this cognitive phenomenon in a precise manner, we used frequency-tagging to separate neural auditory steady-state responses (ASSRs) that can be traced back to each auditory stimulus from the neural mix elicited by multiple simultaneous sounds. Using a mixture of 2 frequency-tagged melody streams, we instructed participants to selectively attend to one stream or the other while following the development of the pitch contour. Bottom-up attention towards either stream was also manipulated with salient changes in pitch. Distributed source analyses of magnetoencephalography measurements showed that the effect of ASSR enhancement from top-down driven attention was strongest at the left frontal cortex, while that of bottom-up driven attention was dominant at the right temporal cortex. Furthermore, the degree of ASSR suppression from simultaneous stimuli varied across cortical lobes and hemispheres. The ASSR source distribution changes from temporal dominance during single-stream perception to proportionally more activity in the frontal and centro-parietal cortical regions when listening to simultaneous streams. These findings are a step toward studying cognition in more complex and naturalistic soundscapes using frequency-tagging.
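Frequency-tagging rests on a simple spectral readout: each melody stream is modulated at its own tag frequency, so the ASSR it drives can be recovered from the spectrum of the recorded epoch at that frequency. A minimal sketch; the tag frequencies in the comments are hypothetical, not the study's:

```python
import numpy as np

def assr_amplitude(epoch, fs, tag_freq):
    """Amplitude at a tagging frequency: windowed FFT of the epoch, read out
    at the bin closest to the tag."""
    spec = np.fft.rfft(epoch * np.hanning(len(epoch)))
    freqs = np.fft.rfftfreq(len(epoch), d=1 / fs)
    k = np.argmin(np.abs(freqs - tag_freq))
    return np.abs(spec[k]) / len(epoch)

# e.g., comparing the attended and ignored streams' ASSRs in one epoch:
# a_attended = assr_amplitude(epoch, fs, 39.0)   # hypothetical tag
# a_ignored  = assr_amplitude(epoch, fs, 43.0)   # hypothetical tag
```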
Affiliation(s)
- Balazs Gulyas
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
- Cognitive Neuroimaging Centre (CoNiC), Lee Kong Chien School of Medicine, Nanyang Technological University, Singapore 636921, Singapore
- Fredrik Ullén
- Department of Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
- Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt 60322, Germany
- Daniel Lundqvist
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm 17177, Sweden
70. Chen YP, Schmidt F, Keitel A, Rösch S, Hauswald A, Weisz N. Speech intelligibility changes the temporal evolution of neural speech tracking. Neuroimage 2023; 268:119894. [PMID: 36693596] [DOI: 10.1016/j.neuroimage.2023.119894]
Abstract
Listening to speech with poor signal quality is challenging. Neural tracking of degraded speech has been used to advance the understanding of how brain processes and speech intelligibility are interrelated. However, the temporal dynamics of neural speech tracking and their relation to speech intelligibility are not clear. In the present MEG study, we used temporal response functions (TRFs), which describe the time course of speech tracking, on a gradient from intelligible to unintelligible degraded speech. In addition, we used inter-related facets of neural speech tracking (e.g., speech envelope reconstruction, speech-brain coherence, and components of broadband coherence spectra) to corroborate our TRF findings. Our TRF analysis yielded marked temporally differential effects of vocoding: ∼50-110 ms (M50TRF), ∼175-230 ms (M200TRF), and ∼315-380 ms (M350TRF). Reduced intelligibility went along with large increases in the early peak response M50TRF, but strongly reduced responses in M200TRF. In the late response M350TRF, the maximum response occurred for degraded speech that was still comprehensible, then declined with reduced intelligibility. Furthermore, we related the TRF components to our other neural tracking measures and found that M50TRF and M200TRF play differential roles in the shifting center frequency of the broadband coherence spectra. Overall, our study highlights the importance of time-resolved computation of neural speech tracking and decomposition of coherence spectra, and provides a better understanding of degraded speech processing.
Collapse
Affiliation(s)
- Ya-Ping Chen
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria.
| | - Fabian Schmidt
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
| | - Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, DD1 4HN Dundee, UK
| | - Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, 5020 Salzburg, Austria
| | - Anne Hauswald
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria
| | - Nathan Weisz
- Centre for Cognitive Neuroscience, University of Salzburg, 5020 Salzburg, Austria; Department of Psychology, University of Salzburg, 5020 Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, 5020 Salzburg, Austria
| |
Collapse
|
71
|
Xu N, Zhao B, Luo L, Zhang K, Shao X, Luan G, Wang Q, Hu W, Wang Q. Two stages of speech envelope tracking in human auditory cortex modulated by speech intelligibility. Cereb Cortex 2023; 33:2215-2228. [PMID: 35695785 DOI: 10.1093/cercor/bhac203] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2022] [Revised: 05/01/2022] [Accepted: 05/02/2022] [Indexed: 11/13/2022] Open
Abstract
The envelope is essential for speech perception. Recent studies have shown that cortical activity can track the acoustic envelope. However, whether the tracking strength reflects the extent of speech intelligibility processing remains controversial. Here, using stereo-electroencephalography, we directly recorded activity in the human auditory cortex while subjects listened to either natural or noise-vocoded speech. These two stimuli have approximately identical envelopes, but the noise-vocoded speech is not intelligible. Based on the tracking lags, we revealed two stages of envelope tracking: an early high-γ (60-140 Hz) power stage that preferred the noise-vocoded speech and a late θ (4-8 Hz) phase stage that preferred the natural speech. Furthermore, the decoding performance of high-γ power was better in primary than in nonprimary auditory cortex, consistent with its short tracking delay, while θ phase showed better decoding performance in the right auditory cortex. In addition, high-γ responses with sustained temporal profiles in nonprimary auditory cortex were dominant in both envelope tracking and decoding. In sum, we suggest a functional dissociation between high-γ power and θ phase: the former reflects fast, automatic processing of brief acoustic features, while the latter correlates with slow build-up processing facilitated by speech intelligibility.
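Both measures contrasted above can be derived from the analytic signal of a band-passed recording: high-γ tracking uses the envelope (power), θ tracking uses the phase. A generic sketch with the bands taken from the abstract; the sampling rate and data are placeholders.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_analytic(x, fs, lo, hi, order=4):
    """Band-pass filter, then Hilbert transform to the analytic signal."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return hilbert(filtfilt(b, a, x))

fs = 1000
x = np.random.randn(10 * fs)             # stand-in for one sEEG contact
high_gamma_power = np.abs(band_analytic(x, fs, 60, 140)) ** 2
theta_phase = np.angle(band_analytic(x, fs, 4, 8))
# Tracking lags can then be estimated by cross-correlating each measure
# (power, or cos/sin of phase) against the speech envelope.
```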
Collapse
Affiliation(s)
- Na Xu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China; National Clinical Research Center for Neurological Diseases, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Baotian Zhao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Lu Luo
- School of Psychology, Beijing Sport University, No. 48 Xinxi Road, Haidian District, Beijing 100084, China
| | - Kai Zhang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Xiaoqiu Shao
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Guoming Luan
- Beijing Key Laboratory of Epilepsy, Epilepsy Center, Sanbo Brain Hospital, Capital Medical University, No. 50 Yikesong Xiangshan Road, Haidian District, Beijing 100093, China; Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, No. 10 Xitoutiao, You An Men, Beijing 100069, China
| | - Qian Wang
- Beijing Key Laboratory of Epilepsy, Epilepsy Center, Sanbo Brain Hospital, Capital Medical University, No. 50 Yikesong Xiangshan Road, Haidian District, Beijing 100093, China; School of Psychological and Cognitive Sciences, Beijing Key Laboratory of Behavior and Mental Health, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing 100871, China; IDG/McGovern Institute for Brain Research, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing 100871, China
| | - Wenhan Hu
- Beijing Neurosurgical Institute, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China
| | - Qun Wang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China; National Clinical Research Center for Neurological Diseases, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100070, China; Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, No. 10 Xitoutiao, You An Men, Beijing 100069, China
| |
Collapse
|
72
|
Makov S, Pinto D, Har-Shai Yahav P, Miller LM, Zion Golumbic E. "Unattended, distracting or irrelevant": Theoretical implications of terminological choices in auditory selective attention research. Cognition 2023; 231:105313. [PMID: 36344304 DOI: 10.1016/j.cognition.2022.105313] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 09/30/2022] [Accepted: 10/19/2022] [Indexed: 11/06/2022]
Abstract
For seventy years, auditory selective attention research has focused on studying the cognitive mechanisms of prioritizing the processing of a 'main' task-relevant stimulus in the presence of 'other' stimuli. However, a closer look at this body of literature reveals deep empirical inconsistencies and theoretical confusion regarding the extent to which this 'other' stimulus is processed. We argue that many key debates regarding attention arise, at least in part, from inappropriate terminological choices for experimental variables that may not accurately map onto the cognitive constructs they are meant to describe. Here we critically review the most common or disruptive terminological ambiguities, differentiate between methodology-based and theory-derived terms, and unpack the theoretical assumptions underlying different terminological choices. In particular, we offer an in-depth analysis of the terms 'unattended' and 'distractor' and demonstrate how their use can lead to conflicting theoretical inferences. We also offer a framework for thinking about terminology in a more productive and precise way, in the hope of fostering more constructive debates and promoting more nuanced and accurate cognitive models of selective attention.
Collapse
Affiliation(s)
- Shiri Makov
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
| | - Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
| | - Paz Har-Shai Yahav
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel
| | - Lee M Miller
- The Center for Mind and Brain, University of California, Davis, CA, United States of America; Department of Neurobiology, Physiology, & Behavior, University of California, Davis, CA, United States of America; Department of Otolaryngology / Head and Neck Surgery, University of California, Davis, CA, United States of America
| | - Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Israel.
| |
Collapse
|
73
|
Carta S, Mangiacotti AMA, Valdes AL, Reilly RB, Franco F, Di Liberto GM. The impact of temporal synchronisation imprecision on TRF analyses. J Neurosci Methods 2023; 385:109765. [PMID: 36481165 DOI: 10.1016/j.jneumeth.2022.109765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2022] [Revised: 11/17/2022] [Accepted: 12/02/2022] [Indexed: 12/12/2022]
Affiliation(s)
- Sara Carta
- ADAPT Centre, Trinity College, The University of Dublin, Ireland; School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland
| | - Anthony M A Mangiacotti
- Department of Psychology, Middlesex University, London, United Kingdom; FISPPA Department, University of Padova, Padova, Italy
| | - Alejandro Lopez Valdes
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland; Global Brain Health Institute, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Engineering, Trinity College, The University of Dublin, Ireland
| | - Richard B Reilly
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Engineering, Trinity College, The University of Dublin, Ireland; School of Medicine, Trinity College, The University of Dublin, Ireland
| | - Fabia Franco
- Department of Psychology, Middlesex University, London, United Kingdom
| | - Giovanni M Di Liberto
- ADAPT Centre, Trinity College, The University of Dublin, Ireland; School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland.
| |
Collapse
|
74
|
Dolhopiatenko H, Nogueira W. Selective attention decoding in bimodal cochlear implant users. Front Neurosci 2023; 16:1057605. [PMID: 36711138 PMCID: PMC9874229 DOI: 10.3389/fnins.2022.1057605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Accepted: 12/20/2022] [Indexed: 01/12/2023] Open
Abstract
The growing group of cochlear implant (CI) users includes subjects with preserved acoustic hearing on the side opposite to the CI. Using both listening sides results in improved speech perception compared with listening with one side alone; however, large variability in the measured benefit is observed. It is possible that this variability is associated with the integration of speech across the electric and acoustic stimulation modalities. However, there is a lack of established methods to assess speech integration between electric and acoustic stimulation and, consequently, to adequately program the devices. Moreover, existing methods do not provide information about the underlying physiological mechanisms of this integration or are based on simple stimuli that are difficult to relate to speech integration. Electroencephalography (EEG) to continuous speech is promising as an objective measure of speech perception; however, its application in CIs is challenging because it is influenced by the electrical artifact introduced by these devices. For this reason, the main goal of this work is to investigate a possible electrophysiological measure of speech integration between electric and acoustic stimulation in bimodal CI users. For this purpose, a selective attention decoding paradigm was designed and validated in bimodal CI users. The study included behavioral and electrophysiological measures. The behavioral measure consisted of a speech understanding test in which subjects repeated words from a target speaker in the presence of a competing voice, listening with the CI side (CIS) only, with the acoustic side (AS) only, or with both listening sides (CIS+AS). Electrophysiological measures included cortical auditory evoked potentials (CAEPs) and selective attention decoding through EEG. CAEPs were recorded to broadband stimuli to confirm the feasibility of recording cortical responses with the CIS only, AS only, and CIS+AS listening modes. In the selective attention decoding paradigm, a co-located target and a competing speech stream were presented to the subjects using the three listening modes. The main hypothesis was that selective attention can be decoded in CI users despite the presence of the CI electrical artifact; if selective attention decoding improves when combining electric and acoustic stimulation relative to electric stimulation alone, the hypothesis can be confirmed. No significant difference in behavioral speech understanding was found between the CIS+AS and AS only modes, mainly due to the ceiling effect observed with these two listening modes. The main finding is that selective attention can be decoded in CI users even when a continuous artifact is present. Moreover, an amplitude reduction of the forward temporal response function (TRF) was observed when listening with CIS+AS compared to AS only. Further studies are required to validate selective attention decoding as an electrophysiological measure of electric-acoustic speech integration.
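To make the decoding step concrete: stimulus-reconstruction decoders map EEG back to an estimate of the speech envelope, and attention is assigned to whichever candidate stream correlates better with that reconstruction. A deliberately simplified sketch with a spatial-only decoder; real backward models also span temporal lags, and the decoder must first be trained (e.g., with regularized regression on attended-speech data).

```python
import numpy as np

def decode_attention(eeg, env_a, env_b, decoder):
    """Pick the stream whose envelope best matches the EEG reconstruction.
    eeg: (time, channels); decoder: (channels,) pre-trained spatial filter."""
    recon = eeg @ decoder
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return ("A" if r_a > r_b else "B"), (r_a, r_b)
```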
Collapse
|
75
|
Kulasingham JP, Simon JZ. Algorithms for Estimating Time-Locked Neural Response Components in Cortical Processing of Continuous Speech. IEEE Trans Biomed Eng 2023; 70:88-96. [PMID: 35727788 PMCID: PMC9946293 DOI: 10.1109/tbme.2022.3185005] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
OBJECTIVE The Temporal Response Function (TRF) is a linear model of neural activity time-locked to continuous stimuli, including continuous speech. TRFs based on speech envelopes typically have distinct components that have provided remarkable insights into the cortical processing of speech. However, current methods may lead to less than reliable estimates of single-subject TRF components. Here, we compare two established methods of TRF component estimation and also propose novel algorithms that utilize prior knowledge of these components, bypassing full TRF estimation. METHODS We compared two established algorithms, ridge and boosting, and two novel algorithms based on Subspace Pursuit (SP) and Expectation Maximization (EM), which directly estimate TRF components given plausible assumptions about component characteristics. Single-channel, multi-channel, and source-localized TRFs were fit on simulations and real magnetoencephalographic data. Performance metrics included model fit and component estimation accuracy. RESULTS Boosting and ridge have comparable performance in component estimation. The novel algorithms outperformed the others in simulations, but not on real data, possibly because the assumed component characteristics were not actually met. Ridge had slightly better model fits on real data than boosting, but also more spurious TRF activity. CONCLUSION Results indicate that both smooth (ridge) and sparse (boosting) algorithms perform comparably at TRF component estimation. The SP and EM algorithms may be accurate but rely on assumptions about component characteristics. SIGNIFICANCE This systematic comparison establishes the suitability of widely used and novel algorithms for estimating robust TRF components, which is essential for improved subject-specific investigations into the cortical processing of speech.
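Of the two established estimators, ridge has a closed form (see the sketch earlier in this list), while boosting builds a sparse TRF by greedy coordinate updates. A minimal sketch of the boosting idea; the published algorithms use cross-validated early stopping rather than the fixed iteration count assumed here.

```python
import numpy as np

def boost_trf(X, y, n_iter=200, delta=0.01):
    """Sparse TRF by greedy coordinate (L2-)boosting.
    X: lagged stimulus matrix (time, lags); y: neural response."""
    w = np.zeros(X.shape[1])
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        corr = X.T @ resid              # error gradient per coordinate
        j = np.argmax(np.abs(corr))     # coordinate that most reduces error
        step = delta * np.sign(corr[j])
        w[j] += step
        resid -= step * X[:, j]
    return w
```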
Collapse
|
76
|
Luo C, Gao Y, Fan J, Liu Y, Yu Y, Zhang X. Compromised word-level neural tracking in the high-gamma band for children with attention deficit hyperactivity disorder. Front Hum Neurosci 2023; 17:1174720. [PMID: 37213926 PMCID: PMC10196181 DOI: 10.3389/fnhum.2023.1174720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Accepted: 04/18/2023] [Indexed: 05/23/2023] Open
Abstract
Children with attention deficit hyperactivity disorder (ADHD) exhibit pervasive difficulties in speech perception. Given that speech processing involves both acoustic and linguistic stages, it remains unclear which stage is impaired in children with ADHD. To investigate this issue, we measured neural tracking of speech at the syllable and word levels using electroencephalography (EEG) and evaluated the relationship between neural responses and ADHD symptoms in 6- to 8-year-old children. Twenty-three children participated, and their ADHD symptoms were assessed with SNAP-IV questionnaires. In the experiment, the children listened to hierarchical speech sequences in which syllables and words were repeated at 2.5 and 1.25 Hz, respectively. Using frequency-domain analyses, reliable neural tracking of syllables and words was observed in both the low-frequency band (<4 Hz) and the high-gamma band (70-160 Hz). However, neural tracking of words in the high-gamma band was anti-correlated with the children's ADHD symptom scores. These results indicate that ADHD prominently impairs the cortical encoding of linguistic information (e.g., words) during speech perception.
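The frequency-domain analysis above looks for spectral peaks exactly at the stimulation rates (2.5 Hz for syllables, 1.25 Hz for words). One common way to quantify such a peak, sketched here assuming an amplitude spectrum has already been computed from the EEG, is to compare the tagged bin with its neighbours; the bin-selection parameters are illustrative.

```python
import numpy as np

def peak_snr(spectrum, freqs, tag, n_neighbors=4, skip=1):
    """Amplitude at the tag frequency relative to nearby frequency bins."""
    idx = np.argmin(np.abs(freqs - tag))
    neigh = np.r_[idx - skip - np.arange(1, n_neighbors + 1),
                  idx + skip + np.arange(1, n_neighbors + 1)]
    return spectrum[idx] / spectrum[neigh].mean()

# e.g., peak_snr(spec, freqs, 1.25) for word-rate tracking,
#       peak_snr(spec, freqs, 2.50) for syllable-rate tracking.
```

Note that the analyzed epoch must be long enough that the bin spacing (1/duration) resolves a 1.25 Hz tag.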
Collapse
Affiliation(s)
- Cheng Luo
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, China
| | - Yayue Gao
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
| | - Jianing Fan
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
| | - Yang Liu
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
| | - Yonglin Yu
- Department of Rehabilitation, The Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
| | - Xin Zhang
- Department of Neurology, The Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
| |
Collapse
|
77
|
Simon JZ, Commuri V, Kulasingham JP. Time-locked auditory cortical responses in the high-gamma band: A window into primary auditory cortex. Front Neurosci 2022; 16:1075369. [PMID: 36570848 PMCID: PMC9773383 DOI: 10.3389/fnins.2022.1075369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022] Open
Abstract
Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70-150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
Collapse
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States; Department of Biology, University of Maryland, College Park, College Park, MD, United States; Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
| | - Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
| | | |
Collapse
|
78
|
Understanding why infant-directed speech supports learning: A dynamic attention perspective. DEVELOPMENTAL REVIEW 2022. [DOI: 10.1016/j.dr.2022.101047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
79
|
Liu Y, Luo C, Zheng J, Liang J, Ding N. Working memory asymmetrically modulates auditory and linguistic processing of speech. Neuroimage 2022; 264:119698. [PMID: 36270622 DOI: 10.1016/j.neuroimage.2022.119698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Revised: 10/11/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022] Open
Abstract
Working memory load can modulate speech perception. However, since speech perception and working memory are both complex functions, it remains elusive how each component of the working memory system interacts with each speech processing stage. To investigate this issue, we concurrently measure how working memory load modulates neural activity tracking three levels of linguistic units, i.e., syllables, phrases, and sentences, using a multiscale frequency-tagging approach. Participants engage in a sentence comprehension task while working memory load is manipulated by asking them to memorize either auditory verbal sequences or visual patterns. It is found that verbal and visual working memory load modulate speech processing in similar manners: higher working memory load attenuates neural tracking of phrases and sentences but enhances neural tracking of syllables. Since verbal and visual working memory load similarly influence the neural responses to speech, such influences may derive from the domain-general component of the working memory system. More importantly, working memory load asymmetrically modulates lower-level auditory encoding and higher-level linguistic processing of speech, possibly reflecting a reallocation of attention induced by mnemonic load.
Collapse
Affiliation(s)
- Yiguang Liu
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
| | - Cheng Luo
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
| | - Jing Zheng
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
| | - Junying Liang
- Department of Linguistics, School of International Studies, Zhejiang University, Hangzhou 310058, China
| | - Nai Ding
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China; Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou 310012, China.
| |
Collapse
|
80
|
Neurodevelopmental oscillatory basis of speech processing in noise. Dev Cogn Neurosci 2022; 59:101181. [PMID: 36549148 PMCID: PMC9792357 DOI: 10.1016/j.dcn.2022.101181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 10/31/2022] [Accepted: 11/25/2022] [Indexed: 11/27/2022] Open
Abstract
Humans' extraordinary ability to understand speech in noise relies on multiple processes that develop with age. Using magnetoencephalography (MEG), we characterize the underlying neuromaturational basis by quantifying how cortical oscillations in 144 participants (aged 5-27 years) track phrasal and syllabic structures in connected speech mixed with different types of noise. While the extraction of prosodic cues from clear speech was stable during development, its maintenance in a multi-talker background matured rapidly up to age 9 and was associated with speech comprehension. Furthermore, while the extraction of subtler information provided by syllables matured at age 9, its maintenance in noisy backgrounds progressively matured until adulthood. Altogether, these results highlight distinct behaviorally relevant maturational trajectories for the neuronal signatures of speech perception. In accordance with grain-size proposals, neuromaturational milestones are reached increasingly late for linguistic units of decreasing size, with further delays incurred by noise.
Collapse
|
81
|
Xiu B, Paul BT, Chen JM, Le TN, Lin VY, Dimitrijevic A. Neural responses to naturalistic audiovisual speech are related to listening demand in cochlear implant users. Front Hum Neurosci 2022; 16:1043499. [DOI: 10.3389/fnhum.2022.1043499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022] Open
Abstract
There is a weak relationship between clinical and self-reported speech perception outcomes in cochlear implant (CI) listeners. Such poor correspondence may be due to differences between clinical and "real-world" listening environments and stimuli. Speech in the real world is often accompanied by visual cues and background environmental noise, and is generally in a conversational context, all factors that could affect listening demand. Thus, our objectives were to determine whether brain responses to naturalistic speech could index speech perception and listening demand in CI users. Accordingly, we recorded high-density electroencephalogram (EEG) while CI users listened to and watched a naturalistic stimulus (i.e., the television show "The Office"). We used continuous EEG to quantify "speech neural tracking" (i.e., TRFs, temporal response functions) to the show's soundtrack and 8-12 Hz (alpha) brain rhythms commonly related to listening effort. Background noise at three different signal-to-noise ratios (SNRs), +5, +10, and +15 dB, was presented to vary the difficulty of following the television show, mimicking a natural noisy environment. The task also included an audio-only (no video) condition. After each condition, participants subjectively rated listening demand and the degree of words and conversations they felt they understood. Fifteen CI users reported progressively higher listening demand and fewer understood words and conversations with increasing background noise. Listening demand and conversation understanding in the audio-only condition were comparable to those of the highest noise condition (+5 dB). Increasing background noise affected speech neural tracking at a group level, in addition to eliciting strong individual differences. Mixed-effects modeling showed that listening demand and conversation understanding were correlated with early cortical speech tracking, such that high demand and low conversation understanding occurred with lower amplitude TRFs. In the high noise condition, greater listening demand was negatively correlated with parietal alpha power, where higher demand was related to lower alpha power. No significant correlations were observed between TRF/alpha and clinical speech perception scores. These results are similar to previous findings showing little relationship between clinical speech perception and quality-of-life in CI users. However, physiological responses to complex natural speech may provide an objective measure of aspects of quality-of-life measures like self-perceived listening demand.
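As a pointer on the 8-12 Hz measure above: parietal alpha power is typically computed from the EEG power spectral density and averaged over the band. A minimal sketch assuming a (channels x time) array; channel selection, epoching, and baseline handling are omitted.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(eeg, fs):
    """Mean 8-12 Hz power per channel via Welch's method.
    eeg: (channels, time)."""
    f, pxx = welch(eeg, fs=fs, nperseg=2 * fs)
    band = (f >= 8) & (f <= 12)
    return pxx[:, band].mean(axis=1)
```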
Collapse
|
82
|
Pinto D, Kaufman M, Brown A, Zion Golumbic E. An ecological investigation of the capacity to follow simultaneous speech and preferential detection of one's own name. Cereb Cortex 2022; 33:5361-5374. [PMID: 36331339 DOI: 10.1093/cercor/bhac424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 09/11/2022] [Accepted: 09/12/2022] [Indexed: 11/06/2022] Open
Abstract
Many situations require focusing attention on one speaker while monitoring the environment for potentially important information. Some have proposed that dividing attention among two speakers involves behavioral trade-offs due to limited cognitive resources. However, the severity of these trade-offs, particularly under ecologically valid circumstances, is not well understood. We investigated the capacity to process simultaneous speech using a dual-task paradigm simulating task demands and stimuli encountered in real life. Participants listened to conversational narratives (Narrative Stream) and monitored a stream of announcements (Barista Stream) to detect when their order was called. We measured participants' performance, neural activity, and skin conductance as they engaged in this dual task. Participants achieved extremely high dual-task accuracy, with no apparent behavioral trade-offs. Moreover, robust neural and physiological responses were observed for target stimuli in the Barista Stream, alongside significant neural speech-tracking of the Narrative Stream. These results suggest that humans have substantial capacity to process simultaneous speech and do not suffer from insufficient processing resources, at least for this highly ecological task combination and level of perceptual load. The results also confirmed the ecological validity of the advantage for detecting one's own name at the behavioral, neural, and physiological levels, highlighting the contribution of personal relevance when processing simultaneous speech.
Collapse
Affiliation(s)
- Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Maya Kaufman
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Adi Brown
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| |
Collapse
|
83
|
Pei C, Qiu Y, Li F, Huang X, Si Y, Li Y, Zhang X, Chen C, Liu Q, Cao Z, Ding N, Gao S, Alho K, Yao D, Xu P. The different brain areas occupied for integrating information of hierarchical linguistic units: a study based on EEG and TMS. Cereb Cortex 2022; 33:4740-4751. [PMID: 36178127 DOI: 10.1093/cercor/bhac376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/13/2022] Open
Abstract
Human language units are hierarchical, and reading acquisition involves integrating multisensory information (typically from auditory and visual modalities) to access meaning. However, it is unclear how the brain processes and integrates language information at different linguistic units (words, phrases, and sentences) provided simultaneously in auditory and visual modalities. To address the issue, we presented participants with sequences of short Chinese sentences through auditory, visual, or combined audio-visual modalities while electroencephalographic responses were recorded. With a frequency tagging approach, we analyzed the neural representations of basic linguistic units (i.e. characters/monosyllabic words) and higher-level linguistic structures (i.e. phrases and sentences) across the 3 modalities separately. We found that audio-visual integration occurs in all linguistic units, and the brain areas involved in the integration varied across different linguistic levels. In particular, the integration of sentences activated the local left prefrontal area. Therefore, we used continuous theta-burst stimulation to verify that the left prefrontal cortex plays a vital role in the audio-visual integration of sentence information. Our findings suggest the advantage of bimodal language comprehension at hierarchical stages in language-related information processing and provide evidence for the causal role of the left prefrontal regions in processing information of audio-visual sentences.
Collapse
Affiliation(s)
- Changfu Pei
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Yuan Qiu
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Fali Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China; Research Unit of Neuroscience, Chinese Academy of Medical Science, 2019RU035, Chengdu, China
| | - Xunan Huang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
| | - Yajing Si
- School of Psychology, Xinxiang Medical University, Xinxiang, 453003, China
| | - Yuqin Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Xiabing Zhang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Chunli Chen
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China
| | - Qiang Liu
- Institute of Brain and Psychological Sciences, Sichuan Normal University, Chengdu, Sichuan, 610066, China
| | - Zehong Cao
- STEM, Mawson Lakes Campus, University of South Australia, Adelaide, SA 5095, Australia
| | - Nai Ding
- College of Biomedical Engineering and Instrument Sciences, Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, 310007, China
| | - Shan Gao
- School of Foreign Languages, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
| | - Kimmo Alho
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, FI 00014, Finland
| | - Dezhong Yao
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China; Research Unit of Neuroscience, Chinese Academy of Medical Science, 2019RU035, Chengdu, China
| | - Peng Xu
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, 611731, China; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu, 611731, China; Research Unit of Neuroscience, Chinese Academy of Medical Science, 2019RU035, Chengdu, China; Radiation Oncology Key Laboratory of Sichuan Province, Chengdu, 610041, China
| |
Collapse
|
84
|
Brown JA, Bidelman GM. Familiarity of Background Music Modulates the Cortical Tracking of Target Speech at the "Cocktail Party". Brain Sci 2022; 12:brainsci12101320. [PMID: 36291252 PMCID: PMC9599198 DOI: 10.3390/brainsci12101320] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 11/23/2022] Open
Abstract
The "cocktail party" problem-how a listener perceives speech in noisy environments-is typically studied using speech (multi-talker babble) or noise maskers. However, realistic cocktail party scenarios often include background music (e.g., coffee shops, concerts). Studies investigating music's effects on concurrent speech perception have predominantly used highly controlled synthetic music or shaped noise, which do not reflect naturalistic listening environments. Behaviorally, familiar background music and songs with vocals/lyrics inhibit concurrent speech recognition. Here, we investigated the neural bases of these effects. While recording multichannel EEG, participants listened to an audiobook while popular songs (or silence) played in the background at a 0 dB signal-to-noise ratio. Songs were either familiar or unfamiliar to listeners and featured either vocals or isolated instrumentals from the original audio recordings. Comprehension questions probed task engagement. We used temporal response functions (TRFs) to isolate cortical tracking to the target speech envelope and analyzed neural responses around 100 ms (i.e., auditory N1 wave). We found that speech comprehension was, expectedly, impaired during background music compared to silence. Target speech tracking was further hindered by the presence of vocals. When masked by familiar music, response latencies to speech were less susceptible to informational masking, suggesting concurrent neural tracking of speech was easier during music known to the listener. These differential effects of music familiarity were further exacerbated in listeners with less musical ability. Our neuroimaging results and their dependence on listening skills are consistent with early attentional-gain mechanisms where familiar music is easier to tune out (listeners already know the song's expectancies) and thus can allocate fewer attentional resources to the background music to better monitor concurrent speech material.
Collapse
Affiliation(s)
- Jane A. Brown
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN 38152, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
| | - Gavin M. Bidelman
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN 38152, USA
- Institute for Intelligent Systems, University of Memphis, Memphis, TN 38152, USA
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47408, USA
- Program in Neuroscience, Indiana University, Bloomington, IN 47405, USA
- Correspondence:
| |
Collapse
|
85
|
Verschueren E, Gillis M, Decruy L, Vanthornhout J, Francart T. Speech Understanding Oppositely Affects Acoustic and Linguistic Neural Tracking in a Speech Rate Manipulation Paradigm. J Neurosci 2022; 42:7442-7453. [PMID: 36041851 PMCID: PMC9525161 DOI: 10.1523/jneurosci.0259-22.2022] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 06/29/2022] [Accepted: 07/17/2022] [Indexed: 11/21/2022] Open
Abstract
When listening to continuous speech, the human brain can track features of the presented speech signal. It has been shown that neural tracking of acoustic features is a prerequisite for speech understanding and can predict speech understanding in controlled circumstances. However, the brain also tracks linguistic features of speech, which may be more directly related to speech understanding. We investigated acoustic and linguistic speech processing as a function of varying speech understanding by manipulating the speech rate. In this paradigm, acoustic and linguistic speech processing are affected simultaneously but in opposite directions: when the speech rate increases, more acoustic information per second is present. In contrast, the tracking of linguistic information becomes more challenging when speech is less intelligible at higher speech rates. We measured the EEG of 18 participants (4 male) who listened to speech at various speech rates. As expected and confirmed by the behavioral results, speech understanding decreased with increasing speech rate. Accordingly, linguistic neural tracking decreased with increasing speech rate, but acoustic neural tracking increased. This indicates that neural tracking of linguistic representations can capture the gradual effect of decreasing speech understanding. In addition, increased acoustic neural tracking does not necessarily imply better speech understanding. This suggests that, although more challenging to measure because of the low signal-to-noise ratio, linguistic neural tracking may be a more direct predictor of speech understanding. SIGNIFICANCE STATEMENT: An increasingly popular method to investigate neural speech processing is to measure neural tracking. Although much research has been done on how the brain tracks acoustic speech features, linguistic speech features have received less attention. In this study, we disentangled acoustic and linguistic characteristics of neural speech tracking by manipulating the speech rate. A proper way of objectively measuring auditory and language processing paves the way toward clinical applications: an objective measure of speech understanding would allow for behavior-free evaluation of speech understanding, making it possible to evaluate hearing loss and adjust hearing aids based on brain responses. Such an objective measure would benefit populations from whom obtaining behavioral measures may be difficult, such as young children or people with cognitive impairments.
Collapse
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Marlies Gillis
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
| | - Jonas Vanthornhout
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Tom Francart
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| |
Collapse
|
86
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
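Because envelope tracking recurs throughout this reference list, one note on the front end: the broadband envelope is commonly taken as the magnitude of the analytic signal, low-pass filtered and resampled to the neural sampling rate. A single-band sketch with illustrative parameter values; several of the cited studies instead average gammatone subband envelopes.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def speech_envelope(audio, fs_audio, fs_neural=128, lp_hz=30):
    """Broadband envelope: |analytic signal|, low-passed, then downsampled."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, lp_hz / (fs_audio / 2))   # low-pass by default
    env = filtfilt(b, a, env)
    return resample_poly(env, fs_neural, fs_audio)
```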
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|
87
|
Zhou LF, Zhao D, Cui X, Guo B, Zhu F, Feng C, Wang J, Meng M. Separate neural subsystems support goal-directed speech listening. Neuroimage 2022; 263:119613. [PMID: 36075539 DOI: 10.1016/j.neuroimage.2022.119613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2022] [Revised: 08/30/2022] [Accepted: 09/04/2022] [Indexed: 10/31/2022] Open
Abstract
How do humans excel at tracking the narrative of a particular speaker with a distracting noisy background? This feat places great demands on the collaboration between speech processing and goal-related regulatory functions. Here, we propose that separate subsystems with different cross-task dynamic activity properties and distinct functional purposes support goal-directed speech listening. We adopted a naturalistic dichotic speech listening paradigm in which listeners were instructed to attend to only one narrative from two competing inputs. Using functional magnetic resonance imaging with inter- and intra-subject correlation techniques, we discovered a dissociation in response consistency in temporal, parietal and frontal brain areas as the task demand varied. Specifically, some areas in the bilateral temporal cortex (SomMotB_Aud and TempPar) and lateral prefrontal cortex (DefaultB_PFCl and ContA_PFCl) always showed consistent activation across subjects and across scan runs, regardless of the task demand. In contrast, some areas in the parietal cortex (DefaultA_pCunPCC and ContC_pCun) responded reliably only when the task goal remained the same. These results suggested two dissociated functional neural networks that were independently validated by performing a data-driven clustering analysis of voxelwise functional connectivity patterns. A subsequent meta-analysis revealed distinct functional profiles for these two brain correlation maps. The different-task correlation map was strongly associated with language-related processes (e.g., listening, speech and sentences), whereas the same-task versus different-task correlation map was linked to self-referencing functions (e.g., default mode, theory of mind and autobiographical topics). Altogether, the three-pronged findings revealed two anatomically and functionally dissociated subsystems supporting goal-directed speech listening.
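The inter-subject correlation technique mentioned above indexes response consistency by correlating each subject's voxel time course with the average of the remaining subjects'. A leave-one-out sketch assuming preprocessed, anatomically aligned BOLD data; the array shapes are illustrative.

```python
import numpy as np

def isc_leave_one_out(data):
    """Mean leave-one-out inter-subject correlation per voxel.
    data: (subjects, voxels, time) responses to a shared stimulus."""
    n_subj, n_vox, _ = data.shape
    iscs = np.zeros((n_subj, n_vox))
    for s in range(n_subj):
        rest = data[np.arange(n_subj) != s].mean(axis=0)  # average of others
        for v in range(n_vox):
            iscs[s, v] = np.corrcoef(data[s, v], rest[v])[0, 1]
    return iscs.mean(axis=0)
```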
Collapse
Affiliation(s)
- Liu-Fang Zhou
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China; Key Lab of BI-AI Collaborated Information Behavior, School of Business and Management, Shanghai International Studies University, Shanghai, 201620, China
| | - Dan Zhao
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China; Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Xuan Cui
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Bingbing Guo
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China; School of Teacher Education, Nanjing Xiaozhuang University, Nanjing, 211171, China
| | - Fangwei Zhu
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China; Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Chunliang Feng
- Guangdong Key Laboratory of Mental Health and Cognitive Science, School of Psychology, South China Normal University, Guangzhou, 510631, China
| | - Jinhui Wang
- Key Laboratory of Brain, Cognition and Education Sciences (South China Normal University), Ministry of Education, China; Center for Studies of Psychological Application, Guangdong Key Laboratory of Mental Health and Cognitive Science, Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou, 510631, China.
| | - Ming Meng
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, China.
| |
Collapse
|
88
|
Xu Z, Bai Y, Zhao R, Zheng Q, Ni G, Ming D. Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction. Hear Res 2022; 422:108552. [PMID: 35714555 DOI: 10.1016/j.heares.2022.108552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 06/01/2022] [Accepted: 06/08/2022] [Indexed: 11/23/2022]
Abstract
In the cocktail party circumstance, the human auditory system extracts the information from a specific speaker of interest and ignores others. Many studies have focused on auditory attention decoding (AAD), but the stimulation materials have mainly been non-tonal languages. We used a tonal language (Mandarin) as the speech stimulus and constructed a Long Short-Term Memory (LSTM) architecture for speech envelope reconstruction based on electroencephalogram (EEG) data. The correlation coefficient between the reconstructed and candidate envelopes was calculated to determine the subject's auditory attention. The proposed LSTM architecture outperformed the linear models. The average decoding accuracy in within- and cross-subject cases varied from 63.02 to 74.29%, with the highest accuracy of 89.1% in a decision window of 0.15 s. In addition, the beta-band rhythm was found to play an essential role in distinguishing attention from non-attention states. These results provide a new AAD architecture to help develop neuro-steered hearing devices, especially for tonal languages.
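A sketch of the kind of LSTM envelope-reconstruction model described above, written in PyTorch; the layer sizes, channel count, and the absence of a training loop are simplifying assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class EnvelopeLSTM(nn.Module):
    """Map multichannel EEG to a reconstructed speech envelope."""
    def __init__(self, n_channels=64, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, eeg):                # eeg: (batch, time, channels)
        out, _ = self.lstm(eeg)
        return self.head(out).squeeze(-1)  # (batch, time) envelope estimate

model = EnvelopeLSTM()
eeg = torch.randn(8, 500, 64)              # 8 windows, 500 samples, 64 chans
recon = model(eeg)
# Attention is then decoded by correlating `recon` with each candidate
# envelope and choosing the stream with the higher Pearson r.
```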
Collapse
Affiliation(s)
- Zihao Xu
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
| | - Yanru Bai
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
| | - Ran Zhao
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China
| | - Qi Zheng
- Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
| | - Guangjian Ni
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China.
| | - Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China; Department of Biomedical Engineering, College of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China.
| |
Collapse
|
89
|
Encoding speech rate in challenging listening conditions: White noise and reverberation. Atten Percept Psychophys 2022; 84:2303-2318. [PMID: 35996057 PMCID: PMC9481500 DOI: 10.3758/s13414-022-02554-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/08/2022] [Indexed: 11/08/2022]
Abstract
Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often referred to as "rate-dependent speech perception," has been suggested to be the result of a robust, low-level perceptual process, typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening conditions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations, eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study contributes towards understanding the consequences of different types of listening environments on the functioning of low-level perceptual processes that listeners use during speech perception.
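For concreteness, the white-noise conditions above amount to mixing noise into the speech at a fixed signal-to-noise ratio, which requires scaling the noise to the desired power ratio; reverberation would instead convolve the signal with a room impulse response. A minimal sketch:

```python
import numpy as np

def add_white_noise(speech, snr_db):
    """Return speech plus white noise scaled to the requested SNR (dB)."""
    noise = np.random.randn(speech.size)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    noise *= np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + noise
```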
90. Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception. Curr Biol 2022; 32:3971-3986.e4. [PMID: 35973430] [DOI: 10.1016/j.cub.2022.07.047]
Abstract
How the human auditory cortex represents spatially separated simultaneous talkers, and how talkers' locations and voices modulate the neural representations of attended and unattended speech, are unclear. Here, we measured the neural responses from electrodes implanted in neurosurgical patients as they performed single-talker and multi-talker speech perception tasks. We found that spatial separation between talkers caused a preferential encoding of the contralateral speech in Heschl's gyrus (HG), planum temporale (PT), and superior temporal gyrus (STG). Location and spectrotemporal features were encoded in different aspects of the neural response. Specifically, the talker's location changed the mean response level, whereas the talker's spectrotemporal features altered the variation of the response around its baseline. These components were differentially modulated by the attended talker's voice or location, which improved the population decoding of attended speech features. Attentional modulation due to the talker's voice appeared only in auditory areas with longer latencies, whereas attentional modulation due to location was present throughout. Our results show that spatial multi-talker speech perception relies upon a separable pre-attentive neural representation, which can be further tuned by top-down attention to the location and voice of the talker.
92. Raghavendra S, Lee S, Chun H, Martin BA, Tan CT. Cortical entrainment to speech produced by cochlear implant talkers and normal-hearing talkers. Front Neurosci 2022; 16:927872. [PMID: 36017176] [PMCID: PMC9396306] [DOI: 10.3389/fnins.2022.927872]
Abstract
Cochlear implants (CIs) are commonly used to restore the ability to hear in those with severe or profound hearing loss, and they provide the auditory feedback users need to monitor and control their own speech production. However, speech produced by CI users may not reach a perceived sound quality similar to that of speech produced by normal-hearing talkers, and this difference is easily noticeable in daily conversation. In this study, we attempt to characterize this difference as perceived by normal-hearing listeners listening to continuous speech produced by CI talkers and normal-hearing talkers. We used a regenerative model to decode and reconstruct the speech envelope from the single-trial electroencephalogram (EEG) recorded on the scalp of the normal-hearing listeners, and computed the bootstrap Spearman correlation between the actual speech envelope and the envelope reconstructed from the EEG as a metric to quantify the difference in response to the speech produced by the two talker groups. The same listeners also rated the perceived sound quality of the speech produced by the two talker groups as a behavioral assessment. Both the perceived sound quality ratings and the computed metric, which can be read as the degree of cortical entrainment to the actual speech envelope across the normal-hearing listeners, were higher for speech produced by normal-hearing talkers than for CI talkers. The first purpose of the study was to determine how well the speech envelope is represented neurophysiologically via its similarity to the envelope reconstructed from EEG; the second was to show how well this representation differentiates the CI and normal-hearing talker groups in terms of perceived sound quality.
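The correlation metric described in this abstract can be approximated in a few lines. The following is a minimal sketch, not the authors' code: it assumes `env_actual` and `env_reconstructed` are equal-length 1-D NumPy arrays (the measured envelope and the one decoded from EEG) and uses a naive resampling of time points; the paper's exact bootstrap scheme may differ.

```python
import numpy as np
from scipy.stats import spearmanr

def bootstrap_spearman(env_actual, env_reconstructed, n_boot=1000, seed=0):
    """Bootstrap the Spearman correlation between the actual speech envelope
    and the envelope reconstructed from single-trial EEG."""
    rng = np.random.default_rng(seed)
    n = len(env_actual)
    rhos = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample time points with replacement
        rhos[b] = spearmanr(env_actual[idx], env_reconstructed[idx])[0]
    return rhos.mean(), np.percentile(rhos, [2.5, 97.5])
```

The mean of the bootstrap distribution serves as the entrainment metric, and the percentile interval indicates its stability across resamples.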
Affiliation(s)
- Shruthi Raghavendra: Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX, United States
- Sungmin Lee: Department of Speech-Language Pathology and Audiology, Tongmyong University, Busan, South Korea
- Hyungi Chun: Graduate Center, City University of New York, New York City, NY, United States
- Brett A. Martin: Graduate Center, City University of New York, New York City, NY, United States
- Chin-Tuan Tan: Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX, United States
93. Heilbron M, Armeni K, Schoffelen JM, Hagoort P, de Lange FP. A hierarchy of linguistic predictions during natural language comprehension. Proc Natl Acad Sci U S A 2022; 119:e2201968119. [PMID: 35921434] [DOI: 10.1073/pnas.2201968119]
Abstract
Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analyzing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable neural signatures of predictions about syntactic category (parts of speech), phonemes, and semantics. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.
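The GPT-2 step of such an analysis, quantifying how predictable each unit is in context, can be sketched as follows. This is an illustration, not the authors' pipeline: the model size ("gpt2") is an assumption, and the snippet scores sub-word tokens, which a full analysis would still aggregate into word-level predictions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_surprisal(text):
    """Surprisal -log2 p(token | preceding tokens) for each token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)  # predictions for positions 1..T-1
    s = -logp.gather(1, ids[0, 1:, None]).squeeze(1)  # natural-log surprisal per token
    return tokenizer.convert_ids_to_tokens(ids[0, 1:]), s / torch.log(torch.tensor(2.0))
```

Per-word surprisal values obtained this way are then used as regressors against the brain recordings, which is what lets the study test whether responses are modulated by contextual predictions.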
Affiliation(s)
- Micha Heilbron: Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Kristijan Armeni: Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Peter Hagoort: Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Floris P de Lange: Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
94. Hervé E, Mento G, Desnous B, François C. Challenges and new perspectives of developmental cognitive EEG studies. Neuroimage 2022; 260:119508. [PMID: 35882267] [DOI: 10.1016/j.neuroimage.2022.119508]
Abstract
Despite sharing procedures with adult recordings, electroencephalography (EEG) in early development presents many specificities that need to be considered for good quality data collection. In this paper, we provide an overview of the most representative early cognitive developmental EEG studies, focusing on the specificities of this neuroimaging technique in young participants, such as attrition and artifacts. We also summarize the most representative results in developmental EEG research obtained in the time and time-frequency domains, as well as those obtained with more advanced signal processing methods. Finally, we briefly introduce three recent standardized pipelines that will help promote replicability and comparability across experiments and ages. While this paper does not claim to be exhaustive, it aims to give a sufficiently broad overview of the challenges and solutions available for conducting robust cognitive developmental EEG studies.
Affiliation(s)
- Estelle Hervé: CNRS, LPL, Aix-Marseille University, 5 Avenue Pasteur, Aix-en-Provence 13100, France
- Giovanni Mento: Department of General Psychology, University of Padova, Padova 35131, Italy; Padua Neuroscience Center (PNC), University of Padova, Padova 35131, Italy
- Béatrice Desnous: APHM, Reference Center for Rare Epilepsies, Timone Children Hospital, Aix-Marseille University, Marseille 13005, France; Inserm, INS, Aix-Marseille University, Marseille 13005, France
- Clément François: CNRS, LPL, Aix-Marseille University, 5 Avenue Pasteur, Aix-en-Provence 13100, France
95. Bai F, Meyer AS, Martin AE. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol 2022; 20:e3001713. [PMID: 35834569] [PMCID: PMC9282610] [DOI: 10.1371/journal.pbio.3001713]
Abstract
Human language stands out in the natural world as a biological signal that uses a structured system to combine the meanings of small linguistic units (e.g., words) into larger constituents (e.g., phrases and sentences). However, the physical dynamics of speech (or sign) do not stand in a one-to-one relationship with the meanings listeners perceive. Instead, listeners infer meaning based on their knowledge of the language. The neural readouts of the perceptual and cognitive processes underlying these inferences are still poorly understood. In the present study, we used scalp electroencephalography (EEG) to compare the neural response to phrases (e.g., the red vase) and sentences (e.g., the vase is red), which were close in semantic meaning and had been synthesized to be physically indistinguishable. Differences in structure were well captured in the reorganization of neural phase responses in the delta (approximately <2 Hz) and theta (approximately 2 to 7 Hz) bands, and in power and power connectivity changes in the alpha band (approximately 7.5 to 13.5 Hz). Consistent with predictions from a computational model, sentences showed more power, more power connectivity, and more phase synchronization than phrases did. Theta-gamma phase-amplitude coupling occurred, but did not differ between the syntactic structures. Spectral-temporal response function (STRF) modeling revealed different encoding states for phrases and sentences, over and above the acoustically driven neural response. Our findings provide a comprehensive description of how the brain encodes and separates linguistic structures in the dynamics of neural responses. They imply that phase synchronization and strength of connectivity are readouts for the constituent structure of language. The results provide a novel basis for future neurophysiological research on linguistic structure representation in the brain and, together with our simulations, support time-based binding as a mechanism of structure encoding in neural dynamics.
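Phase synchronization of the kind reported here is commonly quantified as inter-trial phase coherence (ITPC). A minimal NumPy/SciPy sketch, assuming `trials` is an (n_trials, n_samples) array of single-channel EEG epochs and `band` is, e.g., (0.5, 2.0) for delta; the study's actual pipeline and its model simulations are considerably more involved.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itpc(trials, fs, band):
    """Inter-trial phase coherence: 1 = identical phase across trials, 0 = random."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
    return np.abs(np.exp(1j * phase).mean(axis=0))  # one value per time point
```

Comparing this readout between the phrase and sentence conditions, band by band, is one simple way to test whether structure reorganizes neural phase, as the abstract describes.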
Affiliation(s)
- Fan Bai: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Antje S. Meyer: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Andrea E. Martin: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
96. Hauswald A, Keitel A, Chen Y, Rösch S, Weisz N. Degradation levels of continuous speech affect neural speech tracking and alpha power differently. Eur J Neurosci 2022; 55:3288-3302. [PMID: 32687616] [PMCID: PMC9540197] [DOI: 10.1111/ejn.14912]
Abstract
Making sense of a poor auditory signal can pose a challenge. Previous attempts to quantify speech intelligibility in neural terms have usually focused on one of two measures, namely low-frequency speech-brain synchronization or alpha power modulations. However, reports have been mixed concerning the modulation of these measures, an issue aggravated by the fact that they have normally been studied separately. We present two MEG studies analyzing both measures. In study 1, participants listened to unimodal auditory speech with three different levels of degradation (original, 7-channel and 3-channel vocoding). Intelligibility declined with declining clarity, but speech was still intelligible to some extent even at the lowest clarity level (3-channel vocoding). Low-frequency (1-7 Hz) speech tracking suggested a U-shaped relationship, with the strongest effects for the medium-degraded speech (7-channel) in bilateral auditory and left frontal regions. To follow up on this finding, we implemented three additional vocoding levels (5-channel, 2-channel and 1-channel) in a second MEG study. Across this wider range of degradation, speech-brain synchronization showed a similar pattern as in study 1, and further showed that when speech becomes unintelligible, synchronization declines again. The relationship differed for alpha power, which continued to decrease across vocoding levels, reaching a floor effect at 5-channel vocoding. Models predicting subjective intelligibility from both measures combined outperformed models based on either measure alone. Our findings underline that speech tracking and alpha power are modified differently by the degree of degradation of continuous speech, but together contribute to subjective speech understanding.
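Both measures have simple spectral estimators. As a rough single-channel sketch, assuming `envelope` and `eeg` are 1-D arrays on a common sampling rate `fs` (the study itself used MEG source estimates and more elaborate statistics):

```python
import numpy as np
from scipy.signal import coherence, welch

fs = 100  # assumed common sampling rate, Hz

# Low-frequency speech-brain coupling: magnitude-squared coherence, 1-7 Hz
f, cxy = coherence(envelope, eeg, fs=fs, nperseg=4 * fs)
speech_tracking = cxy[(f >= 1) & (f <= 7)].mean()

# Alpha power: Welch spectrum averaged over the alpha band
f_p, pxx = welch(eeg, fs=fs, nperseg=4 * fs)
alpha_power = pxx[(f_p >= 8) & (f_p <= 12)].mean()
```

Regressing subjective intelligibility on both `speech_tracking` and `alpha_power`, rather than on either alone, mirrors the combined model the abstract reports as superior.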
Affiliation(s)
- Anne Hauswald: Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria; Department of Psychology, University of Salzburg, Salzburg, Austria
- Anne Keitel: Psychology, School of Social Sciences, University of Dundee, Dundee, UK; Centre for Cognitive Neuroimaging, University of Glasgow, Glasgow, UK
- Ya-Ping Chen: Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria; Department of Psychology, University of Salzburg, Salzburg, Austria
- Sebastian Rösch: Department of Otorhinolaryngology, Paracelsus Medical University, Salzburg, Austria
- Nathan Weisz: Center of Cognitive Neuroscience, University of Salzburg, Salzburg, Austria; Department of Psychology, University of Salzburg, Salzburg, Austria
97. Muncke J, Kuruvila I, Hoppe U. Prediction of Speech Intelligibility by Means of EEG Responses to Sentences in Noise. Front Neurosci 2022; 16:876421. [PMID: 35720724] [PMCID: PMC9198593] [DOI: 10.3389/fnins.2022.876421]
Abstract
Objective: Understanding speech in noisy conditions is challenging even for people with mild hearing loss, and intelligibility for an individual is usually evaluated with several subjective test methods. In recent years, a method has been developed to estimate a temporal response function (TRF) between the speech envelope and simultaneous electroencephalographic (EEG) measurements; with this TRF, the EEG signal can be predicted for any speech signal. Recent studies have suggested that the accuracy of this prediction varies with the level of noise added to the speech signal and can objectively predict individual speech intelligibility. Here we assess how the TRF itself varies when it is calculated from measurements at different signal-to-noise ratios, and apply these variations to predict speech intelligibility.
Methods: For 18 normal-hearing subjects, the individual threshold of 50% speech intelligibility was determined using a speech-in-noise test. The subjects additionally listened passively to the speech material of that test at signal-to-noise ratios close to their individual 50% threshold while EEG was recorded. Afterwards, the shape of the TRF for each signal-to-noise ratio and subject was compared with the derived intelligibility.
Results: The strongest effect of variations in stimulus signal-to-noise ratio on TRF shape occurred close to 100 ms after stimulus presentation and was located over the left central scalp region. The investigated variations in TRF morphology correlated strongly with speech intelligibility, and we were able to predict the individual threshold of 50% speech intelligibility with a mean deviation of less than 1.5 dB.
Conclusion: The intelligibility of speech in noise can be predicted by analyzing the shape of the TRF derived at different stimulus signal-to-noise ratios. Because TRFs are interpretable in a manner similar to auditory evoked potentials, this method offers new options for clinical diagnostics.
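A TRF of this kind is typically estimated by regularized linear regression from time-lagged copies of the stimulus envelope onto the EEG. The following forward-model sketch is illustrative only (the study's estimation and statistics are more elaborate); `envelope` and `eeg` are assumed to be equal-length 1-D arrays at sampling rate `fs`.

```python
import numpy as np

def lag_matrix(stim, lags):
    """Design matrix whose k-th column is the stimulus delayed by lags[k] samples."""
    X = np.zeros((len(stim), len(lags)))
    for k, l in enumerate(lags):
        if l >= 0:
            X[l:, k] = stim[:len(stim) - l]
        else:
            X[:l, k] = stim[-l:]
    return X

def fit_trf(envelope, eeg, fs, tmin=0.0, tmax=0.4, lam=100.0):
    """Ridge-regularized TRF mapping the speech envelope to one EEG channel."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lag_matrix(envelope, lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
    return lags / fs, w  # time axis in seconds and TRF weights
```

Because the weight vector `w` is indexed by latency, a deflection near 100 ms can be inspected directly, which is what makes TRF morphology interpretable in the way the abstract compares to auditory evoked potentials.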
Affiliation(s)
- Jan Muncke: Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
- Ivine Kuruvila: Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany; WS Audiology, Erlangen, Germany
- Ulrich Hoppe: Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
|
98
|
Distracting Linguistic Information Impairs Neural Tracking of Attended Speech. CURRENT RESEARCH IN NEUROBIOLOGY 2022; 3:100043. [DOI: 10.1016/j.crneur.2022.100043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 04/27/2022] [Accepted: 05/24/2022] [Indexed: 11/20/2022] Open
99. Phelps J, Attaheri A, Bozic M. How bilingualism modulates selective attention in children. Sci Rep 2022; 12:6381. [PMID: 35430617] [PMCID: PMC9013372] [DOI: 10.1038/s41598-022-09989-x]
Abstract
There is substantial evidence that learning and using multiple languages modulates selective attention in children. The current study investigated the mechanisms that drive this modification. Specifically, we asked whether the need to constantly manage competing languages in bilinguals increases attentional capacity, or instead draws on the available resources such that they must be economised to support optimal task performance. Monolingual and bilingual children aged 7-12 attended to a narrative presented in one ear while ignoring different types of interference in the other ear. We used EEG to capture the neural encoding of the attended and unattended speech envelopes and to assess how well they can be reconstructed from the responses of the neuronal populations that encode them. Despite equivalent behavioral performance, monolingual and bilingual children encoded attended speech differently, with the pattern of encoding across conditions in bilinguals suggesting a redistribution of the available attentional capacity rather than its enhancement.
100. Asymmetrical cross-modal influence on neural encoding of auditory and visual features in natural scenes. Neuroimage 2022; 255:119182. [PMID: 35395403] [DOI: 10.1016/j.neuroimage.2022.119182]
Abstract
Natural scenes contain multi-modal information, which is integrated to form a coherent percept. Previous studies have demonstrated that cross-modal information can modulate the neural encoding of low-level sensory features, but they have mostly focused on single sensory events or rhythmic sensory sequences. Here, we investigate how the neural encoding of basic auditory and visual features is modulated by cross-modal information when participants watch movie clips composed primarily of non-rhythmic events. We presented audiovisually congruent and incongruent movie clips and, since attention can modulate cross-modal interactions, separately analyzed high- and low-arousal clips. We recorded neural responses using electroencephalography (EEG) and employed the temporal response function (TRF) to quantify the neural encoding of auditory and visual features. The neural encoding of the sound envelope was enhanced in the audiovisually congruent condition relative to the incongruent condition, but this effect was only significant for high-arousal movie clips. In contrast, audiovisual congruency did not significantly modulate the neural encoding of visual features such as luminance or visual motion. In summary, our findings demonstrate asymmetrical cross-modal interactions during the processing of natural scenes that lack rhythmicity: congruent visual information enhances low-level auditory processing, while congruent auditory information does not significantly modulate low-level visual processing.
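The input side of such a TRF analysis needs one time series per feature. A hypothetical sketch of the two simplest ones, the sound envelope and frame luminance, assuming the audio track and grayscale video frames are already loaded as arrays (all names here are illustrative, not from the paper):

```python
import numpy as np
from scipy.signal import hilbert, resample

def audio_envelope(audio, fs_audio, fs_out):
    """Broadband amplitude envelope of the soundtrack, resampled to the EEG rate."""
    env = np.abs(hilbert(audio))
    return resample(env, int(len(audio) * fs_out / fs_audio))

def luminance_series(frames):
    """Mean luminance per frame; frames is an (n_frames, height, width) array."""
    return frames.reshape(len(frames), -1).mean(axis=1)
```

Each series, upsampled to a common rate, would then enter a lagged regression of the kind sketched under entry 97, yielding separate TRFs for the auditory and visual features whose congruency effects the study compares.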