1
|
Piña Méndez Á, Taitz A, Palacios Rodríguez O, Rodríguez Leyva I, Assaneo MF. Speech's syllabic rhythm and articulatory features produced under different auditory feedback conditions identify Parkinsonism. Sci Rep 2024; 14:15787. [PMID: 38982177 PMCID: PMC11233651 DOI: 10.1038/s41598-024-65974-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Accepted: 06/25/2024] [Indexed: 07/11/2024] Open
Abstract
Diagnostic tests for Parkinsonism based on speech samples have shown promising results. Although abnormal auditory feedback integration during speech production and impaired rhythmic organization of speech are known in Parkinsonism, these aspects have not been incorporated into diagnostic tests. This study aimed to identify Parkinsonism using a novel speech behavioral test that involved rhythmically repeating syllables under different auditory feedback conditions. The study included 30 individuals with Parkinson's disease (PD) and 30 healthy subjects. Participants were asked to rhythmically repeat the PA-TA-KA syllable sequence, both whispering and speaking aloud under various listening conditions. The results showed that individuals with PD had difficulties in whispering and articulating under altered auditory feedback conditions, exhibited delayed speech onset, and demonstrated inconsistent rhythmic structure across trials compared to controls. These parameters were then fed into a supervised machine-learning algorithm to differentiate between the two groups. The algorithm achieved an accuracy of 85.4%, a sensitivity of 86.5%, and a specificity of 84.3%. This pilot study highlights the potential of the proposed behavioral paradigm as an objective and accessible (both in cost and time) test for identifying individuals with Parkinson's disease.
Collapse
Affiliation(s)
- Ángeles Piña Méndez
- Faculty of Psychology, Autonomous University of San Luis Potosí, San Luis Potosí, Mexico
| | | | | | | | - M Florencia Assaneo
- Institute of Neurobiology, National Autonomous University of Mexico, Querétaro, Mexico.
| |
Collapse
|
2
|
Ershaid H, Lizarazu M, McLaughlin D, Cooke M, Simantiraki O, Koutsogiannaki M, Lallier M. Contributions of listening effort and intelligibility to cortical tracking of speech in adverse listening conditions. Cortex 2024; 172:54-71. [PMID: 38215511 DOI: 10.1016/j.cortex.2023.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 09/05/2023] [Accepted: 11/14/2023] [Indexed: 01/14/2024]
Abstract
Cortical tracking of speech is vital for speech segmentation and is linked to speech intelligibility. However, there is no clear consensus as to whether reduced intelligibility leads to a decrease or an increase in cortical speech tracking, warranting further investigation of the factors influencing this relationship. One such factor is listening effort, defined as the cognitive resources necessary for speech comprehension, and reported to have a strong negative correlation with speech intelligibility. Yet, no studies have examined the relationship between speech intelligibility, listening effort, and cortical tracking of speech. The aim of the present study was thus to examine these factors in quiet and distinct adverse listening conditions. Forty-nine normal hearing adults listened to sentences produced casually, presented in quiet and two adverse listening conditions: cafeteria noise and reverberant speech. Electrophysiological responses were registered with electroencephalogram, and listening effort was estimated subjectively using self-reported scores and objectively using pupillometry. Results indicated varying impacts of adverse conditions on intelligibility, listening effort, and cortical tracking of speech, depending on the preservation of the speech temporal envelope. The more distorted envelope in the reverberant condition led to higher listening effort, as reflected in higher subjective scores, increased pupil diameter, and stronger cortical tracking of speech in the delta band. These findings suggest that using measures of listening effort in addition to those of intelligibility is useful for interpreting cortical tracking of speech results. Moreover, reading and phonological skills of participants were positively correlated with listening effort in the cafeteria condition, suggesting a special role of expert language skills in processing speech in this noisy condition. Implications for future research and theories linking atypical cortical tracking of speech and reading disorders are further discussed.
Collapse
Affiliation(s)
- Hadeel Ershaid
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Mikel Lizarazu
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Drew McLaughlin
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain.
| | - Martin Cooke
- Ikerbasque, Basque Science Foundation, Bilbao, Spain.
| | | | | | - Marie Lallier
- Basque Center on Cognition, Brain and Language, San Sebastian, Spain; Ikerbasque, Basque Science Foundation, Bilbao, Spain.
| |
Collapse
|
3
|
Schüller A, Schilling A, Krauss P, Reichenbach T. The Early Subcortical Response at the Fundamental Frequency of Speech Is Temporally Separated from Later Cortical Contributions. J Cogn Neurosci 2024; 36:475-491. [PMID: 38165737 DOI: 10.1162/jocn_a_02103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2024]
Abstract
Most parts of speech are voiced, exhibiting a degree of periodicity with a fundamental frequency and many higher harmonics. Some neural populations respond to this temporal fine structure, in particular at the fundamental frequency. This frequency-following response to speech consists of both subcortical and cortical contributions and can be measured through EEG as well as through magnetoencephalography (MEG), although both differ in the aspects of neural activity that they capture: EEG is sensitive to both radial and tangential sources as well as to deep sources, whereas MEG is more restrained to the measurement of tangential and superficial neural activity. EEG responses to continuous speech have shown an early subcortical contribution, at a latency of around 9 msec, in agreement with MEG measurements in response to short speech tokens, whereas MEG responses to continuous speech have not yet revealed such an early component. Here, we analyze MEG responses to long segments of continuous speech. We find an early subcortical response at latencies of 4-11 msec, followed by later right-lateralized cortical activities at delays of 20-58 msec as well as potential subcortical activities. Our results show that the early subcortical component of the FFR to continuous speech can be measured from MEG in populations of participants and that its latency agrees with that measured with EEG. They furthermore show that the early subcortical component is temporally well separated from later cortical contributions, enabling an independent assessment of both components toward further aspects of speech processing.
Collapse
Affiliation(s)
| | | | - Patrick Krauss
- Friedrich-Alexander-Universität Erlangen-Nürnberg
- Universitätsklinikum Erlangen
| | | |
Collapse
|
4
|
Panela RA, Copelli F, Herrmann B. Reliability and generalizability of neural speech tracking in younger and older adults. Neurobiol Aging 2024; 134:165-180. [PMID: 38103477 DOI: 10.1016/j.neurobiolaging.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 11/09/2023] [Accepted: 11/16/2023] [Indexed: 12/19/2023]
Abstract
Neural tracking of spoken speech is considered a potential clinical biomarker for speech-processing difficulties, but the reliability of neural speech tracking is unclear. Here, younger and older adults listened to stories in two sessions while electroencephalography was recorded to investigate the reliability and generalizability of neural speech tracking. Speech tracking amplitude was larger for older than younger adults, consistent with an age-related loss of inhibition. The reliability of neural speech tracking was moderate (ICC ∼0.5-0.75) and tended to be higher for older adults. However, reliability was lower for speech tracking than for neural responses to noise bursts (ICC >0.8), which we used as a benchmark for maximum reliability. Neural speech tracking generalized moderately across different stories (ICC ∼0.5-0.6), which appeared greatest for audiobook-like stories spoken by the same person. Hence, a variety of stories could possibly be used for clinical assessments. Overall, the current data are important for developing a biomarker of speech processing but suggest that further work is needed to increase the reliability to meet clinical standards.
Collapse
Affiliation(s)
- Ryan A Panela
- Rotman Research Institute, Baycrest Academy for Research and Education, M6A 2E1 North York, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1 Toronto, ON, Canada
| | - Francesca Copelli
- Rotman Research Institute, Baycrest Academy for Research and Education, M6A 2E1 North York, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1 Toronto, ON, Canada
| | - Björn Herrmann
- Rotman Research Institute, Baycrest Academy for Research and Education, M6A 2E1 North York, ON, Canada; Department of Psychology, University of Toronto, M5S 1A1 Toronto, ON, Canada.
| |
Collapse
|
5
|
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/29/2023]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role of information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects in human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
Collapse
Affiliation(s)
- Benedikt Zoefel
- Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
| | - Anne Kösem
- Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
| |
Collapse
|
6
|
McClaskey CM. Neural hyperactivity and altered envelope encoding in the central auditory system: Changes with advanced age and hearing loss. Hear Res 2024; 442:108945. [PMID: 38154191 PMCID: PMC10942735 DOI: 10.1016/j.heares.2023.108945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Revised: 12/04/2023] [Accepted: 12/22/2023] [Indexed: 12/30/2023]
Abstract
Temporal modulations are ubiquitous features of sound signals that are important for auditory perception. The perception of temporal modulations, or temporal processing, is known to decline with aging and hearing loss and negatively impact auditory perception in general and speech recognition specifically. However, neurophysiological literature also provides evidence of exaggerated or enhanced encoding of specifically temporal envelopes in aging and hearing loss, which may arise from changes in inhibitory neurotransmission and neuronal hyperactivity. This review paper describes the physiological changes to the neural encoding of temporal envelopes that have been shown to occur with age and hearing loss and discusses the role of disinhibition and neural hyperactivity in contributing to these changes. Studies in both humans and animal models suggest that aging and hearing loss are associated with stronger neural representations of both periodic amplitude modulation envelopes and of naturalistic speech envelopes, but primarily for low-frequency modulations (<80 Hz). Although the frequency dependence of these results is generally taken as evidence of amplified envelope encoding at the cortex and impoverished encoding at the midbrain and brainstem, there is additional evidence to suggest that exaggerated envelope encoding may also occur subcortically, though only for envelopes with low modulation rates. A better understanding of how temporal envelope encoding is altered in aging and hearing loss, and the contexts in which neural responses are exaggerated/diminished, may aid in the development of interventions, assistive devices, and treatment strategies that work to ameliorate age- and hearing-loss-related auditory perceptual deficits.
Collapse
Affiliation(s)
- Carolyn M McClaskey
- Department of Otolaryngology - Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Ave, MSC 550, Charleston, SC 29425, United States.
| |
Collapse
|
7
|
Bachmann FL, Kulasingham JP, Eskelund K, Enqvist M, Alickovic E, Innes-Brown H. Extending Subcortical EEG Responses to Continuous Speech to the Sound-Field. Trends Hear 2024; 28:23312165241246596. [PMID: 38738341 DOI: 10.1177/23312165241246596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/14/2024] Open
Abstract
The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, brainstem responses to continuous speech presented via earphones have been recently detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical responses to continuous speech could be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16 minutes of data was sufficient for the TRFs of all participants to show clear wave V peaks for both earphones and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent in both earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16 minutes) to achieve clear wave V peaks compared to earphone TRFs (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
Collapse
Affiliation(s)
| | - Joshua P Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
| | | | - Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
| | - Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
| | - Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| |
Collapse
|
8
|
Commuri V, Kulasingham JP, Simon JZ. Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention. Front Neurosci 2023; 17:1264453. [PMID: 38156264 PMCID: PMC10752935 DOI: 10.3389/fnins.2023.1264453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Accepted: 11/21/2023] [Indexed: 12/30/2023] Open
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study 22 human subjects listened to concurrent, fixed-rate, speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male's speech, with a peak latency of ~40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Collapse
Affiliation(s)
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, MD, United States
| |
Collapse
|
9
|
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. [PMID: 38032934 PMCID: PMC10710032 DOI: 10.1073/pnas.2309166120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 10/21/2023] [Indexed: 12/02/2023] Open
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRFs analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD20742
- Department of Biology, University of Maryland, College Park, MD20742
- Institute for Systems Research, University of Maryland, College Park, MD20742
| |
Collapse
|
10
|
Gwilliams L, Flick G, Marantz A, Pylkkänen L, Poeppel D, King JR. Introducing MEG-MASC a high-quality magneto-encephalography dataset for evaluating natural speech processing. Sci Data 2023; 10:862. [PMID: 38049487 PMCID: PMC10695966 DOI: 10.1038/s41597-023-02752-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Accepted: 11/16/2023] [Indexed: 12/06/2023] Open
Abstract
The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers who listened to two hours of naturalistic stories. Each participant performed two identical sessions, involving listening to four fictional stories from the Manually Annotated Sub-Corpus (MASC) intermixed with random word lists and comprehension questions. We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS). This data collection provides a suitable benchmark to large-scale encoding and decoding analyses of temporally-resolved brain responses to speech. We provide the Python code to replicate several validations analyses of the MEG evoked responses such as the temporal decoding of phonetic features and word frequency. All code and MEG, audio and text data are publicly available to keep with best practices in transparent and reproducible research.
Collapse
Affiliation(s)
- Laura Gwilliams
- Department of Psychology, Stanford University, Stanford, USA.
- Department of Psychology, New York University, New York, USA.
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates.
| | - Graham Flick
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Department of Linguistics, New York University, New York, USA
- Rotman Research Institute, Baycrest Hospital, Toronto, Canada
| | - Alec Marantz
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Department of Linguistics, New York University, New York, USA
| | - Liina Pylkkänen
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Department of Linguistics, New York University, New York, USA
| | - David Poeppel
- Department of Psychology, New York University, New York, USA
- Ernst Struengmann Institute for Neuroscience, Frankfurt, Germany
| | - Jean-Rémi King
- Department of Psychology, New York University, New York, USA
- LSP, École normale supérieure, PSL University, CNRS, 75005, Paris, France
| |
Collapse
|
11
|
Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, Resnik P, Simon JZ. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 2023; 12:e85012. [PMID: 38018501 PMCID: PMC10783870 DOI: 10.7554/elife.85012] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Accepted: 11/24/2023] [Indexed: 11/30/2023] Open
Abstract
Even though human experience unfolds continuously in time, it is not strictly linear; instead, it entails cascading processes building hierarchical cognitive structures. For instance, during speech perception, humans transform a continuously varying acoustic signal into phonemes, words, and meaning, and these levels all have distinct but interdependent temporal structures. Time-lagged regression using temporal response functions (TRFs) has recently emerged as a promising tool for disentangling electrophysiological brain responses related to such complex models of perception. Here, we introduce the Eelbrain Python toolkit, which makes this kind of analysis easy and accessible. We demonstrate its use, using continuous speech as a sample paradigm, with a freely available EEG dataset of audiobook listening. A companion GitHub repository provides the complete source code for the analysis, from raw data to group-level statistics. More generally, we advocate a hypothesis-driven approach in which the experimenter specifies a hierarchy of time-continuous representations that are hypothesized to have contributed to brain responses, and uses those as predictor variables for the electrophysiological signal. This is analogous to a multiple regression problem, but with the addition of a time dimension. TRF analysis decomposes the brain signal into distinct responses associated with the different predictor variables by estimating a multivariate TRF (mTRF), quantifying the influence of each predictor on brain responses as a function of time(-lags). This allows asking two questions about the predictor variables: (1) Is there a significant neural representation corresponding to this predictor variable? And if so, (2) what are the temporal characteristics of the neural response associated with it? Thus, different predictor variables can be systematically combined and evaluated to jointly model neural processing at multiple hierarchical levels. We discuss applications of this approach, including the potential for linking algorithmic/representational theories at different cognitive levels to brain responses through computational models with appropriate linking hypotheses.
Collapse
Affiliation(s)
| | - Proloy Das
- Stanford UniversityStanfordUnited States
| | | | | | | | | | - Philip Resnik
- University of Maryland, College ParkCollege ParkUnited States
| | | |
Collapse
|
12
|
Zhang X, Li J, Li Z, Hong B, Diao T, Ma X, Nolte G, Engel AK, Zhang D. Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension. Neuroimage 2023; 282:120404. [PMID: 37806465 DOI: 10.1016/j.neuroimage.2023.120404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 08/19/2023] [Accepted: 10/05/2023] [Indexed: 10/10/2023] Open
Abstract
Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via the natural language processing method as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms leading to speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
Collapse
Affiliation(s)
- Xinmiao Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Jiawei Li
- Department of Education and Psychology, Freie Universität Berlin, Berlin 14195, Federal Republic of Germany
| | - Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
| | - Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
| | - Tongxiang Diao
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
| | - Xin Ma
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
| | - Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
| | - Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
| | - Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China.
| |
Collapse
|
13
|
Van Hirtum T, Somers B, Dieudonné B, Verschueren E, Wouters J, Francart T. Neural envelope tracking predicts speech intelligibility and hearing aid benefit in children with hearing loss. Hear Res 2023; 439:108893. [PMID: 37806102 DOI: 10.1016/j.heares.2023.108893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 09/01/2023] [Accepted: 09/27/2023] [Indexed: 10/10/2023]
Abstract
Early assessment of hearing aid benefit is crucial, as the extent to which hearing aids provide audible speech information predicts speech and language outcomes. A growing body of research has proposed neural envelope tracking as an objective measure of speech intelligibility, particularly for individuals unable to provide reliable behavioral feedback. However, its potential for evaluating speech intelligibility and hearing aid benefit in children with hearing loss remains unexplored. In this study, we investigated neural envelope tracking in children with permanent hearing loss through two separate experiments. EEG data were recorded while children listened to age-appropriate stories (Experiment 1) or an animated movie (Experiment 2) under aided and unaided conditions (using personal hearing aids) at multiple stimulus intensities. Neural envelope tracking was evaluated using a linear decoder reconstructing the speech envelope from the EEG in the delta band (0.5-4 Hz). Additionally, we calculated temporal response functions (TRFs) to investigate the spatio-temporal dynamics of the response. In both experiments, neural tracking increased with increasing stimulus intensity, but only in the unaided condition. In the aided condition, neural tracking remained stable across a wide range of intensities, as long as speech intelligibility was maintained. Similarly, TRF amplitudes increased with increasing stimulus intensity in the unaided condition, while in the aided condition significant differences were found in TRF latency rather than TRF amplitude. This suggests that decreasing stimulus intensity does not necessarily impact neural tracking. Furthermore, the use of personal hearing aids significantly enhanced neural envelope tracking, particularly in challenging speech conditions that would be inaudible when unaided. Finally, we found a strong correlation between neural envelope tracking and behaviorally measured speech intelligibility for both narrated stories (Experiment 1) and movie stimuli (Experiment 2). Altogether, these findings indicate that neural envelope tracking could be a valuable tool for predicting speech intelligibility benefits derived from personal hearing aids in hearing-impaired children. Incorporating narrated stories or engaging movies expands the accessibility of these methods even in clinical settings, offering new avenues for using objective speech measures to guide pediatric audiology decision-making.
Collapse
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Jan Wouters
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, 3000 Leuven, Belgium.
| |
Collapse
|
14
|
Commuri V, Kulasingham JP, Simon JZ. Cortical Responses Time-Locked to Continuous Speech in the High-Gamma Band Depend on Selective Attention. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.20.549567. [PMID: 37546895 PMCID: PMC10401961 DOI: 10.1101/2023.07.20.549567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study 22 human subjects listened to concurrent, fixed-rate, speech from male and female speakers, and were asked to selectively attend to one speaker at a time, while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male's speech, with a peak latency of approximately 40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Collapse
Affiliation(s)
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, MD, United States
| |
Collapse
|
15
|
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affects acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
| |
Collapse
|
16
|
Puffay C, Vanthornhout J, Gillis M, Accou B, Van Hamme H, Francart T. Robust neural tracking of linguistic speech representations using a convolutional neural network. J Neural Eng 2023; 20:046040. [PMID: 37595606 DOI: 10.1088/1741-2552/acf1ce] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Accepted: 08/18/2023] [Indexed: 08/20/2023]
Abstract
Objective.When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalography (EEG) and the speech signal. Recent studies have shown a significant contribution of linguistic features over acoustic neural tracking using linear models. However, linear models cannot model the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features using phoneme or word onsets as a control and has the capacity to model non-linear relations.Approach.We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS) and word frequency (WF)) in our nonlinear CNN model and investigate if they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN.Main results.For the non-linear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the non-linear CNN outperformed the linear baselines.Significance.Measuring coding of linguistic features in the brain is important for auditory neuroscience research and applications that involve objectively measuring speech understanding. With linear models, this is measurable, but the effects are very small. The proposed non-linear CNN model yields larger differences between linguistic and lexical models and, therefore, could show effects that would otherwise be unmeasurable and may, in the future, lead to improved within-subject measures and shorter recordings.
Collapse
Affiliation(s)
- Corentin Puffay
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| | | | - Marlies Gillis
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
| | - Bernd Accou
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| | - Hugo Van Hamme
- Department of Electrical engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
| | - Tom Francart
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
| |
Collapse
|
17
|
Gong XL, Huth AG, Deniz F, Johnson K, Gallant JL, Theunissen FE. Phonemic segmentation of narrative speech in human cerebral cortex. Nat Commun 2023; 14:4309. [PMID: 37463907 PMCID: PMC10354060 DOI: 10.1038/s41467-023-39872-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2022] [Accepted: 06/29/2023] [Indexed: 07/20/2023] Open
Abstract
Speech processing requires extracting meaning from acoustic patterns using a set of intermediate representations based on a dynamic segmentation of the speech stream. Using whole brain mapping obtained in fMRI, we investigate the locus of cortical phonemic processing not only for single phonemes but also for short combinations made of diphones and triphones. We find that phonemic processing areas are much larger than previously described: they include not only the classical areas in the dorsal superior temporal gyrus but also a larger region in the lateral temporal cortex where diphone features are best represented. These identified phonemic regions overlap with the lexical retrieval region, but we show that short word retrieval is not sufficient to explain the observed responses to diphones. Behavioral studies have shown that phonemic processing and lexical retrieval are intertwined. Here, we also have identified candidate regions within the speech cortical network where this joint processing occurs.
Collapse
Affiliation(s)
- Xue L Gong
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, 94720, CA, USA.
| | - Alexander G Huth
- Departments of Neuroscience and Computer Science, University of Texas, Austin, Austin, 78712, TX, USA
| | - Fatma Deniz
- Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, 10587, Berlin, Germany
| | - Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, 94720, CA, USA
| | - Jack L Gallant
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, 94720, CA, USA
- Department of Psychology, University of California, Berkeley, Berkeley, 94720, CA, USA
| | - Frédéric E Theunissen
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, 94720, CA, USA.
- Department of Psychology, University of California, Berkeley, Berkeley, 94720, CA, USA.
- Department of Integrative Biology, University of California, Berkeley, Berkeley, 94720, CA, USA.
| |
Collapse
|
18
|
Gillis M, Vanthornhout J, Francart T. Heard or Understood? Neural Tracking of Language Features in a Comprehensible Story, an Incomprehensible Story and a Word List. eNeuro 2023; 10:ENEURO.0075-23.2023. [PMID: 37451862 DOI: 10.1523/eneuro.0075-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/21/2023] [Accepted: 06/25/2023] [Indexed: 07/18/2023] Open
Abstract
Speech comprehension is a complex neural process on which relies on activation and integration of multiple brain regions. In the current study, we evaluated whether speech comprehension can be investigated by neural tracking. Neural tracking is the phenomenon in which the brain responses time-lock to the rhythm of specific features in continuous speech. These features can be acoustic, i.e., acoustic tracking, or derived from the content of the speech using language properties, i.e., language tracking. We evaluated whether neural tracking of speech differs between a comprehensible story, an incomprehensible story, and a word list. We evaluated the neural responses to speech of 19 participants (six men). No significant difference regarding acoustic tracking was found. However, significant language tracking was only found for the comprehensible story. The most prominent effect was visible to word surprisal, a language feature at the word level. The neural response to word surprisal showed a prominent negativity between 300 and 400 ms, similar to the N400 in evoked response paradigms. This N400 was significantly more negative when the story was comprehended, i.e., when words could be integrated in the context of previous words. These results show that language tracking can capture the effect of speech comprehension.
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Katholieke Universiteit Leuven, Leuven 3000, Belgium
| |
Collapse
|
19
|
Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hear Res 2023; 433:108767. [PMID: 37060895 PMCID: PMC10559335 DOI: 10.1016/j.heares.2023.108767] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Revised: 03/29/2023] [Accepted: 04/09/2023] [Indexed: 04/17/2023]
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC) when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyzes resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.
Collapse
Affiliation(s)
- Elsa Lindboom
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
| | - Aaron Nidiffer
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| | - Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA.
| | - Edmund C Lalor
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| |
Collapse
|
20
|
Van Hirtum T, Somers B, Verschueren E, Dieudonné B, Francart T. Delta-band neural envelope tracking predicts speech intelligibility in noise in preschoolers. Hear Res 2023; 434:108785. [PMID: 37172414 DOI: 10.1016/j.heares.2023.108785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 04/24/2023] [Accepted: 05/05/2023] [Indexed: 05/15/2023]
Abstract
Behavioral tests are currently the gold standard in measuring speech intelligibility. However, these tests can be difficult to administer in young children due to factors such as motivation, linguistic knowledge and cognitive skills. It has been shown that measures of neural envelope tracking can be used to predict speech intelligibility and overcome these issues. However, its potential as an objective measure for speech intelligibility in noise remains to be investigated in preschool children. Here, we evaluated neural envelope tracking as a function of signal-to-noise ratio (SNR) in 14 5-year-old children. We examined EEG responses to natural, continuous speech presented at different SNRs ranging from -8 (very difficult) to 8 dB SNR (very easy). As expected delta band (0.5-4 Hz) tracking increased with increasing stimulus SNR. However, this increase was not strictly monotonic as neural tracking reached a plateau between 0 and 4 dB SNR, similarly to the behavioral speech intelligibility outcomes. These findings indicate that neural tracking in the delta band remains stable, as long as the acoustical degradation of the speech signal does not reflect significant changes in speech intelligibility. Theta band tracking (4-8 Hz), on the other hand, was found to be drastically reduced and more easily affected by noise in children, making it less reliable as a measure of speech intelligibility. By contrast, neural envelope tracking in the delta band was directly associated with behavioral measures of speech intelligibility. This suggests that neural envelope tracking in the delta band is a valuable tool for evaluating speech-in-noise intelligibility in preschoolers, highlighting its potential as an objective measure of speech in difficult-to-test populations.
Collapse
Affiliation(s)
- Tilde Van Hirtum
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium.
| | - Ben Somers
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Eline Verschueren
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Benjamin Dieudonné
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, Leuven 3000, Belgium
| |
Collapse
|
21
|
Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990 DOI: 10.1016/j.heares.2023.108770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 04/06/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically more challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Apart from individual features forming a static listening condition, dynamic and irregular changes of auditory scenes-volatile listening environments-have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the speech tracking neural index, possibly when the bottom-up information is sufficient. However, when the listening environment was volatile and the listeners had to re-engage in new speech whenever auditory scenes altered, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the response to the left and right attention were significantly attenuated around 100-200 ms after the speech onset. These findings suggest that volatile listening environments could adversely affect the modulation effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Collapse
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
| | - Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
| | - Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
| | - Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
| | - Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea.
| |
Collapse
|
22
|
Xie Z, Brodbeck C, Chandrasekaran B. Cortical Tracking of Continuous Speech Under Bimodal Divided Attention. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:318-343. [PMID: 37229509 PMCID: PMC10205152 DOI: 10.1162/nol_a_00100] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 01/11/2023] [Indexed: 05/27/2023]
Abstract
Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.
Collapse
Affiliation(s)
- Zilong Xie
- School of Communication Science and Disorders, Florida State University, Tallahassee, FL, USA
| | - Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
| |
Collapse
|
23
|
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. [PMID: 36822385 DOI: 10.1016/j.neubiorev.2023.105111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2022] [Revised: 12/04/2022] [Accepted: 02/19/2023] [Indexed: 02/25/2023]
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% variance of the speech envelope. These results indicate that local features in the speech envelope, instead of the modulation spectrum, are a more reliable acoustic correlate of syllables.
Collapse
|
24
|
Gillis M, Kries J, Vandermosten M, Francart T. Neural tracking of linguistic and acoustic speech representations decreases with advancing age. Neuroimage 2023; 267:119841. [PMID: 36584758 PMCID: PMC9878439 DOI: 10.1016/j.neuroimage.2022.119841] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 12/21/2022] [Accepted: 12/26/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Older adults process speech differently, but it is not yet clear how aging affects different levels of processing natural, continuous speech, both in terms of bottom-up acoustic analysis and top-down generation of linguistic-based predictions. We studied natural speech processing across the adult lifespan via electroencephalography (EEG) measurements of neural tracking. GOALS Our goals are to analyze the unique contribution of linguistic speech processing across the adult lifespan using natural speech, while controlling for the influence of acoustic processing. Moreover, we also studied acoustic processing across age. In particular, we focus on changes in spatial and temporal activation patterns in response to natural speech across the lifespan. METHODS 52 normal-hearing adults between 17 and 82 years of age listened to a naturally spoken story while the EEG signal was recorded. We investigated the effect of age on acoustic and linguistic processing of speech. Because age correlated with hearing capacity and measures of cognition, we investigated whether the observed age effect is mediated by these factors. Furthermore, we investigated whether there is an effect of age on hemisphere lateralization and on spatiotemporal patterns of the neural responses. RESULTS Our EEG results showed that linguistic speech processing declines with advancing age. Moreover, as age increased, the neural response latency to certain aspects of linguistic speech processing increased. Also acoustic neural tracking (NT) decreased with increasing age, which is at odds with the literature. In contrast to linguistic processing, older subjects showed shorter latencies for early acoustic responses to speech. No evidence was found for hemispheric lateralization in neither younger nor older adults during linguistic speech processing. Most of the observed aging effects on acoustic and linguistic processing were not explained by age-related decline in hearing capacity or cognition. However, our results suggest that the effect of decreasing linguistic neural tracking with advancing age at word-level is also partially due to an age-related decline in cognition than a robust effect of age. CONCLUSION Spatial and temporal characteristics of the neural responses to continuous speech change across the adult lifespan for both acoustic and linguistic speech processing. These changes may be traces of structural and/or functional change that occurs with advancing age.
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jill Kries
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | | | | |
Collapse
|
25
|
Herrmann B, Maess B, Johnsrude IS. Sustained responses and neural synchronization to amplitude and frequency modulation in sound change with age. Hear Res 2023; 428:108677. [PMID: 36580732 DOI: 10.1016/j.heares.2022.108677] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Revised: 12/09/2022] [Accepted: 12/16/2022] [Indexed: 12/23/2022]
Abstract
Perception of speech requires sensitivity to features, such as amplitude and frequency modulations, that are often temporally regular. Previous work suggests age-related changes in neural responses to temporally regular features, but little work has focused on age differences for different types of modulations. We recorded magnetoencephalography in younger (21-33 years) and older adults (53-73 years) to investigate age differences in neural responses to slow (2-6 Hz sinusoidal and non-sinusoidal) modulations in amplitude, frequency, or combined amplitude and frequency. Audiometric pure-tone average thresholds were elevated in older compared to younger adults, indicating subclinical hearing impairment in the recruited older-adult sample. Neural responses to sound onset (independent of temporal modulations) were increased in magnitude in older compared to younger adults, suggesting hyperresponsivity and a loss of inhibition in the aged auditory system. Analyses of neural activity to modulations revealed greater neural synchronization with amplitude, frequency, and combined amplitude-frequency modulations for older compared to younger adults. This potentiated response generalized across different degrees of temporal regularity (sinusoidal and non-sinusoidal), although neural synchronization was generally lower for non-sinusoidal modulation. Despite greater synchronization, sustained neural activity was reduced in older compared to younger adults for sounds modulated both sinusoidally and non-sinusoidally in frequency. Our results suggest age differences in the sensitivity of the auditory system to features present in speech and other natural sounds.
Collapse
Affiliation(s)
- Björn Herrmann
- Rotman Research Institute, Baycrest, North York, ON M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON M5S 1A1, Canada; Department of Psychology & Brain and Mind Institute, The University of Western Ontario, London, ON N6A 3K7, Canada.
| | - Burkhard Maess
- Max Planck Institute for Human Cognitive and Brain Sciences, Brain Networks Unit, Leipzig 04103, Germany
| | - Ingrid S Johnsrude
- Department of Psychology & Brain and Mind Institute, The University of Western Ontario, London, ON N6A 3K7, Canada; School of Communication Sciences & Disorders, The University of Western Ontario, London, ON N6A 5B7, Canada
| |
Collapse
|
26
|
Chalas N, Daube C, Kluger DS, Abbasi O, Nitsch R, Gross J. Speech onsets and sustained speech contribute differentially to delta and theta speech tracking in auditory cortex. Cereb Cortex 2023; 33:6273-6281. [PMID: 36627246 DOI: 10.1093/cercor/bhac502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 01/12/2023] Open
Abstract
When we attentively listen to an individual's speech, our brain activity dynamically aligns to the incoming acoustic input at multiple timescales. Although this systematic alignment between ongoing brain activity and speech in auditory brain areas is well established, the acoustic events that drive this phase-locking are not fully understood. Here, we use magnetoencephalographic recordings of 24 human participants (12 females) while they were listening to a 1 h story. We show that whereas speech-brain coupling is associated with sustained acoustic fluctuations in the speech envelope in the theta-frequency range (4-7 Hz), speech tracking in the low-frequency delta (below 1 Hz) was strongest around onsets of speech, like the beginning of a sentence. Crucially, delta tracking in bilateral auditory areas was not sustained after onsets, proposing a delta tracking during continuous speech perception that is driven by speech onsets. We conclude that both onsets and sustained components of speech contribute differentially to speech tracking in delta- and theta-frequency bands, orchestrating sampling of continuous speech. Thus, our results suggest a temporal dissociation of acoustically driven oscillatory activity in auditory areas during speech tracking, providing valuable implications for orchestration of speech tracking at multiple time scales.
Collapse
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany.,Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany.,Institute for Translational Neuroscience, University of Münster, Albert-Schweitzer-Campus 1, Geb. A9a, Münster, Germany
| | - Christoph Daube
- Centre for Cognitive Neuroimaging, University of Glasgow, 56-64 Hillhead Street, G12 8QB, Glasgow, United Kingdom
| | - Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany.,Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany
| | - Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany
| | - Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Albert-Schweitzer-Campus 1, Geb. A9a, Münster, Germany
| | - Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg 15, 48149, Münster, Germany.,Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Fliednerstr. 21, 48149 Münster, Germany
| |
Collapse
|
27
|
Schubert J, Schmidt F, Gehmacher Q, Bresgen A, Weisz N. Cortical speech tracking is related to individual prediction tendencies. Cereb Cortex 2023:6975346. [PMID: 36617790 DOI: 10.1093/cercor/bhac528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Revised: 12/13/2022] [Accepted: 12/14/2022] [Indexed: 01/10/2023] Open
Abstract
Listening can be conceptualized as a process of active inference, in which the brain forms internal models to integrate auditory information in a complex interaction of bottom-up and top-down processes. We propose that individuals vary in their "prediction tendency" and that this variation contributes to experiential differences in everyday listening situations and shapes the cortical processing of acoustic input such as speech. Here, we presented tone sequences of varying entropy level, to independently quantify auditory prediction tendency (as the tendency to anticipate low-level acoustic features) for each individual. This measure was then used to predict cortical speech tracking in a multi speaker listening task, where participants listened to audiobooks narrated by a target speaker in isolation or interfered by 1 or 2 distractors. Furthermore, semantic violations were introduced into the story, to also examine effects of word surprisal during speech processing. Our results show that cortical speech tracking is related to prediction tendency. In addition, we find interactions between prediction tendency and background noise as well as word surprisal in disparate brain regions. Our findings suggest that individual prediction tendencies are generalizable across different listening situations and may serve as a valuable element to explain interindividual differences in natural listening situations.
Collapse
Affiliation(s)
- Juliane Schubert
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Fabian Schmidt
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Quirin Gehmacher
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Annika Bresgen
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Nathan Weisz
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria.,Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
| |
Collapse
|
28
|
Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.02.522438. [PMID: 36711934 PMCID: PMC9881851 DOI: 10.1101/2023.01.02.522438] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC) when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.x.
Collapse
|
29
|
Kulasingham JP, Simon JZ. Algorithms for Estimating Time-Locked Neural Response Components in Cortical Processing of Continuous Speech. IEEE Trans Biomed Eng 2023; 70:88-96. [PMID: 35727788 PMCID: PMC9946293 DOI: 10.1109/tbme.2022.3185005] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
OBJECTIVE The Temporal Response Function (TRF) is a linear model of neural activity time-locked to continuous stimuli, including continuous speech. TRFs based on speech envelopes typically have distinct components that have provided remarkable insights into the cortical processing of speech. However, current methods may lead to less than reliable estimates of single-subject TRF components. Here, we compare two established methods, in TRF component estimation, and also propose novel algorithms that utilize prior knowledge of these components, bypassing the full TRF estimation. METHODS We compared two established algorithms, ridge and boosting, and two novel algorithms based on Subspace Pursuit (SP) and Expectation Maximization (EM), which directly estimate TRF components given plausible assumptions regarding component characteristics. Single-channel, multi-channel, and source-localized TRFs were fit on simulations and real magnetoencephalographic data. Performance metrics included model fit and component estimation accuracy. RESULTS Boosting and ridge have comparable performance in component estimation. The novel algorithms outperformed the others in simulations, but not on real data, possibly due to the plausible assumptions not actually being met. Ridge had slightly better model fits on real data compared to boosting, but also more spurious TRF activity. CONCLUSION Results indicate that both smooth (ridge) and sparse (boosting) algorithms perform comparably at TRF component estimation. The SP and EM algorithms may be accurate, but rely on assumptions of component characteristics. SIGNIFICANCE This systematic comparison establishes the suitability of widely used and novel algorithms for estimating robust TRF components, which is essential for improved subject-specific investigations into the cortical processing of speech.
Collapse
|
30
|
Simon JZ, Commuri V, Kulasingham JP. Time-locked auditory cortical responses in the high-gamma band: A window into primary auditory cortex. Front Neurosci 2022; 16:1075369. [PMID: 36570848 PMCID: PMC9773383 DOI: 10.3389/fnins.2022.1075369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022] Open
Abstract
Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70-150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
Collapse
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States,Department of Biology, University of Maryland, College Park, College Park, MD, United States,Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States,*Correspondence: Jonathan Z. Simon,
| | - Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
| | | |
Collapse
|
31
|
Bruce L, Peter B. Three children with different de novo BCL11A variants and diverse developmental phenotypes, but shared global motor discoordination and apraxic speech: Evidence for a functional gene network influencing the developing cerebellum and motor and auditory cortices. Am J Med Genet A 2022; 188:3401-3415. [PMID: 35856171 DOI: 10.1002/ajmg.a.62904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Revised: 06/28/2022] [Accepted: 07/04/2022] [Indexed: 01/31/2023]
Abstract
BCL11A is implicated in BCL11A-Related Intellectual Development Disorder (BCL11A-IDD). Previously reported cases had various types of BCL11A variants (copy-number variations [CNVs], singlenucleotide variants [SNVs]). Phenotypes included global, cognitive, and motor delays, autism spectrum disorder (ASD), craniofacial dysmorphology, and speech and language delays described generally, with only two reports specifying childhood apraxia of speech (CAS). Here, we present three additional children with CAS and de novo BCL11A variants, a p.Ala182Thr nonconservative missense and a p.GLu611.Ter nonsense variant, both in exon 4, and a 106 kb deletion harboring exons 1 and 2. All three children have fine and gross motor discoordination, feeding difficulties, and visual motor disorders. Intellectual and learning disabilities and disordered language skills were seen only in the child with the missense variant and the child with the deletion. These findings align with, and expand, previous findings in that BCL11A variants have significant and highly penetrant apraxic effects across motor systems, consistent with cerebellar involvement. The deletion of exons 1 and 2 is the smallest BCL11A CNV with the full phenotypic expression reported to date. The present results support previous findings in that BCL11A-IDD can result from BCL11A variants regardless of type (deletion, SNVs). A gene expression study shows that BCL11 is expressed highly in the early developing cerebellum and primary motor and auditory cortices. Significant co-expression rates in these regions with genes previously implicated in disorders of spoken language and in ASD support the phenotypic overlaps in children with BCL11A-IDD, CAS, and ASD.
Collapse
Affiliation(s)
- Laurel Bruce
- College of Health Solutions, Arizona State University, Tempe, Arizona, USA
| | - Beate Peter
- College of Health Solutions, Arizona State University, Tempe, Arizona, USA
| |
Collapse
|
32
|
Pinto D, Kaufman M, Brown A, Zion Golumbic E. An ecological investigation of the capacity to follow simultaneous speech and preferential detection of ones’ own name. Cereb Cortex 2022; 33:5361-5374. [PMID: 36331339 DOI: 10.1093/cercor/bhac424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 09/11/2022] [Accepted: 09/12/2022] [Indexed: 11/06/2022] Open
Abstract
Abstract
Many situations require focusing attention on one speaker, while monitoring the environment for potentially important information. Some have proposed that dividing attention among 2 speakers involves behavioral trade-offs, due to limited cognitive resources. However the severity of these trade-offs, particularly under ecologically-valid circumstances, is not well understood. We investigated the capacity to process simultaneous speech using a dual-task paradigm simulating task-demands and stimuli encountered in real-life. Participants listened to conversational narratives (Narrative Stream) and monitored a stream of announcements (Barista Stream), to detect when their order was called. We measured participants’ performance, neural activity, and skin conductance as they engaged in this dual-task. Participants achieved extremely high dual-task accuracy, with no apparent behavioral trade-offs. Moreover, robust neural and physiological responses were observed for target-stimuli in the Barista Stream, alongside significant neural speech-tracking of the Narrative Stream. These results suggest that humans have substantial capacity to process simultaneous speech and do not suffer from insufficient processing resources, at least for this highly ecological task-combination and level of perceptual load. Results also confirmed the ecological validity of the advantage for detecting ones’ own name at the behavioral, neural, and physiological level, highlighting the contribution of personal relevance when processing simultaneous speech.
Collapse
Affiliation(s)
- Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Maya Kaufman
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Adi Brown
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| | - Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
| |
Collapse
|
33
|
McMurray B, Sarrett ME, Chiu S, Black AK, Wang A, Canale R, Aslin RN. Decoding the temporal dynamics of spoken word and nonword processing from EEG. Neuroimage 2022; 260:119457. [PMID: 35842096 PMCID: PMC10875705 DOI: 10.1016/j.neuroimage.2022.119457] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2022] [Revised: 07/02/2022] [Accepted: 07/06/2022] [Indexed: 11/23/2022] Open
Abstract
The efficiency of spoken word recognition is essential for real-time communication. There is consensus that this efficiency relies on an implicit process of activating multiple word candidates that compete for recognition as the acoustic signal unfolds in real-time. However, few methods capture the neural basis of this dynamic competition on a msec-by-msec basis. This is crucial for understanding the neuroscience of language, and for understanding hearing, language and cognitive disorders in people for whom current behavioral methods are not suitable. We applied machine-learning techniques to standard EEG signals to decode which word was heard on each trial and analyzed the patterns of confusion over time. Results mirrored psycholinguistic findings: Early on, the decoder was equally likely to report the target (e.g., baggage) or a similar sounding competitor (badger), but by around 500 msec, competitors were suppressed. Follow up analyses show that this is robust across EEG systems (gel and saline), with fewer channels, and with fewer trials. Results are robust within individuals and show high reliability. This suggests a powerful and simple paradigm that can assess the neural dynamics of speech decoding, with potential applications for understanding lexical development in a variety of clinical disorders.
Collapse
Affiliation(s)
- Bob McMurray
- Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics and Dept. of Otolaryngology, University of Iowa.
| | - McCall E Sarrett
- Interdisciplinary Graduate Program in Neuroscience, Unviersity of Iowa
| | - Samantha Chiu
- Dept. of Psychological and Brain Sciences, University of Iowa
| | - Alexis K Black
- School of Audiology and Speech Sciences, University of British Columbia, Haskins Laboratories
| | - Alice Wang
- Dept. of Psychology, University of Oregon, Haskins Laboratories
| | - Rebecca Canale
- Dept. of Psychological Sciences, University of Connecticut, Haskins Laboratories
| | - Richard N Aslin
- Haskins Laboratories, Department of Psychology and Child Study Center, Yale University, Department of Psychology, University of Connecticut
| |
Collapse
|
34
|
Broderick MP, Zuk NJ, Anderson AJ, Lalor EC. More than words: Neurophysiological correlates of semantic dissimilarity depend on comprehension of the speech narrative. Eur J Neurosci 2022; 56:5201-5214. [PMID: 35993240 DOI: 10.1111/ejn.15805] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Revised: 08/15/2022] [Accepted: 08/18/2022] [Indexed: 12/14/2022]
Abstract
Speech comprehension relies on the ability to understand words within a coherent context. Recent studies have attempted to obtain electrophysiological indices of this process by modelling how brain activity is affected by a word's semantic dissimilarity to preceding words. Although the resulting indices appear robust and are strongly modulated by attention, it remains possible that, rather than capturing the contextual understanding of words, they may actually reflect word-to-word changes in semantic content without the need for a narrative-level understanding on the part of the listener. To test this, we recorded electroencephalography from subjects who listened to speech presented in either its original, narrative form, or after scrambling the word order by varying amounts. This manipulation affected the ability of subjects to comprehend the speech narrative but not the ability to recognise individual words. Neural indices of semantic understanding and low-level acoustic processing were derived for each scrambling condition using the temporal response function. Signatures of semantic processing were observed when speech was unscrambled or minimally scrambled and subjects understood the speech. The same markers were absent for higher scrambling levels as speech comprehension dropped. In contrast, word recognition remained high and neural measures related to envelope tracking did not vary significantly across scrambling conditions. This supports the previous claim that electrophysiological indices based on the semantic dissimilarity of words to their context reflect a listener's understanding of those words relative to that context. It also highlights the relative insensitivity of neural measures of low-level speech processing to speech comprehension.
Collapse
Affiliation(s)
- Michael P Broderick
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Nathaniel J Zuk
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
| | - Andrew J Anderson
- Del Monte Institute for Neuroscience, Department of Neuroscience, Department of Biomedical Engineering, University of Rochester, Rochester, New York, USA
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland.,Del Monte Institute for Neuroscience, Department of Neuroscience, Department of Biomedical Engineering, University of Rochester, Rochester, New York, USA
| |
Collapse
|
35
|
Verschueren E, Gillis M, Decruy L, Vanthornhout J, Francart T. Speech Understanding Oppositely Affects Acoustic and Linguistic Neural Tracking in a Speech Rate Manipulation Paradigm. J Neurosci 2022; 42:7442-7453. [PMID: 36041851 PMCID: PMC9525161 DOI: 10.1523/jneurosci.0259-22.2022] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 06/29/2022] [Accepted: 07/17/2022] [Indexed: 11/21/2022] Open
Abstract
When listening to continuous speech, the human brain can track features of the presented speech signal. It has been shown that neural tracking of acoustic features is a prerequisite for speech understanding and can predict speech understanding in controlled circumstances. However, the brain also tracks linguistic features of speech, which may be more directly related to speech understanding. We investigated acoustic and linguistic speech processing as a function of varying speech understanding by manipulating the speech rate. In this paradigm, acoustic and linguistic speech processing is affected simultaneously but in opposite directions: When the speech rate increases, more acoustic information per second is present. In contrast, the tracking of linguistic information becomes more challenging when speech is less intelligible at higher speech rates. We measured the EEG of 18 participants (4 male) who listened to speech at various speech rates. As expected and confirmed by the behavioral results, speech understanding decreased with increasing speech rate. Accordingly, linguistic neural tracking decreased with increasing speech rate, but acoustic neural tracking increased. This indicates that neural tracking of linguistic representations can capture the gradual effect of decreasing speech understanding. In addition, increased acoustic neural tracking does not necessarily imply better speech understanding. This suggests that, although more challenging to measure because of the low signal-to-noise ratio, linguistic neural tracking may be a more direct predictor of speech understanding.SIGNIFICANCE STATEMENT An increasingly popular method to investigate neural speech processing is to measure neural tracking. Although much research has been done on how the brain tracks acoustic speech features, linguistic speech features have received less attention. In this study, we disentangled acoustic and linguistic characteristics of neural speech tracking via manipulating the speech rate. A proper way of objectively measuring auditory and language processing paves the way toward clinical applications: An objective measure of speech understanding would allow for behavioral-free evaluation of speech understanding, which allows to evaluate hearing loss and adjust hearing aids based on brain responses. This objective measure would benefit populations from whom obtaining behavioral measures may be complex, such as young children or people with cognitive impairments.
Collapse
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Marlies Gillis
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
| | - Jonas Vanthornhout
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| | - Tom Francart
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
| |
Collapse
|
36
|
Winn MB, Wright RA. Reconsidering commonly used stimuli in speech perception experiments. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:1394. [PMID: 36182291 DOI: 10.1121/10.0013415] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Accepted: 07/18/2022] [Indexed: 06/16/2023]
Abstract
This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted.
Collapse
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
| | - Richard A Wright
- Department of Linguistics, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
37
|
Kegler M, Weissbart H, Reichenbach T. The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Front Neurosci 2022; 16:915744. [PMID: 35942153 PMCID: PMC9355803 DOI: 10.3389/fnins.2022.915744] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 07/04/2022] [Indexed: 11/21/2022] Open
Abstract
Spoken language comprehension requires rapid and continuous integration of information, from lower-level acoustic to higher-level linguistic features. Much of this processing occurs in the cerebral cortex. Its neural activity exhibits, for instance, correlates of predictive processing, emerging at delays of a few 100 ms. However, the auditory pathways are also characterized by extensive feedback loops from higher-level cortical areas to lower-level ones as well as to subcortical structures. Early neural activity can therefore be influenced by higher-level cognitive processes, but it remains unclear whether such feedback contributes to linguistic processing. Here, we investigated early speech-evoked neural activity that emerges at the fundamental frequency. We analyzed EEG recordings obtained when subjects listened to a story read by a single speaker. We identified a response tracking the speaker's fundamental frequency that occurred at a delay of 11 ms, while another response elicited by the high-frequency modulation of the envelope of higher harmonics exhibited a larger magnitude and longer latency of about 18 ms with an additional significant component at around 40 ms. Notably, while the earlier components of the response likely originate from the subcortical structures, the latter presumably involves contributions from cortical regions. Subsequently, we determined the magnitude of these early neural responses for each individual word in the story. We then quantified the context-independent frequency of each word and used a language model to compute context-dependent word surprisal and precision. The word surprisal represented how predictable a word is, given the previous context, and the word precision reflected the confidence about predicting the next word from the past context. We found that the word-level neural responses at the fundamental frequency were predominantly influenced by the acoustic features: the average fundamental frequency and its variability. Amongst the linguistic features, only context-independent word frequency showed a weak but significant modulation of the neural response to the high-frequency envelope modulation. Our results show that the early neural response at the fundamental frequency is already influenced by acoustic as well as linguistic information, suggesting top-down modulation of this neural response.
Collapse
Affiliation(s)
- Mikolaj Kegler
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
| | - Hugo Weissbart
- Donders Centre for Cognitive Neuroimaging, Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Tobias Reichenbach
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
- *Correspondence: Tobias Reichenbach
| |
Collapse
|
38
|
Haider CL, Suess N, Hauswald A, Park H, Weisz N. Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker. Neuroimage 2022; 252:119044. [PMID: 35240298 DOI: 10.1016/j.neuroimage.2022.119044] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Revised: 02/26/2022] [Accepted: 02/27/2022] [Indexed: 11/29/2022] Open
Abstract
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. But it still remains inconclusive which levels of speech processing are affected under which circumstances by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal hearing participants via magnetoencephalography (MEG). We additionally added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model on the clear AV speech was trained and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher level features of speech segmentation (phoneme and word onsets) were especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which only show mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results, by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
Collapse
Affiliation(s)
- Chandra Leon Haider
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria.
| | - Nina Suess
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Anne Hauswald
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Hyojin Park
- School of Psychology & Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, UK
| | - Nathan Weisz
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
| |
Collapse
|
39
|
Gillis M, Decruy L, Vanthornhout J, Francart T. Hearing loss is associated with delayed neural responses to continuous speech. Eur J Neurosci 2022; 55:1671-1690. [PMID: 35263814 DOI: 10.1111/ejn.15644] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 02/21/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
We investigated the impact of hearing loss on the neural processing of speech. Using a forward modeling approach, we compared the neural responses to continuous speech of 14 adults with sensorineural hearing loss with those of age-matched normal-hearing peers. Compared to their normal-hearing peers, hearing-impaired listeners had increased neural tracking and delayed neural responses to continuous speech in quiet. The latency also increased with the degree of hearing loss. As speech understanding decreased, neural tracking decreased in both populations; however, a significantly different trend was observed for the latency of the neural responses. For normal-hearing listeners, the latency increased with increasing background noise level. However, for hearing-impaired listeners, this increase was not observed. Our results support the idea that the neural response latency indicates the efficiency of neural speech processing: more or different brain regions are involved in processing speech, which causes longer communication pathways in the brain. These longer communication pathways hamper the information integration among these brain regions, reflected in longer processing times. Altogether, this suggests decreased neural speech processing efficiency in HI listeners as more time and more or different brain regions are required to process speech. Our results suggest that this reduction in neural speech processing efficiency occurs gradually as hearing deteriorates. From our results, it is apparent that sound amplification does not solve hearing loss. Even when listening to speech in silence at a comfortable loudness, hearing-impaired listeners process speech less efficiently.
Collapse
Affiliation(s)
- Marlies Gillis
- KU Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, MD, USA
| | | | - Tom Francart
- KU Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| |
Collapse
|
40
|
Corcoran AW, Perera R, Koroma M, Kouider S, Hohwy J, Andrillon T. Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech. Cereb Cortex 2022; 33:691-708. [PMID: 35253871 PMCID: PMC9890472 DOI: 10.1093/cercor/bhac094] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Revised: 02/11/2022] [Accepted: 02/12/2022] [Indexed: 02/04/2023] Open
Abstract
Online speech processing imposes significant computational demands on the listening brain, the underlying mechanisms of which remain poorly understood. Here, we exploit the perceptual "pop-out" phenomenon (i.e. the dramatic improvement of speech intelligibility after receiving information about speech content) to investigate the neurophysiological effects of prior expectations on degraded speech comprehension. We recorded electroencephalography (EEG) and pupillometry from 21 adults while they rated the clarity of noise-vocoded and sine-wave synthesized sentences. Pop-out was reliably elicited following visual presentation of the corresponding written sentence, but not following incongruent or neutral text. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that improvements in perceptual clarity were mediated via top-down signals that enhanced the quality of cortical speech representations. Spectral analysis further revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta-band power, alpha-band power, and pupil diameter were all increased following the provision of any written sentence information, irrespective of content. Together, these findings reveal distinctive profiles of neurophysiological activity that differentiate the content-specific processes associated with degraded speech comprehension from the context-specific processes invoked under adverse listening conditions.
Collapse
Affiliation(s)
- Andrew W Corcoran
- Corresponding author: Room E672, 20 Chancellors Walk, Clayton, VIC 3800, Australia.
| | - Ricardo Perera
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800 Australia
| | - Matthieu Koroma
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d’Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
| | - Sid Kouider
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d’Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
| | - Jakob Hohwy
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800 Australia,Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800 Australia
| | - Thomas Andrillon
- Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800 Australia,Paris Brain Institute, Sorbonne Université, Inserm-CNRS, Paris 75013, France
| |
Collapse
|
41
|
Bachmann FL, MacDonald EN, Hjortkjær J. Neural Measures of Pitch Processing in EEG Responses to Running Speech. Front Neurosci 2022; 15:738408. [PMID: 35002597 PMCID: PMC8729880 DOI: 10.3389/fnins.2021.738408] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2021] [Accepted: 11/01/2021] [Indexed: 11/13/2022] Open
Abstract
Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well-known that the auditory brainstem responds both to transient amplitude variations and the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from broadband amplitude variations is not possible given the high co-variance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet, no association between subcortical F0-tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
Collapse
Affiliation(s)
- Florine L Bachmann
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| | - Ewen N MacDonald
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
| | - Jens Hjortkjær
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.,Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
| |
Collapse
|
42
|
Kraus F, Tune S, Ruhe A, Obleser J, Wöstmann M. Unilateral Acoustic Degradation Delays Attentional Separation of Competing Speech. Trends Hear 2021; 25:23312165211013242. [PMID: 34184964 PMCID: PMC8246482 DOI: 10.1177/23312165211013242] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Hearing loss is often asymmetric such that hearing thresholds differ substantially between the two ears. The extreme case of such asymmetric hearing is single-sided deafness. A unilateral cochlear implant (CI) on the more severely impaired ear is an effective treatment to restore hearing. The interactive effects of unilateral acoustic degradation and spatial attention to one sound source in multitalker situations are at present unclear. Here, we simulated some features of listening with a unilateral CI in young, normal-hearing listeners (N = 22) who were presented with 8-band noise-vocoded speech to one ear and intact speech to the other ear. Neural responses were recorded in the electroencephalogram to obtain the spectrotemporal response function to speech. Listeners made more mistakes when answering questions about vocoded (vs. intact) attended speech. At the neural level, we asked how unilateral acoustic degradation would impact the attention-induced amplification of tracking target versus distracting speech. Interestingly, unilateral degradation did not per se reduce the attention-induced amplification but instead delayed it in time: Speech encoding accuracy, modelled on the basis of the spectrotemporal response function, was significantly enhanced for attended versus ignored intact speech at earlier neural response latencies (<∼250 ms). This attentional enhancement was not absent but delayed for vocoded speech. These findings suggest that attentional selection of unilateral, degraded speech is feasible but induces delayed neural separation of competing speech, which might explain listening challenges experienced by unilateral CI users.
Collapse
Affiliation(s)
- Frauke Kraus
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | - Sarah Tune
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | - Anna Ruhe
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | - Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | - Malte Wöstmann
- Department of Psychology, University of Lübeck, Lübeck, Germany
| |
Collapse
|
43
|
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021; 18. [PMID: 33957606 DOI: 10.1088/1741-2552/abfeba] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]
Abstract
Objective.Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments.Approach.Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals based on corresponding EEG signals recorded when subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions.Main results.Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly with classification accuracies that approximated or exceeded 80% based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder.Significance.This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggests that this proposed computational model may be an effective method for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.
Collapse
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China.,Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
| |
Collapse
|
44
|
Kegler M, Reichenbach T. Modelling the effects of transcranial alternating current stimulation on the neural encoding of speech in noise. Neuroimage 2020; 224:117427. [PMID: 33038540 DOI: 10.1016/j.neuroimage.2020.117427] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2020] [Revised: 09/11/2020] [Accepted: 10/01/2020] [Indexed: 11/29/2022] Open
Abstract
Transcranial alternating current stimulation (tACS) can non-invasively modulate neuronal activity in the cerebral cortex, in particular at the frequency of the applied stimulation. Such modulation can matter for speech processing, since the latter involves the tracking of slow amplitude fluctuations in speech by cortical activity. tACS with a current signal that follows the envelope of a speech stimulus has indeed been found to influence the cortical tracking and to modulate the comprehension of the speech in background noise. However, how exactly tACS influences the speech-related cortical activity, and how it causes the observed effects on speech comprehension, remains poorly understood. A computational model for cortical speech processing in a biophysically plausible spiking neural network has recently been proposed. Here we extended the model to investigate the effects of different types of stimulation waveforms, similar to those previously applied in experimental studies, on the processing of speech in noise. We assessed in particular how well speech could be decoded from the neural network activity when paired with the exogenous stimulation. We found that, in the absence of current stimulation, the speech-in-noise decoding accuracy was comparable to the comprehension of speech in background noise of human listeners. We further found that current stimulation could alter the speech decoding accuracy by a few percent, comparable to the effects of tACS on speech-in-noise comprehension. Our simulations further allowed us to identify the parameters for the stimulation waveforms that yielded the largest enhancement of speech-in-noise encoding. Our model thereby provides insight into the potential neural mechanisms by which weak alternating current stimulation may influence speech comprehension and allows to screen a large range of stimulation waveforms for their effect on speech processing.
Collapse
Affiliation(s)
- Mikolaj Kegler
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2BU London, United Kingdom
| | - Tobias Reichenbach
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2BU London, United Kingdom.
| |
Collapse
|