1. Sohoglu E, Beckers L, Davis MH. Convergent neural signatures of speech prediction error are a biological marker for spoken word recognition. Nat Commun 2024; 15:9984. PMID: 39557848; PMCID: PMC11574182; DOI: 10.1038/s41467-024-53782-5.
Abstract
We use MEG and fMRI to determine how predictions are combined with speech input in superior temporal cortex. We compare neural responses to words in which first syllables strongly or weakly predict second syllables (e.g., "bingo", "snigger" versus "tango", "meagre"). We further compare neural responses to the same second syllables when predictions mismatch with input during pseudoword perception (e.g., "snigo" and "meago"). Neural representations of second syllables are suppressed by strong predictions when predictions match sensory input but show the opposite effect when predictions mismatch. Computational simulations show that this interaction is consistent with prediction error but not alternative (sharpened signal) computations. Neural signatures of prediction error are observed 200 ms after second syllable onset and in early auditory regions (bilateral Heschl's gyrus and STG). These findings demonstrate prediction error computations during the identification of familiar spoken words and perception of unfamiliar pseudowords.
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, UK.
- Loes Beckers
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Cochlear Ltd., Mechelen, Belgium
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK.
2. Robson H, Thomasson H, Upton E, Leff AP, Davis MH. The impact of speech rhythm and rate on comprehension in aphasia. Cortex 2024; 180:126-146. PMID: 39427491; DOI: 10.1016/j.cortex.2024.09.006.
Abstract
BACKGROUND: Speech comprehension impairment in post-stroke aphasia is influenced by speech acoustics. This study investigated the impact of speech rhythm (syllabic isochrony) and rate on comprehension in people with aphasia (PWA). Rhythmical speech was hypothesised to support comprehension in PWA by reducing temporal variation, leading to enhanced speech tracking and more appropriate sampling of the speech stream. Speech rate was hypothesised to influence comprehension through auditory and linguistic processing time. METHODS: One group of PWA (n = 19) and two groups of control participants (n = 10 and n = 18) performed a sentence-verification task. Sentences were presented in two rhythm conditions (natural vs isochronous) and two rate conditions (typical, 3.6 Hz vs slow, 2.6 Hz) in a 2 × 2 factorial design. PWA and one group of controls performed the experiment with clear speech. The second group of controls performed the experiment with perceptually degraded speech. RESULTS: D-prime analyses measured capacity to detect incongruent endings. Linear mixed effects models investigated the impact of group, rhythm, rate and clarity on d-prime scores. Control participants were negatively affected by isochronous rhythm in comparison to natural rhythm, likely due to alteration in linguistic cues. This negative impact remained or was exacerbated in control participants presented with degraded speech. In comparison, PWA were less affected by isochronous rhythm, despite producing d-prime scores matched to the degraded speech control group. Speech rate affected all groups, but only in interactions with rhythm, indicating that slow-rate isochronous speech was more comprehensible than typical-rate isochronous speech. CONCLUSIONS: The comprehension network in PWA interacts differently with speech rhythm. Rhythmical speech may support acoustic speech tracking by enhancing predictability and ameliorate the detrimental impact of atypical rhythm on linguistic cues. Alternatively, reduced temporal prediction in aphasia may limit the impact of deviation from natural temporal structure. Reduction of speech rate below the typical range may not benefit comprehension in PWA.
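For context, the sensitivity measure reported here is the standard signal-detection d' (z-transformed hit rate minus z-transformed false-alarm rate). A minimal sketch of that computation, with invented counts purely for illustration:

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction keeps rates away from 0 and 1, which would
    # otherwise push the z-transform to +/- infinity.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Invented counts for one listener detecting incongruent sentence endings
print(d_prime(hits=18, misses=6, false_alarms=4, correct_rejections=20))
```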
Affiliation(s)
- Holly Robson
- Language and Cognition, Psychology and Language Sciences, University College London, London, UK.
- Harriet Thomasson
- Language and Cognition, Psychology and Language Sciences, University College London, London, UK
- Emily Upton
- Language and Cognition, Psychology and Language Sciences, University College London, London, UK
- Alexander P Leff
- UCL Queen Square Institute of Neurology, University College London, London, UK
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
3. Weissbart H, Martin AE. The structure and statistics of language jointly shape cross-frequency neural dynamics during spoken language comprehension. Nat Commun 2024; 15:8850. PMID: 39397036; PMCID: PMC11471778; DOI: 10.1038/s41467-024-53128-1.
Abstract
Humans excel at extracting structurally-determined meaning from speech despite inherent physical variability. This study explores the brain's ability to predict and understand spoken language robustly. It investigates the relationship between structural and statistical language knowledge in brain dynamics, focusing on phase and amplitude modulation. Using syntactic features from constituent hierarchies and surface statistics from a transformer model as predictors of forward encoding models, we reconstructed cross-frequency neural dynamics from MEG data during audiobook listening. Our findings challenge a strict separation of linguistic structure and statistics in the brain, with both aiding neural signal reconstruction. Syntactic features have a more temporally spread impact, and both word entropy and the number of closing syntactic constituents are linked to the phase-amplitude coupling of neural dynamics, implying a role in temporal prediction and cortical oscillation alignment during speech processing. Our results indicate that structured and statistical information jointly shape neural dynamics during spoken language comprehension and suggest an integration process via a cross-frequency coupling mechanism.
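As a rough illustration of how such surface statistics can be obtained from a transformer language model, the sketch below computes word surprisal and next-word entropy with the Hugging Face transformers library; the off-the-shelf GPT-2 checkpoint and the first-sub-token simplification are assumptions for illustration, not the authors' actual pipeline.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def word_statistics(context: str, word: str):
    """Return surprisal of `word` given `context`, and the entropy of the
    model's next-token distribution (uncertainty before the word)."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]                    # next-token logits
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum().item()              # in nats
    # Simplification: score only the first sub-token of the word
    word_id = tokenizer(" " + word, add_special_tokens=False).input_ids[0]
    surprisal = -log_probs[word_id].item()
    return surprisal, entropy

print(word_statistics("The children walked to the", "school"))
```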
Affiliation(s)
- Hugo Weissbart
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands.
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
- Andrea E Martin
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
4. Mechtenberg H, Heffner CC, Myers EB, Guediche S. The Cerebellum Is Sensitive to the Lexical Properties of Words During Spoken Language Comprehension. Neurobiol Lang (Camb) 2024; 5:757-773. PMID: 39175786; PMCID: PMC11338305; DOI: 10.1162/nol_a_00126.
Abstract
Over the past few decades, research into the function of the cerebellum has expanded far beyond the motor domain. A growing number of studies are probing the role of specific cerebellar subregions, such as Crus I and Crus II, in higher-order cognitive functions including receptive language processing. In the current fMRI study, we show evidence for the cerebellum's sensitivity to variation in two well-studied psycholinguistic properties of words (lexical frequency and phonological neighborhood density) during passive, continuous listening to a podcast. To determine whether, and how, activity in the cerebellum correlates with these lexical properties, we modeled each word separately using an amplitude-modulated regressor, time-locked to the onset of each word. At the group level, significant effects of both lexical properties landed in expected cerebellar subregions: Crus I and Crus II. The BOLD signal correlated with variation in each lexical property, consistent with both language-specific and domain-general mechanisms. Activation patterns at the individual level also showed that effects of phonological neighborhood and lexical frequency landed in Crus I and Crus II as the most probable sites, though there was activation seen in other lobules (especially for frequency). Although the exact cerebellar mechanisms used during speech and language processing are not yet evident, these findings highlight the cerebellum's role in word-level processing during continuous listening.
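To make the amplitude-modulated regressor concrete, here is a minimal numpy sketch in the spirit of that analysis; the word onsets, frequency values, and the simple double-gamma HRF are illustrative assumptions rather than the study's actual design files.

```python
import numpy as np
from scipy.stats import gamma

tr, n_vols = 1.0, 300                                  # 1 s sampling grid, 300 volumes
t = np.arange(0, 32, tr)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)        # simple double-gamma HRF
hrf /= hrf.max()

# Invented events: word onsets (s) and their log lexical frequencies
onsets = np.array([3.2, 4.1, 5.0, 7.8, 9.3])
log_freq = np.array([2.1, -0.4, 0.8, -1.2, 0.5])
log_freq = log_freq - log_freq.mean()                  # mean-centre the modulator

stick = np.zeros(n_vols)
stick[np.round(onsets / tr).astype(int)] += log_freq   # amplitude-modulated impulses

regressor = np.convolve(stick, hrf)[:n_vols]           # convolve with HRF, trim to scan length
```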
Affiliation(s)
- Hannah Mechtenberg
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Christopher C. Heffner
- Department of Communicative Sciences and Disorders, University at Buffalo, Buffalo, NY, USA
- Emily B. Myers
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Department of Speech, Language and Hearing Sciences, University of Connecticut, Storrs, CT, USA
- Sara Guediche
- College of Science and Mathematics, Augusta University, Augusta, GA, USA
5. Zou D, Guo J. Parallel translation process in consecutive interpreting: Differences between beginning and advanced interpreting students. Acta Psychol (Amst) 2024; 248:104358. PMID: 38878473; DOI: 10.1016/j.actpsy.2024.104358.
Abstract
This study examines the parallel translation process of 30 beginning and 30 advanced consecutive interpreting students (BISs and AISs). It investigates the characteristics and efficiency of parallel translation processing employed during the comprehension phase across proficiency levels. Quantitative and qualitative analyses reveal AISs demonstrated superior information reformulation capabilities, particularly for high-density content, along with more efficient note-taking and utilization compared to BISs. A key finding is that AISs interpreted more propositions than they recorded in their notes, reflecting stronger encoding and memory retention. In contrast, BISs relied heavily on external notation instead of internal memory retrieval, only interpreting a portion of their notes while failing to decipher the rest. These findings uncover complex yet differentiated parallel processing approaches across skill levels, with more successful comprehension-to-production strategies utilized by AISs. Enhancing students' metacognitive awareness regarding parallel processing techniques may improve information processing efficiency during source language comprehension, ultimately enhancing target language reformulation accuracy and completeness.
Affiliation(s)
- Deyan Zou
- School of Advanced Translation and Interpretation, Dalian University of Foreign Languages, Dalian, China.
- Jiahao Guo
- School of Advanced Translation and Interpretation, Dalian University of Foreign Languages, Dalian, China
6. Pérez-Navarro J, Klimovich-Gray A, Lizarazu M, Piazza G, Molinaro N, Lallier M. Early language experience modulates the tradeoff between acoustic-temporal and lexico-semantic cortical tracking of speech. iScience 2024; 27:110247. PMID: 39006483; PMCID: PMC11246002; DOI: 10.1016/j.isci.2024.110247.
Abstract
Cortical tracking of speech is relevant for the development of speech perception skills. However, no study to date has explored whether and how cortical tracking of speech is shaped by accumulated language experience, the central question of this study. In 35 bilingual children (6 years old) with considerably greater experience in one language, we collected electroencephalography data while they listened to continuous speech in their two languages. Cortical tracking of speech was assessed at acoustic-temporal and lexico-semantic levels. Children showed more robust acoustic-temporal tracking in the least experienced language, and more sensitive cortical tracking of semantic information in the most experienced language. Additionally, and only for the most experienced language, acoustic-temporal tracking was specifically linked to phonological abilities, and lexico-semantic tracking to vocabulary knowledge. Our results indicate that accumulated linguistic experience is a relevant maturational factor for the cortical tracking of speech at different levels during early language acquisition.
Affiliation(s)
- Jose Pérez-Navarro
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Mikel Lizarazu
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Giorgio Piazza
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Nicola Molinaro
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, 48009 Bilbao, Spain
- Marie Lallier
- Basque Center on Cognition, Brain and Language (BCBL), 20009 Donostia-San Sebastian, Spain
7. Lavan N, Rinke P, Scharinger M. The time course of person perception from voices in the brain. Proc Natl Acad Sci U S A 2024; 121:e2318361121. PMID: 38889147; PMCID: PMC11214051; DOI: 10.1073/pnas.2318361121.
Abstract
When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we can find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
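A minimal sketch of the time-resolved representational similarity analysis (RSA) logic described here, with random arrays standing in for the EEG patterns and the perceived-characteristic ratings (this is not the authors' pipeline):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_voices, n_channels, n_times = 60, 64, 200
eeg = rng.standard_normal((n_voices, n_channels, n_times))  # voice x channel x time
ratings = rng.standard_normal((n_voices, 1))                 # e.g., perceived age

model_rdm = pdist(ratings, metric="euclidean")               # model dissimilarity matrix

rsa_timecourse = np.empty(n_times)
for t in range(n_times):
    neural_rdm = pdist(eeg[:, :, t], metric="correlation")   # 1 - r across channels
    rsa_timecourse[t] = spearmanr(neural_rdm, model_rdm).correlation
# A sustained positive rsa_timecourse indicates that the characteristic is
# reflected in the EEG pattern at those time points.
```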
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Paula Rinke
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany
- Mathias Scharinger
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany
- Research Center “Deutscher Sprachatlas”, Philipps-University Marburg, Marburg 35037, Germany
- Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Marburg 35032, Germany
8. Gastaldon S, Bonfiglio N, Vespignani F, Peressotti F. Predictive language processing: integrating comprehension and production, and what atypical populations can tell us. Front Psychol 2024; 15:1369177. PMID: 38836235; PMCID: PMC11148270; DOI: 10.3389/fpsyg.2024.1369177.
Abstract
Predictive processing, a crucial aspect of human cognition, is also relevant for language comprehension. In everyday situations, we exploit various sources of information to anticipate and therefore facilitate processing of upcoming linguistic input. In the literature, there are a variety of models that aim at accounting for such ability. One group of models proposes a strict relationship between prediction and language production mechanisms. In this review, we first introduce very briefly the concept of predictive processing during language comprehension. Secondly, we focus on models that attribute a prominent role to language production and sensorimotor processing in language prediction ("prediction-by-production" models). Contextually, we provide a summary of studies that investigated the role of speech production and auditory perception on language comprehension/prediction tasks in healthy, typical participants. Then, we provide an overview of the limited existing literature on specific atypical/clinical populations that may represent suitable testing ground for such models, i.e., populations with impaired speech production and auditory perception mechanisms. Ultimately, we suggest wider and more in-depth testing of prediction-by-production accounts, and the involvement of atypical populations both for model testing and as targets for possible novel speech/language treatment approaches.
Affiliation(s)
- Simone Gastaldon
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- Padova Neuroscience Center, University of Padua, Padua, Italy
- Noemi Bonfiglio
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- BCBL-Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Francesco Vespignani
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- Centro Interdipartimentale di Ricerca "I-APPROVE-International Auditory Processing Project in Venice", University of Padua, Padua, Italy
- Francesca Peressotti
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- Padova Neuroscience Center, University of Padua, Padua, Italy
- Centro Interdipartimentale di Ricerca "I-APPROVE-International Auditory Processing Project in Venice", University of Padua, Padua, Italy
9. Fedorenko E, Ivanova AA, Regev TI. The language network as a natural kind within the broader landscape of the human brain. Nat Rev Neurosci 2024; 25:289-312. PMID: 38609551; DOI: 10.1038/s41583-024-00802-4.
Abstract
Language behaviour is complex, but neuroscientific evidence disentangles it into distinct components supported by dedicated brain areas or networks. In this Review, we describe the 'core' language network, which includes left-hemisphere frontal and temporal areas, and show that it is strongly interconnected, independent of input and output modalities, causally important for language and language-selective. We discuss evidence that this language network plausibly stores language knowledge and supports core linguistic computations related to accessing words and constructions from memory and combining them to interpret (decode) or generate (encode) linguistic messages. We emphasize that the language network works closely with, but is distinct from, both lower-level - perceptual and motor - mechanisms and higher-level systems of knowledge and reasoning. The perceptual and motor mechanisms process linguistic signals, but, in contrast to the language network, are sensitive only to these signals' surface properties, not their meanings; the systems of knowledge and reasoning (such as the system that supports social reasoning) are sometimes engaged during language use but are not language-selective. This Review lays a foundation both for in-depth investigations of these different components of the language processing pipeline and for probing inter-component interactions.
Affiliation(s)
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- The Program in Speech and Hearing in Bioscience and Technology, Harvard University, Cambridge, MA, USA.
- Anna A Ivanova
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
10. Hosseini EA, Schrimpf M, Zhang Y, Bowman S, Zaslavsky N, Fedorenko E. Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training. Neurobiol Lang (Camb) 2024; 5:43-63. PMID: 38645622; PMCID: PMC11025646; DOI: 10.1162/nol_a_00137.
Abstract
Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models' ability to capture human fMRI responses to sentences is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion words against an fMRI benchmark. We consider the 100-million-word model to be developmentally plausible in terms of the amount of training data given that this amount is similar to what children are estimated to be exposed to during the first 10 years of life. Second, we test the performance of a GPT-2 model trained on a 9-billion-token dataset to reach state-of-the-art next-word prediction performance on the human benchmark at different stages during training. Across both approaches, we find that (i) the models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing fMRI responses to sentences. Further, (ii) lower perplexity-a measure of next-word prediction performance-is associated with stronger alignment with human data, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire representations of sentences that are predictive of human fMRI responses. In tandem, these findings establish that although some training is necessary for the models' predictive ability, a developmentally realistic amount of training (∼100 million words) may suffice.
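For reference, perplexity as used here can be computed for any causal language model checkpoint along the lines of the sketch below; the off-the-shelf GPT-2 checkpoint is a stand-in for the authors' own models at different training stages.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")   # stand-in checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Exponential of the mean per-token negative log-likelihood."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss               # mean cross-entropy
    return float(torch.exp(loss))

print(perplexity("The quick brown fox jumps over the lazy dog."))
```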
Affiliation(s)
- Eghbal A. Hosseini
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf
- The MIT Quest for Intelligence Initiative, Cambridge, MA, USA
- Swiss Federal Institute of Technology, Lausanne, Switzerland
- Yian Zhang
- Computer Science Department, Stanford University, Stanford, CA, USA
- Samuel Bowman
- Center for Data Science, New York University, New York, NY, USA
- Department of Linguistics, New York University, New York, NY, USA
- Department of Computer Science, New York University, New York, NY, USA
- Noga Zaslavsky
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Language Science, University of California, Irvine, CA, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- The MIT Quest for Intelligence Initiative, Cambridge, MA, USA
- Speech and Hearing Bioscience and Technology Program, Harvard University, Boston, MA, USA
11. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. PMID: 38687241; PMCID: PMC11059272; DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
12. Tolkacheva V, Brownsett SLE, McMahon KL, de Zubicaray GI. Perceiving and misperceiving speech: lexical and sublexical processing in the superior temporal lobes. Cereb Cortex 2024; 34:bhae087. PMID: 38494418; PMCID: PMC10944697; DOI: 10.1093/cercor/bhae087.
Abstract
Listeners can use prior knowledge to predict the content of noisy speech signals, enhancing perception. However, this process can also elicit misperceptions. For the first time, we employed a prime-probe paradigm and transcranial magnetic stimulation to investigate causal roles for the left and right posterior superior temporal gyri (pSTG) in the perception and misperception of degraded speech. Listeners were presented with spectrotemporally degraded probe sentences preceded by a clear prime. To produce misperceptions, we created partially mismatched pseudo-sentence probes via homophonic nonword transformations (e.g. The little girl was excited to lose her first tooth-Tha fittle girmn wam expited du roos har derst cooth). Compared to a control site (vertex), inhibitory stimulation of the left pSTG selectively disrupted priming of real but not pseudo-sentences. Conversely, inhibitory stimulation of the right pSTG enhanced priming of misperceptions with pseudo-sentences, but did not influence perception of real sentences. These results indicate qualitatively different causal roles for the left and right pSTG in perceiving degraded speech, supporting bilateral models that propose engagement of the right pSTG in sublexical processing.
Affiliation(s)
- Valeriya Tolkacheva
- Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia
- Sonia L E Brownsett
- Queensland Aphasia Research Centre, School of Health and Rehabilitation Sciences, University of Queensland, Surgical Treatment and Rehabilitation Services, Herston, Queensland, 4006, Australia
- Centre of Research Excellence in Aphasia Recovery and Rehabilitation, La Trobe University, Melbourne, Health Sciences Building 1, 1 Kingsbury Drive, Bundoora, Victoria, 3086, Australia
- Katie L McMahon
- Herston Imaging Research Facility, Royal Brisbane & Women’s Hospital, Building 71/918, Royal Brisbane & Women’s Hospital, Herston, Queensland, 4006, Australia
- Queensland University of Technology, School of Clinical Sciences and Centre for Biomedical Technologies, 60 Musk Avenue, Kelvin Grove, Queensland, 4059, Australia
- Greig I de Zubicaray
- Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia
13. Hjortdal A, Frid J, Novén M, Roll M. Swift Prosodic Modulation of Lexical Access: Brain Potentials From Three North Germanic Language Varieties. J Speech Lang Hear Res 2024; 67:400-414. PMID: 38306498; DOI: 10.1044/2023_jslhr-23-00193.
Abstract
PURPOSE According to most models of spoken word recognition, listeners probabilistically activate a set of lexical candidates, which is incrementally updated as the speech signal unfolds. Speech carries segmental (speech sound) as well as suprasegmental (prosodic) information. The role of the latter in spoken word recognition is less clear. We investigated how suprasegments (tone and voice quality) in three North Germanic language varieties affected lexical access by scrutinizing temporally fine-grained neurophysiological effects of lexical uncertainty and information gain. METHOD Three event-related potential (ERP) studies were reanalyzed. In all varieties investigated, suprasegments are associated with specific word endings. Swedish has two lexical "word accents" realized as pitch falls with different timings across dialects. In Danish, the distinction is in voice quality. We combined pronunciation lexica and frequency lists to calculate estimates of lexical uncertainty about an unfolding word and information gain upon hearing a suprasegmental cue and the segment upon which it manifests. We used single-trial mixed-effects regression models run every 4 ms. RESULTS Only lexical uncertainty showed solid results: a frontal effect at 150-400 ms after suprasegmental cue onset and a later posterior effect after 200 ms. While a model including only segmental information mostly performed better, it was outperformed by the suprasegmental model at 200-330 ms at frontal sites. CONCLUSIONS The study points to suprasegmental cues contributing to lexical access over and beyond segments after around 200 ms in the North Germanic varieties investigated. Furthermore, the findings indicate that a previously reported "pre-activation negativity" predominantly reflects forward-looking processing. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.25016486.
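The lexical uncertainty and information gain measures used here can be illustrated with a toy frequency-weighted lexicon; the entries and counts below are invented, and real computations would use pronunciation lexica and frequency lists for the respective varieties.

```python
import math

# Toy lexicon: transcription -> corpus frequency (invented values)
lexicon = {"tango": 120, "tangle": 80, "tanker": 40, "tandem": 20, "meagre": 60}

def cohort(prefix):
    """Words still consistent with the input heard so far."""
    return {w: f for w, f in lexicon.items() if w.startswith(prefix)}

def entropy(words):
    """Lexical uncertainty: Shannon entropy (bits) over frequency-weighted candidates."""
    total = sum(words.values())
    return -sum((f / total) * math.log2(f / total) for f in words.values())

before = entropy(cohort("tan"))     # uncertainty before the next segment
after = entropy(cohort("tang"))     # uncertainty after hearing the next segment
information_gain = before - after   # bits of information carried by that segment
print(before, after, information_gain)
```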
Affiliation(s)
- Anna Hjortdal
- Centre for Languages and Literature, Lund University, Sweden
- Johan Frid
- Lund University Humanities Lab, Lund University, Sweden
- Mikael Novén
- Department of Nutrition, Exercise and Sports, University of Copenhagen, Denmark
- Mikael Roll
- Centre for Languages and Literature, Lund University, Sweden
14. Shan T, Cappelloni MS, Maddox RK. Subcortical responses to music and speech are alike while cortical responses diverge. Sci Rep 2024; 14:789. PMID: 38191488; PMCID: PMC10774448; DOI: 10.1038/s41598-023-50438-0.
Abstract
Music and speech are encountered daily and are unique to human beings. Both are transformed by the auditory pathway from an initial acoustical encoding to higher level cognition. Studies of cortex have revealed distinct brain responses to music and speech, but differences may emerge in the cortex or may be inherited from different subcortical encoding. In the first part of this study, we derived the human auditory brainstem response (ABR), a measure of subcortical encoding, to recorded music and speech using two analysis methods. The first method, described previously and acoustically based, yielded very different ABRs between the two sound classes. The second method, however, developed here and based on a physiological model of the auditory periphery, gave highly correlated responses to music and speech. We determined the superiority of the second method through several metrics, suggesting there is no appreciable impact of stimulus class (i.e., music vs speech) on the way stimulus acoustics are encoded subcortically. In this study's second part, we considered the cortex. Our new analysis method resulted in cortical music and speech responses becoming more similar but with remaining differences. The subcortical and cortical results taken together suggest that there is evidence for stimulus-class dependent processing of music and speech at the cortical but not subcortical level.
Affiliation(s)
- Tong Shan
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Madeline S Cappelloni
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Ross K Maddox
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA.
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA.
- Center for Visual Science, University of Rochester, Rochester, NY, USA.
- Department of Neuroscience, University of Rochester, Rochester, NY, USA.
15. Brodbeck C, Kandylaki KD, Scharenborg O. Neural Representations of Non-native Speech Reflect Proficiency and Interference from Native Language Knowledge. J Neurosci 2024; 44:e0666232023. PMID: 37963763; PMCID: PMC10851685; DOI: 10.1523/jneurosci.0666-23.2023.
Abstract
Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners that are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence for such interference comes from tasks that may inadvertently increase the salience of native language competitors. Here we tested for neural evidence of proficiency and native language interference in a naturalistic story listening task. We studied electroencephalography responses of 39 native speakers of Dutch (14 male) to an English short story, spoken by a native speaker of either American English or Dutch. We modeled brain responses with multivariate temporal response functions, using acoustic and language models. We found evidence for activation of Dutch language statistics when listening to English, but only when it was spoken with a Dutch accent. This suggests that a naturalistic, monolingual setting decreases the interference from native language representations, whereas an accent in the listener's own native language may increase native language interference, by increasing the salience of the native language and activating native language phonetic and lexical representations. Brain responses suggest that such interference stems from words from the native language competing with the foreign language in a single word recognition system, rather than being activated in a parallel lexicon. We further found that secondary acoustic representations of speech (after 200 ms latency) decreased with increasing proficiency. This may reflect improved acoustic-phonetic models in more proficient listeners.
Significance Statement: Behavioral experiments suggest that native language knowledge interferes with foreign language listening, but such effects may be sensitive to task manipulations, as tasks that increase metalinguistic awareness may also increase native language interference. This highlights the need for studying non-native speech processing using naturalistic tasks. We measured neural responses unobtrusively while participants listened for comprehension and characterized the influence of proficiency at multiple levels of representation. We found that salience of the native language, as manipulated through speaker accent, affected activation of native language representations: significant evidence for activation of native language (Dutch) categories was only obtained when the speaker had a Dutch accent, whereas no significant interference was found to a speaker with a native (American) accent.
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut 06269
- Katerina Danae Kandylaki
- Department of Neuropsychology and Psychopharmacology, Maastricht University, 6200 MD, Maastricht, The Netherlands
- Odette Scharenborg
- Multimedia Computing Group, Delft University of Technology, 2628 XE, Delft, The Netherlands
16. Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. PMID: 38032934; PMCID: PMC10710032; DOI: 10.1073/pnas.2309166120.
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRFs analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
- Department of Biology, University of Maryland, College Park, MD 20742
- Institute for Systems Research, University of Maryland, College Park, MD 20742
17. Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, Resnik P, Simon JZ. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 2023; 12:e85012. PMID: 38018501; PMCID: PMC10783870; DOI: 10.7554/elife.85012.
Abstract
Even though human experience unfolds continuously in time, it is not strictly linear; instead, it entails cascading processes building hierarchical cognitive structures. For instance, during speech perception, humans transform a continuously varying acoustic signal into phonemes, words, and meaning, and these levels all have distinct but interdependent temporal structures. Time-lagged regression using temporal response functions (TRFs) has recently emerged as a promising tool for disentangling electrophysiological brain responses related to such complex models of perception. Here, we introduce the Eelbrain Python toolkit, which makes this kind of analysis easy and accessible. We demonstrate its use, using continuous speech as a sample paradigm, with a freely available EEG dataset of audiobook listening. A companion GitHub repository provides the complete source code for the analysis, from raw data to group-level statistics. More generally, we advocate a hypothesis-driven approach in which the experimenter specifies a hierarchy of time-continuous representations that are hypothesized to have contributed to brain responses, and uses those as predictor variables for the electrophysiological signal. This is analogous to a multiple regression problem, but with the addition of a time dimension. TRF analysis decomposes the brain signal into distinct responses associated with the different predictor variables by estimating a multivariate TRF (mTRF), quantifying the influence of each predictor on brain responses as a function of time(-lags). This allows asking two questions about the predictor variables: (1) Is there a significant neural representation corresponding to this predictor variable? And if so, (2) what are the temporal characteristics of the neural response associated with it? Thus, different predictor variables can be systematically combined and evaluated to jointly model neural processing at multiple hierarchical levels. We discuss applications of this approach, including the potential for linking algorithmic/representational theories at different cognitive levels to brain responses through computational models with appropriate linking hypotheses.
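Eelbrain estimates TRFs with a boosting algorithm; purely to convey the underlying time-lagged regression idea, here is a ridge-regression sketch in numpy (a conceptual stand-in, not the toolkit's API):

```python
import numpy as np

def estimate_trf(stimulus, eeg, fs, tmin=-0.1, tmax=0.4, alpha=1.0):
    """Estimate a TRF by ridge regression of a response on lagged copies of a predictor.

    stimulus: 1-D predictor (e.g., speech envelope); eeg: 1-D response;
    both sampled at fs Hz. Returns (lags in seconds, TRF weights).
    """
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    # Design matrix: one column per lag (time-shifted copies of the stimulus)
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    # Ridge solution: (X'X + alpha*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w

# Toy example: the "EEG" is a delayed, noisy copy of the stimulus
fs = 100
stim = np.random.randn(fs * 60)
eeg = 0.5 * np.roll(stim, int(0.1 * fs)) + 0.1 * np.random.randn(stim.size)
lags, trf = estimate_trf(stim, eeg, fs)   # trf should peak near the 0.1 s lag
```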
Affiliation(s)
- Proloy Das
- Stanford University, Stanford, United States
- Philip Resnik
- University of Maryland, College Park, College Park, United States
18. Inbar M, Genzer S, Perry A, Grossman E, Landau AN. Intonation Units in Spontaneous Speech Evoke a Neural Response. J Neurosci 2023; 43:8189-8200. PMID: 37793909; PMCID: PMC10697392; DOI: 10.1523/jneurosci.0235-23.2023.
Abstract
Spontaneous speech is produced in chunks called intonation units (IUs). IUs are defined by a set of prosodic cues and presumably occur in all human languages. Recent work has shown that across different grammatical and sociocultural conditions IUs form rhythms of ∼1 unit per second. Linguistic theory suggests that IUs pace the flow of information in the discourse. As a result, IUs provide a promising and hitherto unexplored theoretical framework for studying the neural mechanisms of communication. In this article, we identify a neural response unique to the boundary defined by the IU. We measured the EEG of human participants (of either sex), who listened to different speakers recounting an emotional life event. We analyzed the speech stimuli linguistically and modeled the EEG response at word offset using a GLM approach. We find that the EEG response to IU-final words differs from the response to IU-nonfinal words even when equating acoustic boundary strength. Finally, we relate our findings to the body of research on rhythmic brain mechanisms in speech processing. We study the unique contribution of IUs and acoustic boundary strength in predicting delta-band EEG. This analysis suggests that IU-related neural activity, which is tightly linked to the classic Closure Positive Shift (CPS), could be a time-locked component that captures the previously characterized delta-band neural speech tracking.
SIGNIFICANCE STATEMENT: Linguistic communication is central to human experience, and its neural underpinnings are a topic of much research in recent years. Neuroscientific research has benefited from studying human behavior in naturalistic settings, an endeavor that requires explicit models of complex behavior. Usage-based linguistic theory suggests that spoken language is prosodically structured in intonation units. We reveal that the neural system is attuned to intonation units by explicitly modeling their impact on the EEG response beyond mere acoustics. To our understanding, this is the first time this is demonstrated in spontaneous speech under naturalistic conditions and under a theoretical framework that connects the prosodic chunking of speech, on the one hand, with the flow of information during communication, on the other.
Affiliation(s)
- Maya Inbar
- Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Shir Genzer
- Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Anat Perry
- Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Eitan Grossman
- Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Ayelet N Landau
- Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
19. Zhang X, Li J, Li Z, Hong B, Diao T, Ma X, Nolte G, Engel AK, Zhang D. Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension. Neuroimage 2023; 282:120404. PMID: 37806465; DOI: 10.1016/j.neuroimage.2023.120404.
Abstract
Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-noise ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via the natural language processing method as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms preceding speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.
Affiliation(s)
- Xinmiao Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Jiawei Li
- Department of Education and Psychology, Freie Universität Berlin, Berlin 14195, Federal Republic of Germany
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Tongxiang Diao
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
- Xin Ma
- Department of Otolaryngology, Head and Neck Surgery, Peking University, People's Hospital, Beijing 100044, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
- Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Federal Republic of Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China.
20. Li J, Hong B, Nolte G, Engel AK, Zhang D. EEG-based speaker-listener neural coupling reflects speech-selective attentional mechanisms beyond the speech stimulus. Cereb Cortex 2023; 33:11080-11091. PMID: 37814353; DOI: 10.1093/cercor/bhad347.
Abstract
When we pay attention to someone, do we focus only on the sound they make and the words they use, or do we form a mental space shared with the speaker we want to pay attention to? Some would argue that human language is nothing more than a simple signal, but others claim that human beings understand each other because they form a shared mental ground between the speaker and the listener. Our study aimed to explore the neural mechanisms of speech-selective attention by investigating the electroencephalogram-based neural coupling between the speaker and the listener in a cocktail party paradigm. The temporal response function method was employed to reveal how the listener was coupled to the speaker at the neural level. The results showed that the neural coupling between the listener and the attended speaker peaked 5 s before speech onset at the delta band over the left frontal region, and was correlated with speech comprehension performance. In contrast, the attentional processing of speech acoustics and semantics occurred primarily at a later stage after speech onset and was not significantly correlated with comprehension performance. These findings suggest a predictive mechanism to achieve speaker-listener neural coupling for successful speech comprehension.
Affiliation(s)
- Jiawei Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee, Berlin 14195, Germany
- Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
21
|
Michaelov JA, Bergen BK. Ignoring the alternatives: The N400 is sensitive to stimulus preactivation alone. Cortex 2023; 168:82-101. [PMID: 37678069 DOI: 10.1016/j.cortex.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Revised: 05/12/2023] [Accepted: 08/03/2023] [Indexed: 09/09/2023]
Abstract
The N400 component of the event-related brain potential is a neural signal of processing difficulty. In the language domain, it is widely believed to be sensitive to the degree to which a given word or its semantic features have been preactivated in the brain based on the preceding context. However, it has also been shown that the brain often preactivates many words in parallel. It is currently unknown whether the N400 is also affected by the preactivations of alternative words other than the stimulus that is actually presented. This leaves a weak link in the derivation chain: how can we use the N400 to understand the mechanisms of preactivation if we do not know what it indexes? This study directly addresses this gap. We estimate the extent to which all words in a lexicon are preactivated in a given context using the predictions of contemporary large language models. We then directly compare two competing possibilities: that the amplitude of the N400 is sensitive only to the extent to which the stimulus is preactivated, and that it is also sensitive to the preactivation states of the alternatives. We find evidence for the former. This result allows for better-grounded inferences about the mechanisms underlying the N400, lexical preactivation in the brain, and language processing more generally.
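To make the preactivation measures concrete, here is a hedged sketch of how a causal language model can yield both the probability (or surprisal) of the presented word and a summary of the alternatives (entropy of the next-word distribution). The choice of GPT-2, the single-subword approximation of a word, and the example sentence are illustrative assumptions rather than the study's actual lexicon-level procedure.

```python
# Rough sketch, not the authors' code: preactivation of the presented word vs.
# a summary of the alternatives, from a causal language model's next-token
# distribution. Context/target strings are made up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context = "He took his coffee with milk and"
target = " sugar"   # presented word; leading space matters for GPT-2 BPE

with torch.no_grad():
    ids = tok(context, return_tensors="pt").input_ids
    probs = torch.softmax(lm(ids).logits[0, -1], dim=-1)

target_id = tok(target).input_ids[0]                 # first subword as a proxy
p_target = probs[target_id].item()
surprisal = -torch.log2(probs[target_id]).item()     # stimulus preactivation (bits)
entropy = -(probs * torch.log2(probs)).sum().item()  # spread over alternatives (bits)

print(f"P(target) = {p_target:.4f}, surprisal = {surprisal:.2f} bits, "
      f"entropy over alternatives = {entropy:.2f} bits")
```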
Collapse
Affiliation(s)
- James A Michaelov
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA.
| | - Benjamin K Bergen
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA.
| |
Collapse
|
22
|
Goldman E, Bou-Dargham S, Lai M, Guda A, Fallon J, Hauptman M, Reinoso A, Phillips S, Abrams E, Parrish A, Pylkkänen L. MEG correlates of speech planning in simple vs. interactive picture naming in children and adults. PLoS One 2023; 18:e0292316. [PMID: 37847686 PMCID: PMC10581494 DOI: 10.1371/journal.pone.0292316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Accepted: 09/18/2023] [Indexed: 10/19/2023] Open
Abstract
The picture naming task is common both as a clinical task and as a method to study the neural bases of speech production in the healthy brain. However, this task is not reflective of most naturally occurring productions, which tend to happen within a context, typically in dialogue in response to someone else's production. How the brain basis of the classic "confrontation picture naming" task compares to the planning of utterances in dialogue is not known. Here we used magnetoencephalography (MEG) to measure neural activity associated with language production using the classic picture naming task as well as a minimal variant of the task, intended as more interactive or dialogue-like. We assessed how neural activity is affected by the interactive context in children, teenagers, and adults. The general pattern was that in adults, the interactive task elicited a robust sustained increase of activity in frontal and temporal cortices bilaterally, as compared to simple picture naming. This increase was present only in the left hemisphere in teenagers and was absent in children, who, in fact, showed the reverse effect. Thus our findings suggest a robustly bilateral neural basis for the coordination of interaction and a very slow developmental timeline for this network.
Collapse
Affiliation(s)
- Ebony Goldman
- Department of Psychology, New York University, New York, NY, United States of America
| | | | - Marco Lai
- Department of Psychology, New York University, New York, NY, United States of America
| | - Anvita Guda
- Department of Linguistics, New York University, New York, NY, United States of America
| | - Jacqui Fallon
- Department of Psychology, New York University, New York, NY, United States of America
| | - Miriam Hauptman
- Department of Psychology, Johns Hopkins University, Baltimore, MD, United States of America
| | - Alejandra Reinoso
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, United States of America
| | - Sarah Phillips
- Department of Linguistics, New York University, New York, NY, United States of America
- Center for Brain Plasticity and Recovery, Georgetown University, Washington, DC, United States of America
| | - Ellie Abrams
- Department of Psychology, New York University, New York, NY, United States of America
| | - Alicia Parrish
- Department of Linguistics, New York University, New York, NY, United States of America
| | - Liina Pylkkänen
- Department of Psychology, New York University, New York, NY, United States of America
- NYUAD Research Institute, New York University Abu Dhabi, Abu Dhabi, UAE
- Department of Linguistics, New York University, New York, NY, United States of America
| |
Collapse
|
23
|
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide objective measures of speech comprehension.
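A minimal sketch of the model-comparison logic behind multivariate temporal response functions (mTRFs): fit ridge-regularised lagged regressions from an acoustic-only versus an acoustic-plus-word-onset feature set and compare held-out prediction accuracy. All signals below are simulated placeholders, and the lag window, regularisation, and cross-validation scheme are assumptions rather than the study's settings.

```python
# mTRF comparison sketch: does adding a word-onset predictor to an
# envelope-only model improve held-out prediction of the MEG/EEG signal?
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def lagged(X, lags):
    """Expand each column of X into time-lagged copies (forward model)."""
    n, d = X.shape
    out = np.zeros((n, d * len(lags)))
    for j in range(d):
        for k, lag in enumerate(lags):
            out[lag:, j * len(lags) + k] = X[:n - lag, j]
    return out

fs, dur = 100, 120
rng = np.random.default_rng(1)
envelope = rng.random(dur * fs)
word_onsets = (rng.random(dur * fs) < 0.03).astype(float)   # sparse impulses
meg = rng.standard_normal(dur * fs)                          # simulated sensor signal

lags = np.arange(0, int(0.6 * fs))                            # 0-600 ms
feature_sets = {"acoustic": envelope[:, None],
                "acoustic+words": np.c_[envelope, word_onsets]}

for name, feats in feature_sets.items():
    X = lagged(feats, lags)
    rs = []
    for tr, te in KFold(n_splits=5).split(X):
        model = Ridge(alpha=10.0).fit(X[tr], meg[tr])
        rs.append(np.corrcoef(model.predict(X[te]), meg[te])[0, 1])
    print(f"{name}: mean held-out r = {np.mean(rs):.3f}")
```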
Collapse
Affiliation(s)
| | | | - Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
| |
Collapse
|
24
|
Heilbron M, van Haren J, Hagoort P, de Lange FP. Lexical Processing Strongly Affects Reading Times But Not Skipping During Natural Reading. Open Mind (Camb) 2023; 7:757-783. [PMID: 37840763 PMCID: PMC10575561 DOI: 10.1162/opmi_a_00099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Accepted: 07/27/2023] [Indexed: 10/17/2023] Open
Abstract
In a typical text, readers look much longer at some words than at others, even skipping many altogether. Historically, researchers explained this variation via low-level visual or oculomotor factors, but today it is primarily explained via factors determining a word's lexical processing ease, such as how well word identity can be predicted from context or discerned from parafoveal preview. While the existence of these effects is well established in controlled experiments, the relative importance of prediction, preview and low-level factors in natural reading remains unclear. Here, we address this question in three large naturalistic reading corpora (n = 104, 1.5 million words), using deep neural networks and Bayesian ideal observers to model linguistic prediction and parafoveal preview from moment to moment in natural reading. Strikingly, neither prediction nor preview was important for explaining word skipping: the vast majority of the explained variation was accounted for by a simple oculomotor model using just fixation position and word length. For reading times, by contrast, we found strong but independent contributions of prediction and preview, with effect sizes matching those from controlled experiments. Together, these results challenge dominant models of eye movements in reading, and instead support alternative models that describe skipping (but not reading times) as largely autonomous from word identification, and mostly determined by low-level oculomotor information.
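The core comparison (an oculomotor baseline versus a baseline plus a linguistic-prediction term) can be sketched as nested logistic models of skipping scored by held-out log-likelihood. The simulated predictors and effect sizes below are illustrative placeholders, not the corpus data or the authors' Bayesian observer models.

```python
# Illustrative nested-model comparison for word skipping on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 5000
word_length = rng.integers(1, 12, n)
launch_dist = rng.uniform(1, 15, n)          # distance of the prior fixation
surprisal = rng.gamma(2.0, 2.0, n)           # e.g., from a language model

# Simulate skipping driven mostly by length and launch distance.
logit = 2.0 - 0.5 * word_length - 0.15 * launch_dist - 0.02 * surprisal
skipped = rng.random(n) < 1 / (1 + np.exp(-logit))

oculomotor = np.c_[word_length, launch_dist]
full = np.c_[word_length, launch_dist, surprisal]

for name, X in [("oculomotor only", oculomotor), ("+ surprisal", full)]:
    ll = cross_val_score(LogisticRegression(max_iter=1000), X, skipped,
                         scoring="neg_log_loss", cv=5).mean()
    print(f"{name}: mean held-out log-loss = {-ll:.4f}")
```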
Collapse
Affiliation(s)
- Micha Heilbron
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- University of Amsterdam, Amsterdam, The Netherlands
| | - Jorie van Haren
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Peter Hagoort
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Floris P. de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| |
Collapse
|
25
|
Li KE, Dimitrijevic A, Gordon KA, Pang EW, Greiner HM, Kadis DS. Age-related increases in right hemisphere support for prosodic processing in children. Sci Rep 2023; 13:15849. [PMID: 37740012 PMCID: PMC10516972 DOI: 10.1038/s41598-023-43027-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Language comprehension is a complex process involving an extensive brain network. Brain regions responsible for prosodic processing have been studied in adults; however, much less is known about the neural bases of prosodic processing in children. Using magnetoencephalography (MEG), we mapped regions supporting speech envelope tracking (a marker of prosodic processing) in 80 typically developing children, ages 4-18 years, completing a stories listening paradigm. Neuromagnetic signals coherent with the speech envelope were localized using dynamic imaging of coherent sources (DICS). Across the group, we observed coherence in bilateral perisylvian cortex. We observed age-related increases in coherence to the speech envelope in the right superior temporal gyrus (r = 0.31, df = 78, p = 0.0047) and primary auditory cortex (r = 0.27, df = 78, p = 0.016); age-related decreases in coherence to the speech envelope were observed in the left superior temporal gyrus (r = - 0.25, df = 78, p = 0.026). This pattern may indicate a refinement of the networks responsible for prosodic processing during development, where language areas in the right hemisphere become increasingly specialized for prosodic processing. Altogether, these results reveal a distinct neurodevelopmental trajectory for the processing of prosodic cues, highlighting the presence of supportive language functions in the right hemisphere. Findings from this dataset of typically developing children may serve as a potential reference timeline for assessing children with neurodevelopmental hearing and speech disorders.
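A simplified sketch of the two summary steps reported above: spectral coherence between a (here simulated) source time course and the speech envelope, followed by a Pearson correlation of that coherence with age across children. The DICS beamforming itself is not reproduced, and the frequency band and simulation parameters are assumptions.

```python
# Coherence-with-envelope and its correlation with age, on simulated data.
import numpy as np
from scipy.signal import coherence
from scipy.stats import pearsonr

fs = 200
rng = np.random.default_rng(3)
ages = rng.uniform(4, 18, 80)                 # 80 simulated children

per_child_coh = []
for age in ages:
    envelope = rng.standard_normal(60 * fs)
    # Simulate a source whose envelope-locking grows weakly with age.
    source = 0.02 * age * envelope + rng.standard_normal(60 * fs)
    f, coh = coherence(source, envelope, fs=fs, nperseg=2 * fs)
    per_child_coh.append(coh[(f >= 0.5) & (f <= 4)].mean())   # low-frequency band

r, p = pearsonr(ages, per_child_coh)
print(f"coherence vs age: r = {r:.2f}, p = {p:.4f}")
```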
Collapse
Affiliation(s)
- Kristen E Li
- Department of Physiology, University of Toronto, Toronto, ON, Canada
- Neurosciences and Mental Health, Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
| | - Andrew Dimitrijevic
- Department of Physiology, University of Toronto, Toronto, ON, Canada
- Department of Otolaryngology, Sunnybrook Health Sciences Centre, Toronto, ON, Canada
- Department of Otolaryngology, University of Toronto, Toronto, ON, Canada
| | - Karen A Gordon
- Neurosciences and Mental Health, Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Department of Otolaryngology, University of Toronto, Toronto, ON, Canada
| | - Elizabeth W Pang
- Neurosciences and Mental Health, Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada
- Division of Neurology, Hospital for Sick Children, Toronto, ON, Canada
| | - Hansel M Greiner
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
| | - Darren S Kadis
- Department of Physiology, University of Toronto, Toronto, ON, Canada.
- Neurosciences and Mental Health, Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada.
| |
Collapse
|
26
|
Liang B, Li Y, Zhao W, Du Y. Bilateral human laryngeal motor cortex in perceptual decision of lexical tone and voicing of consonant. Nat Commun 2023; 14:4710. [PMID: 37543659 PMCID: PMC10404239 DOI: 10.1038/s41467-023-40445-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2023] [Accepted: 07/27/2023] [Indexed: 08/07/2023] Open
Abstract
Speech perception is believed to recruit the left motor cortex. However, the exact role of the laryngeal subregion and its right counterpart in speech perception, as well as their temporal patterns of involvement remain unclear. To address these questions, we conducted a hypothesis-driven study, utilizing transcranial magnetic stimulation on the left or right dorsal laryngeal motor cortex (dLMC) when participants performed perceptual decision on Mandarin lexical tone or consonant (voicing contrast) presented with or without noise. We used psychometric function and hierarchical drift-diffusion model to disentangle perceptual sensitivity and dynamic decision-making parameters. Results showed that bilateral dLMCs were engaged with effector specificity, and this engagement was left-lateralized with right upregulation in noise. Furthermore, the dLMC contributed to various decision stages depending on the hemisphere and task difficulty. These findings substantially advance our understanding of the hemispherical lateralization and temporal dynamics of bilateral dLMC in sensorimotor integration during speech perceptual decision-making.
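As a hedged illustration of one analysis component only, the sketch below fits a logistic psychometric function to choice proportions along a stimulus continuum and reads off the point of subjective equality (PSE) and slope as sensitivity indices. The hierarchical drift-diffusion modelling and the TMS design are not reproduced, and the data are simulated.

```python
# Psychometric-function fit on simulated categorization data.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, x0, k, lapse):
    """Logistic with a symmetric lapse rate."""
    return lapse + (1 - 2 * lapse) / (1 + np.exp(-k * (x - x0)))

continuum = np.linspace(0, 1, 7)              # e.g., one lexical tone to another
n_trials = 40
rng = np.random.default_rng(4)
true_p = psychometric(continuum, 0.5, 10.0, 0.02)
prop_choice2 = rng.binomial(n_trials, true_p) / n_trials

(x0, k, lapse), _ = curve_fit(psychometric, continuum, prop_choice2,
                              p0=[0.5, 5.0, 0.02],
                              bounds=([0, 0, 0], [1, 50, 0.2]))
print(f"PSE = {x0:.2f}, slope = {k:.1f}, lapse = {lapse:.3f}")
```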
Collapse
Affiliation(s)
- Baishen Liang
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yanchang Li
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China
| | - Wanying Zhao
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China
| | - Yi Du
- Institute of Psychology, CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, 100049, China.
- CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, 200031, China.
- Chinese Institute for Brain Research, Beijing, 102206, China.
| |
Collapse
|
27
|
Tezcan F, Weissbart H, Martin AE. A tradeoff between acoustic and linguistic feature encoding in spoken language comprehension. eLife 2023; 12:e82386. [PMID: 37417736 PMCID: PMC10328533 DOI: 10.7554/elife.82386] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Accepted: 06/18/2023] [Indexed: 07/08/2023] Open
Abstract
When we comprehend language from speech, the phase of the neural response aligns with particular features of the speech input, resulting in a phenomenon referred to as neural tracking. In recent years, a large body of work has demonstrated the tracking of the acoustic envelope and abstract linguistic units at the phoneme and word levels, and beyond. However, the degree to which speech tracking is driven by acoustic edges of the signal, or by internally-generated linguistic units, or by the interplay of both, remains contentious. In this study, we used naturalistic story-listening to investigate (1) whether phoneme-level features are tracked over and above acoustic edges, (2) whether word entropy, which can reflect sentence- and discourse-level constraints, impacted the encoding of acoustic and phoneme-level features, and (3) whether the tracking of acoustic edges was enhanced or suppressed during comprehension of a first language (Dutch) compared to a statistically familiar but uncomprehended language (French). We first show that encoding models with phoneme-level linguistic features, in addition to acoustic features, uncovered an increased neural tracking response; this signal was further amplified in a comprehended language, putatively reflecting the transformation of acoustic features into internally generated phoneme-level representations. Phonemes were tracked more strongly in a comprehended language, suggesting that language comprehension functions as a neural filter over acoustic edges of the speech signal as it transforms sensory signals into abstract linguistic units. We then show that word entropy enhances neural tracking of both acoustic and phonemic features when sentence- and discourse-context are less constraining. When language was not comprehended, acoustic features, but not phonemic ones, were more strongly modulated, but in contrast, when a native language is comprehended, phoneme features are more strongly modulated. Taken together, our findings highlight the flexible modulation of acoustic, and phonemic features by sentence and discourse-level constraint in language comprehension, and document the neural transformation from speech perception to language comprehension, consistent with an account of language processing as a neural filter from sensory to abstract representations.
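Before any encoding model is fitted, the contrasted predictor families can be constructed roughly as below: acoustic edges as the half-wave-rectified derivative of the envelope, and phoneme-level features as impulse trains at phoneme onsets, optionally weighted by surprisal. The onset times and surprisal values here are random placeholders; in practice they would come from a forced aligner and a language model, and the sampling rate is an assumption.

```python
# Constructing acoustic-edge and phoneme-level predictors (placeholder data).
import numpy as np

fs = 100
rng = np.random.default_rng(7)
envelope = np.convolve(rng.random(60 * fs), np.ones(10) / 10, mode="same")

# Acoustic edges: positive rate of change of the envelope.
edges = np.clip(np.diff(envelope, prepend=envelope[0]), 0, None)

# Phoneme onsets (in practice from a forced aligner); impulse train predictor.
phoneme_onsets_s = np.sort(rng.uniform(0, 60, 400))
phoneme_feature = np.zeros(60 * fs)
phoneme_feature[(phoneme_onsets_s * fs).astype(int)] = 1.0

# Optionally weight each phoneme impulse by surprisal from a language model.
phoneme_surprisal = rng.gamma(2.0, 1.5, len(phoneme_onsets_s))
weighted = np.zeros(60 * fs)
weighted[(phoneme_onsets_s * fs).astype(int)] = phoneme_surprisal

print(f"{int(phoneme_feature.sum())} phoneme impulses; "
      f"mean acoustic-edge value = {edges.mean():.4f}")
```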
Collapse
Affiliation(s)
- Filiz Tezcan
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Hugo Weissbart
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
| | - Andrea E Martin
- Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
| |
Collapse
|
28
|
Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hear Res 2023; 433:108767. [PMID: 37060895 PMCID: PMC10559335 DOI: 10.1016/j.heares.2023.108767] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Revised: 03/29/2023] [Accepted: 04/09/2023] [Indexed: 04/17/2023]
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC) when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.
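A pipeline-level sketch of the comparison follows, with the important caveat that the real auditory-nerve/cochlear-nucleus/IC model is replaced by a crude band-pass "modulation filter" placeholder: fit a TRF to the raw envelope and to the model-style output and compare held-out EEG prediction. Data are simulated so that the model-based predictor wins by construction, purely to show the logic, not to reproduce the result.

```python
# Pipeline sketch: raw-envelope TRF vs. TRF on a placeholder "IC-like" output.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def lagged(x, lags):
    n = len(x)
    X = np.zeros((n, len(lags)))
    for k, lag in enumerate(lags):
        X[lag:, k] = x[:n - lag]
    return X

fs = 128
rng = np.random.default_rng(5)
envelope = np.abs(rng.standard_normal(300 * fs))      # stand-in for a speech envelope

# Placeholder "IC-like" output: emphasise 2-10 Hz modulations of the envelope.
b, a = butter(2, [2 / (fs / 2), 10 / (fs / 2)], btype="band")
ic_like = filtfilt(b, a, envelope)

eeg = 0.5 * ic_like + rng.standard_normal(len(envelope))   # simulated EEG channel

lags = np.arange(0, int(0.4 * fs))
for name, feat in [("envelope", envelope), ("IC-model output", ic_like)]:
    X = lagged(feat, lags)
    Xtr, Xte, ytr, yte = train_test_split(X, eeg, test_size=0.25, shuffle=False)
    r = np.corrcoef(Ridge(alpha=1.0).fit(Xtr, ytr).predict(Xte), yte)[0, 1]
    print(f"TRF on {name}: held-out r = {r:.3f}")
```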
Collapse
Affiliation(s)
- Elsa Lindboom
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
| | - Aaron Nidiffer
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| | - Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA.
| | - Edmund C Lalor
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
| |
Collapse
|
29
|
Xie Z, Brodbeck C, Chandrasekaran B. Cortical Tracking of Continuous Speech Under Bimodal Divided Attention. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:318-343. [PMID: 37229509 PMCID: PMC10205152 DOI: 10.1162/nol_a_00100] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Accepted: 01/11/2023] [Indexed: 05/27/2023]
Abstract
Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.
Collapse
Affiliation(s)
- Zilong Xie
- School of Communication Science and Disorders, Florida State University, Tallahassee, FL, USA
| | - Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
| |
Collapse
|
30
|
Gaston P, Brodbeck C, Phillips C, Lau E. Auditory Word Comprehension Is Less Incremental in Isolated Words. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:29-52. [PMID: 37229141 PMCID: PMC10205071 DOI: 10.1162/nol_a_00084] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Accepted: 09/26/2022] [Indexed: 05/27/2023]
Abstract
Partial speech input is often understood to trigger rapid and automatic activation of successively higher-level representations of words, from sound to meaning. Here we show evidence from magnetoencephalography that this type of incremental processing is limited when words are heard in isolation as compared to continuous speech. This suggests a less unified and automatic word recognition process than is often assumed. We present evidence from isolated words that neural effects of phoneme probability, quantified by phoneme surprisal, are significantly stronger than (statistically null) effects of phoneme-by-phoneme lexical uncertainty, quantified by cohort entropy. In contrast, we find robust effects of both cohort entropy and phoneme surprisal during perception of connected speech, with a significant interaction between the contexts. This dissociation rules out models of word recognition in which phoneme surprisal and cohort entropy are common indicators of a uniform process, even though these closely related information-theoretic measures both arise from the probability distribution of wordforms consistent with the input. We propose that phoneme surprisal effects reflect automatic access of a lower level of representation of the auditory input (e.g., wordforms) while the occurrence of cohort entropy effects is task sensitive, driven by a competition process or a higher-level representation that is engaged late (or not at all) during the processing of single words.
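The two information-theoretic measures contrasted here both fall out of the frequency distribution of wordforms consistent with the phonemes heard so far. The toy lexicon below (made-up wordforms and counts) shows how cohort entropy and phoneme surprisal are computed from that distribution.

```python
# Cohort entropy and phoneme surprisal from a toy pronunciation lexicon.
import math

lexicon = {                       # wordform (phoneme tuple) -> frequency count
    ("k", "ae", "t"): 120,        # cat
    ("k", "ae", "p"): 60,         # cap
    ("k", "ae", "p", "s"): 20,    # caps
    ("k", "aa", "r"): 150,        # car
    ("d", "aa", "g"): 90,         # dog
}

def cohort(prefix):
    """All wordforms consistent with the phonemes heard so far."""
    return {w: f for w, f in lexicon.items() if w[:len(prefix)] == prefix}

def cohort_entropy(prefix):
    c = cohort(prefix)
    total = sum(c.values())
    return -sum((f / total) * math.log2(f / total) for f in c.values())

def phoneme_surprisal(prefix, next_phoneme):
    c = cohort(prefix)
    total = sum(c.values())
    consistent = sum(f for w, f in c.items()
                     if len(w) > len(prefix) and w[len(prefix)] == next_phoneme)
    return -math.log2(consistent / total)

prefix = ("k", "ae")
print("cohort entropy after /k ae/:", round(cohort_entropy(prefix), 3), "bits")
print("surprisal of /t/ after /k ae/:",
      round(phoneme_surprisal(prefix, "t"), 3), "bits")
```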
Collapse
Affiliation(s)
- Phoebe Gaston
- Department of Linguistics, University of Maryland, College Park, MD, USA
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
| | - Christian Brodbeck
- Institute for Systems Research, University of Maryland, College Park, MD, USA
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
| | - Colin Phillips
- Department of Linguistics, University of Maryland, College Park, MD, USA
| | - Ellen Lau
- Department of Linguistics, University of Maryland, College Park, MD, USA
| |
Collapse
|
31
|
Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.02.522438. [PMID: 36711934 PMCID: PMC9881851 DOI: 10.1101/2023.01.02.522438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
|
32
|
Youssofzadeh V, Conant L, Stout J, Ustine C, Humphries C, Gross WL, Shah-Basak P, Mathis J, Awe E, Allen L, DeYoe EA, Carlson C, Anderson CT, Maganti R, Hermann B, Nair VA, Prabhakaran V, Meyerand B, Binder JR, Raghavan M. Late dominance of the right hemisphere during narrative comprehension. Neuroimage 2022; 264:119749. [PMID: 36379420 PMCID: PMC9772156 DOI: 10.1016/j.neuroimage.2022.119749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2022] [Revised: 10/12/2022] [Accepted: 11/11/2022] [Indexed: 11/15/2022] Open
Abstract
PET and fMRI studies suggest that auditory narrative comprehension is supported by a bilateral multilobar cortical network. The superior temporal resolution of magnetoencephalography (MEG) makes it an attractive tool to investigate the dynamics of how different neuroanatomic substrates engage during narrative comprehension. Using beta-band power changes as a marker of cortical engagement, we studied MEG responses during an auditory story comprehension task in 31 healthy adults. The protocol consisted of two runs, each interleaving 7 blocks of the story comprehension task with 15 blocks of an auditorily presented math task as a control for phonological processing, working memory, and attention processes. Sources at the cortical surface were estimated with a frequency-resolved beamformer. Beta-band power was estimated in the frequency range of 16-24 Hz over 1-sec epochs starting from 400 msec after stimulus onset until the end of a story or math problem presentation. These power estimates were compared to 1-second epochs of data before the stimulus block onset. The task-related cortical engagement was inferred from beta-band power decrements. Group-level source activations were statistically compared using non-parametric permutation testing. A story-math contrast of beta-band power changes showed greater bilateral cortical engagement within the fusiform gyrus, inferior and middle temporal gyri, parahippocampal gyrus, and left inferior frontal gyrus (IFG) during story comprehension. A math-story contrast of beta power decrements showed greater bilateral but left-lateralized engagement of the middle frontal gyrus and superior parietal lobule. The evolution of cortical engagement during five temporal windows across the presentation of stories showed significant involvement during the first interval of the narrative of bilateral opercular and insular regions as well as the ventral and lateral temporal cortex, extending more posteriorly on the left and medially on the right. Over time, there continued to be sustained right anterior ventral temporal engagement, with increasing involvement of the right anterior parahippocampal gyrus, STG, MTG, posterior superior temporal sulcus, inferior parietal lobule, frontal operculum, and insula, while left hemisphere engagement decreased. Our findings are consistent with prior imaging studies of narrative comprehension, but in addition, they demonstrate increasing right-lateralized engagement over the course of narratives, suggesting an important role for these right-hemispheric regions in semantic integration as well as social and pragmatic inference processing.
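A simplified sketch of the dependent measure and group statistics described above: beta-band (16-24 Hz) power in task epochs relative to baseline, tested with a sign-flip permutation test across participants. The frequency-resolved beamformer and source estimation are not reproduced, and all data are simulated.

```python
# Beta-band power decrement (task vs. baseline) with a sign-flip permutation test.
import numpy as np
from scipy.signal import welch

def beta_power(x, fs):
    f, p = welch(x, fs=fs, nperseg=fs)
    return p[(f >= 16) & (f <= 24)].mean()

fs, n_subj = 250, 31
rng = np.random.default_rng(6)

# Per-subject change in log beta power (task minus baseline), simulated.
changes = []
for _ in range(n_subj):
    baseline = rng.standard_normal(10 * fs)
    task = rng.standard_normal(10 * fs) * 0.9       # simulate a decrement
    changes.append(np.log(beta_power(task, fs)) - np.log(beta_power(baseline, fs)))
changes = np.array(changes)

# Sign-flip permutation test on the mean change across participants.
observed = changes.mean()
null = [np.mean(changes * rng.choice([-1, 1], n_subj)) for _ in range(5000)]
p_value = np.mean(np.abs(null) >= abs(observed))
print(f"mean beta change = {observed:.3f} (log ratio), p = {p_value:.4f}")
```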
Collapse
Affiliation(s)
- Vahab Youssofzadeh
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Corresponding author (V. Youssofzadeh)
| | - Lisa Conant
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Jeffrey Stout
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Candida Ustine
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | | | - William L. Gross
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Anesthesiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | | | - Jed Mathis
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Radiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Elizabeth Awe
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Linda Allen
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Edgar A. DeYoe
- Radiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Chad Carlson
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| | | | - Rama Maganti
- Neurology, University of Wisconsin-Madison, Madison, WI, USA
| | - Bruce Hermann
- Neurology, University of Wisconsin-Madison, Madison, WI, USA
| | - Veena A. Nair
- Radiology, University of Wisconsin-Madison, Madison, WI, USA
| | - Vivek Prabhakaran
- Radiology, University of Wisconsin-Madison, Madison, WI, USA
- Medical Physics, University of Wisconsin-Madison, Madison, WI, USA
- Psychiatry, University of Wisconsin-Madison, Madison, WI, USA
| | - Beth Meyerand
- Radiology, University of Wisconsin-Madison, Madison, WI, USA
- Medical Physics, University of Wisconsin-Madison, Madison, WI, USA
- Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
| | | | - Manoj Raghavan
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
| |
Collapse
|
33
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
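Two of the acoustic tracking targets highlighted in this review can be extracted with a few lines of signal processing, sketched below on a synthetic vowel-like signal: the broadband envelope (Hilbert magnitude, low-pass filtered) and a crude frame-wise f0 estimate from the autocorrelation peak. The filter settings and pitch search range are illustrative assumptions, not a recommended analysis recipe.

```python
# Envelope and rough f0 extraction on a synthetic vowel-like signal.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
f0_true = 120 + 20 * np.sin(2 * np.pi * 0.5 * t)            # slowly varying pitch
speech_like = (np.sin(2 * np.pi * np.cumsum(f0_true) / fs)
               * (1 + 0.5 * np.sin(2 * np.pi * 3 * t)))      # 3 Hz amplitude modulation

# Envelope: magnitude of the analytic signal, low-passed below 8 Hz.
b, a = butter(2, 8 / (fs / 2))
envelope = filtfilt(b, a, np.abs(hilbert(speech_like)))

def frame_f0(frame, fs, fmin=75, fmax=300):
    """f0 of one frame: lag of the autocorrelation peak in a plausible pitch range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lags = np.arange(int(fs / fmax), int(fs / fmin))
    return fs / lags[np.argmax(ac[lags])]

frame = speech_like[:int(0.04 * fs)]                          # first 40-ms frame
print(f"estimated f0 in first frame: {frame_f0(frame, fs):.1f} Hz "
      f"(true ~{f0_true[:int(0.04 * fs)].mean():.1f} Hz)")
print(f"envelope modulation depth: {envelope.std() / envelope.mean():.2f}")
```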
Collapse
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium.
| | - Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| | - Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
| |
Collapse
|