1
Bicknell K, Bushong W, Tanenhaus MK, Jaeger TF. Maintenance of subcategorical information during speech perception: revisiting misunderstood limitations. Journal of Memory and Language 2025;140:104565. PMID: 39430798; PMCID: PMC11484864; DOI: 10.1016/j.jml.2024.104565.
Abstract
Accurate word recognition is facilitated by context. Some relevant context, however, occurs after the word. Rational use of such "right context" would require listeners to have maintained uncertainty or subcategorical information about the word, thus allowing for consideration of possible alternatives when they encounter relevant right context. A classic study continues to be widely cited as evidence that subcategorical information maintenance is limited to highly ambiguous percepts and short time spans (Connine et al., 1991). More recent studies, however, using other phonological contrasts, and sometimes other paradigms, have returned mixed results. We identify procedural and analytical issues that provide an explanation for existing results. We address these issues in two reanalyses of previously published results and two new experiments. In all four cases, we find consistent evidence against both limitations reported in Connine et al.'s seminal work, at least within the classic paradigms. Key to our approach is the introduction of an ideal observer framework to derive normative predictions for human word recognition expected if listeners maintain and integrate subcategorical information about preceding speech input rationally with subsequent context. We test these predictions in Bayesian mixed-effect analyses, including at the level of individual participants. While we find that the ideal observer fits participants' behavior better than models based on previously proposed limitations, we also find one previously unrecognized aspect of listeners' behavior that is unexpected under any existing model, including the ideal observer.
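As a rough sketch of the normative computation such an ideal observer performs (our notation and independence assumption, not necessarily the paper's), the posterior over word candidates w given subcategorical acoustic evidence A and subsequent right context C can be written as:

```latex
% Sketch of ideal-observer integration; assumes A and C are conditionally
% independent given the word w (an assumption of this illustration).
P(w \mid A, C) = \frac{P(A \mid w)\, P(C \mid w)\, P(w)}{\sum_{w'} P(A \mid w')\, P(C \mid w')\, P(w')}
```

Maintaining subcategorical information amounts to retaining the graded likelihoods P(A | w) for all candidates rather than committing to a single word before C arrives.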
Affiliation(s)
- Klinton Bicknell: Duolingo, Inc; Department of Brain & Cognitive Sciences, University of Rochester
- Wednesday Bushong: Department of Psychology, University of Hartford; Cognitive & Linguistic Sciences Program, Wellesley College; Department of Psychology, Wellesley College
- Michael K. Tanenhaus: Department of Brain & Cognitive Sciences, University of Rochester; School of Psychology, Nanjing Normal University
- T. Florian Jaeger: Department of Brain & Cognitive Sciences, University of Rochester; Department of Computer Science, University of Rochester
2
Fernandez LB, Pickering MJ, Naylor G, Hadley LV. Uses of Linguistic Context in Speech Listening: Does Acquired Hearing Loss Lead to Reduced Engagement of Prediction? Ear Hear 2024;45:1107-1114. PMID: 38880953; PMCID: PMC11325976; DOI: 10.1097/aud.0000000000001515.
Abstract
Research investigating the complex interplay of cognitive mechanisms involved in speech listening for people with hearing loss has been gaining prominence. In particular, linguistic context allows the use of several cognitive mechanisms that are not well distinguished in hearing science, namely those relating to "postdiction", "integration", and "prediction". We offer the perspective that an unacknowledged impact of hearing loss is the differential use of predictive mechanisms relative to age-matched individuals with normal hearing. As evidence, we first review how degraded auditory input leads to reduced prediction in people with normal hearing, then consider the literature exploring context use in people with acquired postlingual hearing loss. We argue that no research on hearing loss has directly assessed prediction. Because current interventions for hearing do not fully alleviate difficulty in conversation, and avoidance of spoken social interaction may be a mediator between hearing loss and cognitive decline, this perspective could lead to greater understanding of cognitive effects of hearing loss and provide insight regarding new targets for intervention.
Affiliation(s)
- Leigh B. Fernandez: Department of Social Sciences, Psycholinguistics Group, University of Kaiserslautern-Landau, Kaiserslautern, Germany
- Martin J. Pickering: Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
- Graham Naylor: Hearing Sciences—Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom
- Lauren V. Hadley: Hearing Sciences—Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom
3
Kapatsinski V, Bramlett AA, Idemaru K. What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue. Cognition 2024;249:105818. PMID: 38772253; DOI: 10.1016/j.cognition.2024.105818.
Abstract
In language comprehension, we use perceptual cues to infer meanings. Some of these cues reside on perceptual dimensions. For example, the difference between bear and pear is cued by a difference in voice onset time (VOT), which is a continuous perceptual dimension. The present paper asks whether, and when, experience with a single value on a dimension behaving unexpectedly is used by the learner to reweight the whole dimension. We show that learners reweight the whole VOT dimension when exposed to a single VOT value (e.g., 45 ms) and provided with feedback indicating that the speaker intended to produce a /b/ 50% of the time and a /p/ the other 50% of the time. Importantly, dimensional reweighting occurs only if 1) the 50/50 feedback is unexpected for the VOT value, and 2) there is another dimension that is predictive of feedback. When no predictive dimension is available, listeners reassociate the experienced VOT value with the more surprising outcome but do not downweight the entire VOT dimension. These results provide support for perceptual representations of speech sounds that combine cues and dimensions, for viewing perceptual learning in speech as a combination of error-driven cue reassociation and dimensional reweighting, and for considering dimensional reweighting to be reallocation of attention that occurs only when there is evidence that reallocating attention would improve prediction accuracy (Harmon, Z., Idemaru, K., & Kapatsinski, V. 2019. Learning mechanisms in cue reweighting. Cognition, 189, 76-88.).
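A minimal sketch of the error-driven (delta-rule) cue reassociation that the abstract contrasts with dimensional reweighting, in the spirit of the cited Harmon et al. (2019) account; the cue set, values, and learning rate are illustrative, not the authors' implementation:

```python
import numpy as np

def update(assoc, cues, outcome, lr=0.1):
    """One error-driven update of cue -> /p/ association weights."""
    prediction = cues @ assoc            # summed support for /p/ from active cues
    error = outcome - prediction         # surprise given the trial's feedback
    return assoc + lr * error * cues     # only active cues are reassociated

assoc = np.array([0.9, 0.0])             # start: this VOT value strongly predicts /p/, f0 neutral
rng = np.random.default_rng(0)
for _ in range(200):
    cues = np.array([1.0, 1.0])          # the experienced VOT value plus an f0 cue
    outcome = float(rng.random() < 0.5)  # 50/50 /b/-/p/ feedback for this VOT value
    assoc = update(assoc, cues, outcome)
print(assoc)                             # the association with /p/ drifts toward the 50/50 feedback
```

Dimensional reweighting would go further, scaling down the whole VOT dimension (e.g., via an attention weight) only when another dimension predicts the feedback better.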
Affiliation(s)
- Vsevolod Kapatsinski: Department of Linguistics, University of Oregon, 161 Straub Hall, Eugene, OR 97403-1290, United States of America
- Adam A Bramlett: Department of Modern Languages, Carnegie Mellon University, 341 Posner Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States of America
- Kaori Idemaru: Department of East Asian Languages and Literatures, University of Oregon, 114 Friendly Hall, Eugene, OR 97403-1248, United States of America
4
Nora A, Rinkinen O, Renvall H, Service E, Arkkila E, Smolander S, Laasonen M, Salmelin R. Impaired Cortical Tracking of Speech in Children with Developmental Language Disorder. J Neurosci 2024;44:e2048232024. PMID: 38589232; PMCID: PMC11140678; DOI: 10.1523/jneurosci.2048-23.2024.
Abstract
In developmental language disorder (DLD), learning to comprehend and express oneself with spoken language is impaired, but the reason for this remains unknown. Using millisecond-scale magnetoencephalography recordings combined with machine learning models, we investigated whether the possible neural basis of this disruption lies in poor cortical tracking of speech. The stimuli were common spoken Finnish words (e.g., dog, car, hammer) and sounds with corresponding meanings (e.g., dog bark, car engine, hammering). In both children with DLD (10 boys and 7 girls) and typically developing (TD) control children (14 boys and 3 girls), aged 10-15 years, the cortical activation to spoken words was best modeled as time-locked to the unfolding speech input at ∼100 ms latency between sound and cortical activation. Amplitude envelope (amplitude changes) and spectrogram (detailed time-varying spectral content) of the spoken words, but not other sounds, were very successfully decoded based on time-locked brain responses in bilateral temporal areas; based on the cortical responses, the models could tell at ∼75-85% accuracy which of the two sounds had been presented to the participant. However, the cortical representation of the amplitude envelope information was poorer in children with DLD compared with TD children at longer latencies (at ∼200-300 ms lag). We interpret this effect as reflecting poorer retention of acoustic-phonetic information in short-term memory. This impaired tracking could potentially affect the processing and learning of words as well as continuous speech. The present results offer an explanation for the problems in language comprehension and acquisition in DLD.
Affiliation(s)
- Anni Nora: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- Oona Rinkinen: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- Hanna Renvall: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland; BioMag Laboratory, HUS Diagnostic Center, Helsinki University Hospital, Helsinki FI-00029, Finland
- Elisabet Service: Department of Linguistics and Languages, Centre for Advanced Research in Experimental and Applied Linguistics (ARiEAL), McMaster University, Hamilton, Ontario L8S 4L8, Canada; Department of Psychology and Logopedics, University of Helsinki, Helsinki FI-00014, Finland
- Eva Arkkila: Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Sini Smolander: Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland; Research Unit of Logopedics, University of Oulu, Oulu FI-90014, Finland; Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
- Marja Laasonen: Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland; Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
- Riitta Salmelin: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
5
Guerreiro Fernandes F, Raemaekers M, Freudenburg Z, Ramsey N. Considerations for implanting speech brain computer interfaces based on functional magnetic resonance imaging. J Neural Eng 2024;21:036005. PMID: 38648782; DOI: 10.1088/1741-2552/ad4178.
Abstract
Objective. Brain-computer interfaces (BCIs) have the potential to reinstate lost communication faculties. Results from speech decoding studies indicate that a usable speech BCI based on activity in the sensorimotor cortex (SMC) can be achieved using subdurally implanted electrodes. However, the optimal characteristics for a successful speech implant are largely unknown. We address this topic in a high-field blood oxygenation level dependent functional magnetic resonance imaging (fMRI) study, by assessing the decodability of spoken words as a function of hemisphere, gyrus, sulcal depth, and position along the ventral/dorsal axis. Approach. Twelve subjects conducted a 7T fMRI experiment in which they pronounced 6 different pseudo-words over 6 runs. We divided the SMC by hemisphere, gyrus, sulcal depth, and position along the ventral/dorsal axis. Classification was performed in these SMC areas using a multiclass support vector machine (SVM). Main results. Significant classification was possible from the SMC, but no preference for the left or right hemisphere, nor for the precentral or postcentral gyrus, was detected for optimal word classification. Classification using information from the cortical surface was slightly better than classification using information from deep in the central sulcus, and was highest within the ventral 50% of SMC. Confusion matrices were highly similar across the entire SMC. An SVM-searchlight analysis revealed significant classification in the superior temporal gyrus and left planum temporale in addition to the SMC. Significance. The current results support a unilateral implant using surface electrodes, covering the ventral 50% of the SMC. The added value of depth electrodes is unclear. We did not observe evidence for variations in the qualitative nature of information across SMC. The current results need to be confirmed in paralyzed patients performing attempted speech.
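A minimal sketch of the multiclass decoding step described in the abstract, with toy data standing in for the study's fMRI patterns (shapes, cross-validation scheme, and parameters are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_voxels = 216, 500               # toy: 6 pseudo-words x 36 repetitions
X = rng.normal(size=(n_trials, n_voxels))   # single-trial voxel patterns from one SMC subregion
y = np.tile(np.arange(6), n_trials // 6)    # pseudo-word labels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))  # multiclass SVM (one-vs-one)
acc = cross_val_score(clf, X, y, cv=6)      # 6-fold CV; a real analysis would leave out runs
print(acc.mean())                           # chance level is 1/6 for six classes
```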
Affiliation(s)
- F Guerreiro Fernandes: Department of Neurology and Neurosurgery, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- M Raemaekers: Department of Neurology and Neurosurgery, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Z Freudenburg: Department of Neurology and Neurosurgery, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- N Ramsey: Department of Neurology and Neurosurgery, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
6
Clarke A, Tyler LK, Marslen-Wilson W. Hearing what is being said: the distributed neural substrate for early speech interpretation. Language, Cognition and Neuroscience 2024;39:1097-1116. PMID: 39439863; PMCID: PMC11493057; DOI: 10.1080/23273798.2024.2345308.
Abstract
Speech comprehension is remarkable for the immediacy with which the listener hears what is being said. Here, we focus on the neural underpinnings of this process in isolated spoken words. We analysed source-localised MEG data for nouns using Representational Similarity Analysis to probe the spatiotemporal coordinates of phonology, lexical form, and the semantics of emerging word candidates. Phonological model fit was detectable within 40-50 ms, engaging a bilateral network including superior and middle temporal cortex and extending into anterior temporal and inferior parietal regions. Lexical form emerged within 60-70 ms, and model fit to semantics from 100-110 ms. Strikingly, the majority of vertices in a central core showed model fit to all three dimensions, consistent with a distributed neural substrate for early speech analysis. The early interpretation of speech seems to be conducted in a unified integrative representational space, in conflict with conventional views of a linguistically stratified representational hierarchy.
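A minimal sketch of the Representational Similarity Analysis logic described above, with toy data standing in for the source-localised MEG patterns and the phonological/lexical/semantic model predictors:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_words, n_vertices = 40, 120
neural = rng.normal(size=(n_words, n_vertices))   # pattern per word in one time window / region
model_features = rng.normal(size=(n_words, 10))   # e.g., phonological feature vectors

neural_rdm = pdist(neural, metric="correlation")  # condensed representational dissimilarity matrices
model_rdm = pdist(model_features, metric="euclidean")
rho, p = spearmanr(neural_rdm, model_rdm)         # model fit at this time point / vertex
print(rho, p)
```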
Affiliation(s)
- Alex Clarke: Department of Psychology, University of Cambridge, Cambridge, UK
7
Gwilliams L, Marantz A, Poeppel D, King JR. Hierarchical dynamic coding coordinates speech comprehension in the brain. bioRxiv (preprint) 2024:2024.04.19.590280. PMID: 38659750; PMCID: PMC11042271; DOI: 10.1101/2024.04.19.590280.
Abstract
Speech comprehension requires the human brain to transform an acoustic waveform into meaning. To do so, the brain generates a hierarchy of features that converts the sensory input into increasingly abstract language properties. However, little is known about how these hierarchical features are generated and continuously coordinated. Here, we propose that each linguistic feature is dynamically represented in the brain to simultaneously represent successive events. To test this 'Hierarchical Dynamic Coding' (HDC) hypothesis, we use time-resolved decoding of brain activity to track the construction, maintenance, and integration of a comprehensive hierarchy of language features spanning acoustic, phonetic, sub-lexical, lexical, syntactic and semantic representations. For this, we recorded 21 participants with magnetoencephalography (MEG) while they listened to two hours of short stories. Our analyses reveal three main findings. First, the brain incrementally represents and simultaneously maintains successive features. Second, the duration of these representations depends on their level in the language hierarchy. Third, each representation is maintained by a dynamic neural code, which evolves at a speed commensurate with its corresponding linguistic level. This HDC preserves information over time while limiting interference between successive features. Overall, HDC reveals how the human brain continuously builds and maintains a language hierarchy during natural speech comprehension, thereby anchoring linguistic theories to their biological implementations.
Affiliation(s)
- Laura Gwilliams: Department of Psychology, Stanford University; Department of Psychology, New York University
- Alec Marantz: Department of Psychology, New York University; Department of Linguistics, New York University
- David Poeppel: Department of Psychology, New York University; Ernst Strüngmann Institute
8
Leonard MK, Gwilliams L, Sellers KK, Chung JE, Xu D, Mischler G, Mesgarani N, Welkenhuysen M, Dutta B, Chang EF. Large-scale single-neuron speech sound encoding across the depth of human cortex. Nature 2024;626:593-602. PMID: 38093008; PMCID: PMC10866713; DOI: 10.1038/s41586-023-06839-2.
Abstract
Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus, while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope and sequence statistics. Each cross-laminar recording site exhibited dominant tuning to a primary speech feature, while also containing a substantial proportion of neurons that encoded other features, contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in human superior temporal gyrus.
Affiliation(s)
- Matthew K Leonard: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Laura Gwilliams: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Kristin K Sellers: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Jason E Chung: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Duo Xu: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Gavin Mischler: Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Electrical Engineering, Columbia University, New York, NY, USA
- Nima Mesgarani: Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA; Department of Electrical Engineering, Columbia University, New York, NY, USA
- Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
9
Frances C. Good enough processing: what have we learned in the 20 years since Ferreira et al. (2002)? Front Psychol 2024;15:1323700. PMID: 38328385; PMCID: PMC10847345; DOI: 10.3389/fpsyg.2024.1323700.
Abstract
Traditionally, language processing has been thought of in terms of complete processing of the input. In contrast to this, Ferreira and colleagues put forth the idea of good enough processing. The proposal was that during everyday processing, ambiguities remain unresolved, we rely on heuristics instead of full analyses, and we carry out deep processing only if we need to for the task at hand. This idea has gathered substantial traction since its conception. In the current work, I review the papers that have tested the three key claims of good enough processing: ambiguities remain unresolved and underspecified, we use heuristics to parse sentences, and deep processing is only carried out if required by the task. I find mixed evidence for these claims and conclude with an appeal to further refinement of the claims and predictions of the theory.
Affiliation(s)
- Candice Frances: Psychology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
10
Gwilliams L, Flick G, Marantz A, Pylkkänen L, Poeppel D, King JR. Introducing MEG-MASC a high-quality magneto-encephalography dataset for evaluating natural speech processing. Sci Data 2023;10:862. PMID: 38049487; PMCID: PMC10695966; DOI: 10.1038/s41597-023-02752-5.
Abstract
The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers who listened to two hours of naturalistic stories. Each participant performed two identical sessions, involving listening to four fictional stories from the Manually Annotated Sub-Corpus (MASC) intermixed with random word lists and comprehension questions. We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS). This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally resolved brain responses to speech. We provide the Python code to replicate several validation analyses of the MEG evoked responses, such as the temporal decoding of phonetic features and word frequency. All code and MEG, audio and text data are publicly available, in keeping with best practices in transparent and reproducible research.
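A minimal sketch of loading one BIDS-formatted MEG recording of this kind via MNE-BIDS; the subject/session/task entities and the root path below are placeholders, not checked against the released dataset:

```python
import mne
import mne_bids

bids_path = mne_bids.BIDSPath(
    root="path/to/MEG-MASC",  # local copy of the BIDS dataset
    subject="01",             # placeholder entities; consult the dataset's metadata
    session="0",
    task="0",
    datatype="meg",
)
raw = mne_bids.read_raw_bids(bids_path)              # raw MEG plus annotations
events, event_id = mne.events_from_annotations(raw)  # word/phoneme onsets are stored as annotations
```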
Affiliation(s)
- Laura Gwilliams: Department of Psychology, Stanford University, Stanford, USA; Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Graham Flick: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates; Department of Linguistics, New York University, New York, USA; Rotman Research Institute, Baycrest Hospital, Toronto, Canada
- Alec Marantz: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates; Department of Linguistics, New York University, New York, USA
- Liina Pylkkänen: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates; Department of Linguistics, New York University, New York, USA
- David Poeppel: Department of Psychology, New York University, New York, USA; Ernst Struengmann Institute for Neuroscience, Frankfurt, Germany
- Jean-Rémi King: Department of Psychology, New York University, New York, USA; LSP, École normale supérieure, PSL University, CNRS, 75005, Paris, France
11
Schroën JAM, Gunter TC, Numssen O, Kroczek LOH, Hartwigsen G, Friederici AD. Causal evidence for a coordinated temporal interplay within the language network. Proc Natl Acad Sci U S A 2023;120:e2306279120. PMID: 37963247; PMCID: PMC10666120; DOI: 10.1073/pnas.2306279120.
Abstract
Recent neurobiological models on language suggest that auditory sentence comprehension is supported by a coordinated temporal interplay within a left-dominant brain network, including the posterior inferior frontal gyrus (pIFG), posterior superior temporal gyrus and sulcus (pSTG/STS), and angular gyrus (AG). Here, we probed the timing and causal relevance of the interplay between these regions by means of concurrent transcranial magnetic stimulation and electroencephalography (TMS-EEG). Our TMS-EEG experiments reveal region- and time-specific causal evidence for a bidirectional information flow from left pSTG/STS to left pIFG and back during auditory sentence processing. Adapting a condition-and-perturb approach, our findings further suggest that the left pSTG/STS can be supported by the left AG in a state-dependent manner.
Affiliation(s)
- Joëlle A. M. Schroën: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Thomas C. Gunter: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Ole Numssen: Methods and Development Group Brain Networks, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Leon O. H. Kroczek: Department of Psychology, Clinical Psychology and Psychotherapy, Universität Regensburg, Regensburg 93053, Germany
- Gesa Hartwigsen: Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Cognitive and Biological Psychology, Wilhelm Wundt Institute for Psychology, Leipzig 04109, Germany
- Angela D. Friederici: Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
12
Charoy J, Samuel AG. Bad maps may not always get you lost: Lexically driven perceptual recalibration for substituted phonemes. Atten Percept Psychophys 2023;85:2437-2458. PMID: 37264293; PMCID: PMC10234583; DOI: 10.3758/s13414-023-02725-1.
Abstract
The speech perception system adjusts its phoneme categories based on the current speech input and lexical context. This is known as lexically driven perceptual recalibration, and it is often assumed to underlie accommodation to non-native accented speech. However, recalibration studies have focused on maximally ambiguous sounds (e.g., a sound ambiguous between "sh" and "s" in a word like "superpower"), a scenario that does not represent the full range of variation present in accented speech. Indeed, non-native speakers sometimes completely substitute a phoneme for another, rather than produce an ambiguous segment (e.g., saying "shuperpower"). This has been called a "bad map" in the literature. In this study, we scale up the lexically driven recalibration paradigm to such cases. Because previous research suggests that the position of the critically accented phoneme modulates the success of recalibration, we include such a manipulation in our study. And to ensure that participants treat all critical items as words (an important point for successful recalibration), we use a new exposure task that incentivizes them to do so. Our findings suggest that while recalibration is most robust after exposure to ambiguous sounds, it also occurs after exposure to bad maps. But interestingly, positional effects may be reversed: recalibration was more likely for ambiguous sounds late in words, but more likely for bad maps occurring early in words. Finally, a comparison of an online versus in-lab version of these conditions shows that experimental setting may have a non-trivial effect on the results of recalibration studies.
Affiliation(s)
- Jeanne Charoy: Department of Psychology, Stony Brook University, New York, NY, USA
- Arthur G Samuel: Department of Psychology, Stony Brook University, New York, NY, USA; Basque Center on Cognition Brain and Language, Donostia-San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
13
Mischler G, Raghavan V, Keshishian M, Mesgarani N. naplib-python: Neural acoustic data processing and analysis tools in python. Software Impacts 2023;17:100541. PMID: 37771949; PMCID: PMC10538526; DOI: 10.1016/j.simpa.2023.100541.
Abstract
Recently, the computational neuroscience community has pushed for more transparent and reproducible methods across the field. In the interest of unifying the domain of auditory neuroscience, naplib-python provides an intuitive and general data structure for handling all neural recordings and stimuli, as well as extensive preprocessing, feature extraction, and analysis tools which operate on that data structure. The package removes many of the complications associated with this domain, such as varying trial durations and multi-modal stimuli, and provides a general-purpose analysis framework that interfaces easily with existing toolboxes used in the field.
Affiliation(s)
- Gavin Mischler: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, NY, United States; Department of Electrical Engineering, Columbia University, NY, United States
- Vinay Raghavan: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, NY, United States; Department of Electrical Engineering, Columbia University, NY, United States
- Menoua Keshishian: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, NY, United States; Department of Electrical Engineering, Columbia University, NY, United States
- Nima Mesgarani: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, NY, United States (corresponding author)
14
Zhang Y, Rennig J, Magnotti JF, Beauchamp MS. Multivariate fMRI responses in superior temporal cortex predict visual contributions to, and individual differences in, the intelligibility of noisy speech. Neuroimage 2023;278:120271. PMID: 37442310; PMCID: PMC10460966; DOI: 10.1016/j.neuroimage.2023.120271.
Abstract
Humans have the unique ability to decode the rapid stream of language elements that constitute speech, even when it is contaminated by noise. Two reliable observations about noisy speech perception are that seeing the face of the talker improves intelligibility and that listeners differ in their ability to perceive noisy speech. We introduce a multivariate BOLD fMRI measure that explains both observations. In two independent fMRI studies, clear and noisy speech was presented in visual, auditory and audiovisual formats to thirty-seven participants who rated intelligibility. An event-related design was used to sort noisy speech trials by their intelligibility. Individual-differences multidimensional scaling was applied to fMRI response patterns in superior temporal cortex, and the dissimilarity between responses to clear speech and noisy (but intelligible) speech was measured. Neural dissimilarity was less for audiovisual speech than auditory-only speech, corresponding to the greater intelligibility of noisy audiovisual speech. Dissimilarity was less in participants with better noisy speech perception, corresponding to individual differences. These relationships held for both single word and entire sentence stimuli, suggesting that they were driven by intelligibility rather than the specific stimuli tested. A neural measure of perceptual intelligibility may aid in the development of strategies for helping those with impaired speech perception.
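A minimal sketch of the kind of dissimilarity measure at the heart of this approach, using correlation distance between multivoxel response patterns; data and shapes are toy values, not the study's:

```python
import numpy as np

def pattern_dissimilarity(clear, noisy):
    """1 - Pearson r between two voxel-response patterns from the same ROI."""
    return 1.0 - np.corrcoef(clear, noisy)[0, 1]

rng = np.random.default_rng(0)
clear_pattern = rng.normal(size=200)                              # 200 voxels (toy data)
noisy_pattern = clear_pattern + rng.normal(scale=0.5, size=200)   # a degraded copy of the clear pattern
print(pattern_dissimilarity(clear_pattern, noisy_pattern))        # smaller = more similar to clear speech
```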
Affiliation(s)
- Yue Zhang: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- Johannes Rennig: Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- John F Magnotti: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S Beauchamp: Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
15
Kries J, De Clercq P, Lemmens R, Francart T, Vandermosten M. Acoustic and phonemic processing are impaired in individuals with aphasia. Sci Rep 2023;13:11208. PMID: 37433805; DOI: 10.1038/s41598-023-37624-w.
Abstract
Acoustic and phonemic processing are understudied in aphasia, a language disorder that can affect different levels and modalities of language processing. For successful speech comprehension, processing of the speech envelope, which reflects amplitude changes over time (e.g., rise times), is necessary. Moreover, to identify speech sounds (i.e., phonemes), efficient processing of spectro-temporal changes as reflected in formant transitions is essential. Given the underrepresentation of aphasia studies on these aspects, we tested rise time processing and phoneme identification in 29 individuals with post-stroke aphasia and 23 healthy age-matched controls. We found significantly lower performance in the aphasia group than in the control group on both tasks, even when controlling for individual differences in hearing levels and cognitive functioning. Further, by conducting an individual deviance analysis, we found a low-level acoustic or phonemic processing impairment in 76% of individuals with aphasia. Additionally, we investigated whether this impairment would propagate to higher-level language processing and found that rise time processing predicts phonological processing performance in individuals with aphasia. These findings show that it is important to develop diagnostic and treatment tools that target low-level language processing mechanisms.
Affiliation(s)
- Jill Kries: Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Pieter De Clercq: Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Robin Lemmens: Experimental Neurology, Department of Neurosciences, KU Leuven, Leuven, Belgium; Laboratory of Neurobiology, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium; Department of Neurology, University Hospitals Leuven, Leuven, Belgium
- Tom Francart: Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Maaike Vandermosten: Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
16
Mischler G, Raghavan V, Keshishian M, Mesgarani N. naplib-python: Neural Acoustic Data Processing and Analysis Tools in Python. arXiv (preprint) 2023:arXiv:2304.01799v1. PMID: 37064534; PMCID: PMC10104195.
Abstract
Recently, the computational neuroscience community has pushed for more transparent and reproducible methods across the field. In the interest of unifying the domain of auditory neuroscience, naplib-python provides an intuitive and general data structure for handling all neural recordings and stimuli, as well as extensive preprocessing, feature extraction, and analysis tools which operate on that data structure. The package removes many of the complications associated with this domain, such as varying trial durations and multi-modal stimuli, and provides a general-purpose analysis framework that interfaces easily with existing toolboxes used in the field.
Affiliation(s)
- Gavin Mischler: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Vinay Raghavan: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Menoua Keshishian: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Nima Mesgarani: Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
17
Giroud J, Lerousseau JP, Pellegrino F, Morillon B. The channel capacity of multilevel linguistic features constrains speech comprehension. Cognition 2023;232:105345. PMID: 36462227; DOI: 10.1016/j.cognition.2022.105345.
Abstract
Humans are expert at processing speech, but how this feat is accomplished remains a major question in cognitive neuroscience. Capitalizing on the concept of channel capacity, we developed a unified measurement framework to investigate the respective influence of seven acoustic and linguistic features on speech comprehension, encompassing acoustic, sub-lexical, lexical and supra-lexical levels of description. We show that comprehension is independently impacted by all these features, but at varying degrees and with a clear dominance of the syllabic rate. Comparing comprehension of French words and sentences further reveals that when supra-lexical contextual information is present, the impact of all other features is dramatically reduced. Finally, we estimated the channel capacity associated with each linguistic feature and compared them with their generic distribution in natural speech. Our data reveal that while acoustic modulation, syllabic and phonemic rates unfold respectively at 5, 5, and 12 Hz in natural speech, they are associated with independent processing bottlenecks whose channel capacities are 15, 15 and 35 Hz, respectively, as suggested by neurophysiological theories. They moreover point towards supra-lexical contextual information as the feature limiting the flow of natural speech. Overall, this study reveals how multilevel linguistic features constrain speech comprehension.
Affiliation(s)
- Jérémy Giroud: Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France
- François Pellegrino: Laboratoire Dynamique du Langage UMR 5596, CNRS, University of Lyon, 14 Avenue Berthelot, 69007 Lyon, France
- Benjamin Morillon: Aix Marseille Univ, Inserm, INS, Inst Neurosci Syst, Marseille, France
18
Willeford K. The Luminescence Hypothesis of Olfaction. Sensors (Basel, Switzerland) 2023;23:1333. PMID: 36772376; PMCID: PMC9919928; DOI: 10.3390/s23031333.
Abstract
A new hypothesis for the mechanism of olfaction is presented. It begins with an odorant molecule binding to an olfactory receptor. This is followed by the quantum biology event of inelastic electron tunneling, as has been suggested in both the vibration and swipe card theories. It is novel in that it is not concerned with the possible effects of the tunneled electrons, as has been discussed with the previous theories. Instead, the high energy state of the odorant molecule in the receptor following inelastic electron tunneling is considered. The hypothesis is that, as the high energy state decays, there is fluorescence luminescence with radiative emission of multiple photons. These photons pass through the supporting sustentacular cells and activate a set of olfactory neurons in near-simultaneous timing, which provides the temporal basis for the brain to interpret the required complex combinatorial coding as an odor. The Luminescence Hypothesis of Olfaction is the first to present the necessity of, or a mechanism for, a 1:3 correspondence of odorant molecule to olfactory nerve activations. The mechanism provides for a consistent and reproducible time-based activation of sets of olfactory nerves correlated to an odor. The hypothesis has a biological precedent and includes an energy feasibility assessment; it explains the anosmia seen with COVID-19 and can be confirmed with existing laboratory techniques.
Affiliation(s)
- Kenneth Willeford: Coastal Carolinas Integrated Medicine, 10 Doctors Circle, STE 2, Supply, NC 28462, USA
19
Gow DW, Avcu E, Schoenhaut A, Sorensen DO, Ahlfors SP. Abstract representations in temporal cortex support generative linguistic processing. Language, Cognition and Neuroscience 2022;38:765-778. PMID: 37332658; PMCID: PMC10270390; DOI: 10.1080/23273798.2022.2157029.
Abstract
Generativity, the ability to create and evaluate novel constructions, is a fundamental property of human language and cognition. The productivity of generative processes is determined by the scope of the representations they engage. Here we examine the neural representation of reduplication, a productive phonological process that can create novel forms through patterned syllable copying (e.g. ba-mih → ba-ba-mih, ba-mih-mih, or ba-mih-ba). Using MRI-constrained source estimates of combined MEG/EEG data collected during an auditory artificial grammar task, we identified localized cortical activity associated with syllable reduplication pattern contrasts in novel trisyllabic nonwords. Neural decoding analyses identified a set of predominantly right hemisphere temporal lobe regions whose activity reliably discriminated reduplication patterns evoked by untrained, novel stimuli. Effective connectivity analyses suggested that sensitivity to abstracted reduplication patterns was propagated between these temporal regions. These results suggest that localized temporal lobe activity patterns function as abstract representations that support linguistic generativity.
Affiliation(s)
- David W. Gow: Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114; Department of Psychology, Salem State University, Salem, MA 01970; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129; Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA 02115
- Enes Avcu: Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114
- Adriana Schoenhaut: Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114
- David O. Sorensen: Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA 02115
- Seppo P. Ahlfors: Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA 02115; Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114
20
Kutlu E, Chiu S, McMurray B. Moving away from deficiency models: Gradiency in bilingual speech categorization. Front Psychol 2022;13:1033825. PMID: 36507048; PMCID: PMC9730410; DOI: 10.3389/fpsyg.2022.1033825.
Abstract
For much of its history, categorical perception was treated as a foundational theory of speech perception, which suggested that quasi-discrete categorization was a goal of speech perception. This had a profound impact on bilingualism research which adopted similar tasks to use as measures of nativeness or native-like processing, implicitly assuming that any deviation from discreteness was a deficit. This is particularly problematic for listeners like heritage speakers whose language proficiency, both in their heritage language and their majority language, is questioned. However, we now know that in the monolingual listener, speech perception is gradient and listeners use this gradiency to adjust subphonetic details, recover from ambiguity, and aid learning and adaptation. This calls for new theoretical and methodological approaches to bilingualism. We present the Visual Analogue Scaling task which avoids the discrete and binary assumptions of categorical perception and can capture gradiency more precisely than other measures. Our goal is to provide bilingualism researchers new conceptual and empirical tools that can help examine speech categorization in different bilingual communities without the necessity of forcing their speech categorization into discrete units and without assuming a deficit model.
Affiliation(s)
- Ethan Kutlu: Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States; Department of Linguistics, University of Iowa, Iowa City, IA, United States
- Samantha Chiu: Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
- Bob McMurray: Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States; Department of Linguistics, University of Iowa, Iowa City, IA, United States
21
Gwilliams L, King JR, Marantz A, Poeppel D. Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nat Commun 2022;13:6606. PMID: 36329058; PMCID: PMC9633780; DOI: 10.1038/s41467-022-34326-1.
Abstract
Speech consists of a continuously-varying acoustic signal. Yet human listeners experience it as sequences of discrete speech sounds, which are used to recognise discrete words. To examine how the human brain appropriately sequences the speech signal, we recorded two-hour magnetoencephalograms from 21 participants listening to short narratives. Our analyses show that the brain continuously encodes the three most recently heard speech sounds in parallel, and maintains this information long past its dissipation from the sensory input. Each speech sound representation evolves over time, jointly encoding both its phonetic features and the amount of time elapsed since onset. As a result, this dynamic neural pattern encodes both the relative order and phonetic content of the speech sequence. These representations are active earlier when phonemes are more predictable, and are sustained longer when lexical identity is uncertain. Our results show how phonetic sequences in natural speech are represented at the level of populations of neurons, providing insight into what intermediary representations exist between the sensory input and sub-lexical units. The flexibility in the dynamics of these representations paves the way for further understanding of how such sequences may be used to interface with higher order structure such as lexical identity.
Affiliation(s)
- Laura Gwilliams: Department of Neurological Surgery, University of California, San Francisco, USA; Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, UAE
- Jean-Remi King: Department of Psychology, New York University, New York, USA; École normale supérieure, PSL University, CNRS, Paris, France
- Alec Marantz: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, UAE; Department of Linguistics, New York University, New York, USA
- David Poeppel: Department of Psychology, New York University, New York, USA; Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
22
McMurray B, Sarrett ME, Chiu S, Black AK, Wang A, Canale R, Aslin RN. Decoding the temporal dynamics of spoken word and nonword processing from EEG. Neuroimage 2022;260:119457. PMID: 35842096; PMCID: PMC10875705; DOI: 10.1016/j.neuroimage.2022.119457.
Abstract
The efficiency of spoken word recognition is essential for real-time communication. There is consensus that this efficiency relies on an implicit process of activating multiple word candidates that compete for recognition as the acoustic signal unfolds in real-time. However, few methods capture the neural basis of this dynamic competition on a msec-by-msec basis. This is crucial for understanding the neuroscience of language, and for understanding hearing, language and cognitive disorders in people for whom current behavioral methods are not suitable. We applied machine-learning techniques to standard EEG signals to decode which word was heard on each trial and analyzed the patterns of confusion over time. Results mirrored psycholinguistic findings: Early on, the decoder was equally likely to report the target (e.g., baggage) or a similar sounding competitor (badger), but by around 500 msec, competitors were suppressed. Follow up analyses show that this is robust across EEG systems (gel and saline), with fewer channels, and with fewer trials. Results are robust within individuals and show high reliability. This suggests a powerful and simple paradigm that can assess the neural dynamics of speech decoding, with potential applications for understanding lexical development in a variety of clinical disorders.
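A minimal sketch of time-resolved decoding of word identity from EEG in the spirit of this paradigm (toy data and a generic classifier, not the authors' pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 32, 150
X = rng.normal(size=(n_trials, n_channels, n_times))  # toy EEG epochs
y = rng.integers(0, 8, size=n_trials)                  # which of 8 words was heard on each trial

accuracy = np.empty(n_times)
for t in range(n_times):                               # decode separately at each time point
    clf = LogisticRegression(max_iter=1000)
    accuracy[t] = cross_val_score(clf, X[:, :, t], y, cv=5).mean()
# The trajectory of accuracy (and the pattern of confusions) over time is what
# reveals early competitor activation and its later suppression.
```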
Affiliation(s)
- Bob McMurray: Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics, and Dept. of Otolaryngology, University of Iowa
- McCall E Sarrett: Interdisciplinary Graduate Program in Neuroscience, University of Iowa
- Samantha Chiu: Dept. of Psychological and Brain Sciences, University of Iowa
- Alexis K Black: School of Audiology and Speech Sciences, University of British Columbia; Haskins Laboratories
- Alice Wang: Dept. of Psychology, University of Oregon; Haskins Laboratories
- Rebecca Canale: Dept. of Psychological Sciences, University of Connecticut; Haskins Laboratories
- Richard N Aslin: Haskins Laboratories; Department of Psychology and Child Study Center, Yale University; Department of Psychology, University of Connecticut
23
Onnis L, Lim A, Cheung S, Huettig F. Is the Mind Inherently Predicting? Exploring Forward and Backward Looking in Language Processing. Cogn Sci 2022;46:e13201. PMID: 36240464; PMCID: PMC9786242; DOI: 10.1111/cogs.13201.
Abstract
Prediction is one characteristic of the human mind. But what does it mean to say the mind is a "prediction machine" and inherently forward looking, as is frequently claimed? In natural languages, many contexts are not easily predictable in a forward fashion. In English, for example, many frequent verbs do not carry unique meaning on their own but instead rely on another word or words that follow them to become meaningful. Upon reading "take a", the processor often cannot easily predict "walk" as the next word. But the system can "look back" and integrate "walk" more easily when it follows "take a" (e.g., as opposed to *"make|get|have a walk"). In the present paper, we provide further evidence for the importance of both forward and backward looking in language processing. In two self-paced reading tasks and an eye-tracking reading task, we found evidence that adult English native speakers' sensitivity to forward and backward word conditional probability significantly predicted reading times over and above psycholinguistic predictors of reading latencies. We conclude that both forward looking (prediction) and backward looking (integration) appear to be important characteristics of language processing. Our results thus suggest that it makes just as much sense to call the mind an "integration machine" that is inherently backward looking.
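A toy illustration of the forward versus backward conditional probabilities at issue, estimated from bigram counts (the counts below are invented for the example, not taken from the study):

```python
from collections import Counter

bigram = Counter({("take", "a"): 120, ("take", "the"): 60,
                  ("have", "a"): 90, ("a", "walk"): 30, ("a", "break"): 70})
first, second = Counter(), Counter()
for (w1, w2), c in bigram.items():
    first[w1] += c    # occurrences as the first element of a bigram
    second[w2] += c   # occurrences as the second element

def forward_p(w1, w2):   # P(w2 | w1): how predictable the next word is
    return bigram[(w1, w2)] / first[w1]

def backward_p(w1, w2):  # P(w1 | w2): how easily w2 integrates with what preceded it
    return bigram[(w1, w2)] / second[w2]

print(forward_p("take", "a"), backward_p("a", "walk"))
```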
Collapse
Affiliation(s)
- Luca Onnis
- Centre for Multilingualism in Society across the Lifespan, University of Oslo; Department of Linguistics and Scandinavian Studies, University of Oslo
| | - Alfred Lim
- School of Psychology, University of Nottingham Malaysia Campus
| | | | | |
Collapse
|
24
|
Nenadić F, Tucker BV, Ten Bosch L. Computational Modeling of an Auditory Lexical Decision Experiment Using DIANA. LANGUAGE AND SPEECH 2022:238309221111752. [PMID: 36000386 PMCID: PMC10394956 DOI: 10.1177/00238309221111752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
We present an implementation of DIANA, a computational model of spoken word recognition, to model responses collected in the Massive Auditory Lexical Decision (MALD) project. DIANA is an end-to-end model, including an activation and decision component that takes the acoustic signal as input, activates internal word representations, and outputs lexicality judgments and estimated response latencies. Simulation 1 presents the process of creating acoustic models required by DIANA to analyze novel speech input. Simulation 2 investigates DIANA's performance in determining whether the input signal is a word present in the lexicon or a pseudoword. In Simulation 3, we generate estimates of response latency and correlate them with general tendencies in participant responses in MALD data. We find that DIANA performs fairly well in free word recognition and lexical decision. However, the current approach for estimating response latency provides estimates opposite to those found in behavioral data. We discuss these findings and offer suggestions as to what a contemporary model of spoken word recognition should be able to do.
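As a rough illustration of the activation-and-decision logic that end-to-end models of this kind embody (a toy sketch, not the DIANA implementation; the lexicon, match probabilities, and threshold are made up), candidate words can accumulate log-evidence as phone-like input arrives, with a decision issued once the leading candidate's margin exceeds a threshold; the step at which that happens serves as a crude latency estimate.

import numpy as np

lexicon = {"baggage": "b ae g ih jh".split(),
           "badger":  "b ae jh er".split(),
           "cat":     "k ae t".split()}

def match_prob(inp, target, p_match=0.85, p_mismatch=0.05):
    # Probability that the incoming phone supports a candidate's expected phone.
    return p_match if inp == target else p_mismatch

def recognize(input_phones, threshold=3.0):
    logodds = {w: 0.0 for w in lexicon}
    for t, phone in enumerate(input_phones, start=1):
        for w, phones in lexicon.items():
            target = phones[t - 1] if t <= len(phones) else None
            logodds[w] += np.log(match_prob(phone, target))
        best = max(logodds, key=logodds.get)
        margin = logodds[best] - max(v for w, v in logodds.items() if w != best)
        if margin >= threshold:
            return best, t          # recognized word and (phone-indexed) decision latency
    return None, len(input_phones)  # no commitment: treat as a pseudoword / no decision

print(recognize("b ae g ih jh".split()))
print(recognize("b ae jh er".split()))

Unlike this sketch, DIANA operates directly on the acoustic signal via trained acoustic models and also produces lexicality judgments for pseudowords; the point here is only the coupling between activation dynamics and an explicit decision rule.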
Collapse
Affiliation(s)
- Filip Nenadić
- University of Alberta, Canada; Singidunum University, Serbia
| | | | | |
Collapse
|
25
|
Romero-Rivas C, Costa A. On the flexibility of the sound-to-meaning mapping when listening to native and foreign-accented speech. Cortex 2022; 149:1-15. [DOI: 10.1016/j.cortex.2022.01.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2021] [Revised: 10/11/2021] [Accepted: 01/12/2022] [Indexed: 11/29/2022]
|
26
|
Kapnoula EC, McMurray B. Idiosyncratic use of bottom-up and top-down information leads to differences in speech perception flexibility: Converging evidence from ERPs and eye-tracking. BRAIN AND LANGUAGE 2021; 223:105031. [PMID: 34628259 PMCID: PMC11251822 DOI: 10.1016/j.bandl.2021.105031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 07/29/2021] [Accepted: 09/22/2021] [Indexed: 06/13/2023]
Abstract
Listeners generally categorize speech sounds in a gradient manner. However, recent work, using a visual analogue scaling (VAS) task, suggests that some listeners show more categorical performance, leading to less flexible cue integration and poorer recovery from misperceptions (Kapnoula et al., 2017, 2021). We asked how individual differences in speech gradiency can be reconciled with the well-established gradiency in the modal listener, showing how VAS performance relates to both Visual World Paradigm and EEG measures of gradiency. We also investigated three potential sources of these individual differences: inhibitory control; lexical inhibition; and early cue encoding. We used the N1 ERP component to track pre-categorical encoding of Voice Onset Time (VOT). The N1 linearly tracked VOT, reflecting a fundamentally gradient speech perception; however, for less gradient listeners, this linearity was disrupted near the boundary. Thus, while all listeners are gradient, they may show idiosyncratic encoding of specific cues, affecting downstream processing.
Collapse
Affiliation(s)
- Efthymia C Kapnoula
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Basque Center on Cognition, Brain and Language, Spain.
| | - Bob McMurray
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Dept. of Communication Sciences and Disorders, DeLTA Center, University of Iowa, United States; Dept. of Linguistics, DeLTA Center, University of Iowa, United States
| |
Collapse
|
27
|
Beach SD, Ozernov-Palchik O, May SC, Centanni TM, Gabrieli JDE, Pantazis D. Neural Decoding Reveals Concurrent Phonemic and Subphonemic Representations of Speech Across Tasks. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2021; 2:254-279. [PMID: 34396148 PMCID: PMC8360503 DOI: 10.1162/nol_a_00034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Accepted: 02/21/2021] [Indexed: 06/13/2023]
Abstract
Robust and efficient speech perception relies on the interpretation of acoustically variable phoneme realizations, yet prior neuroimaging studies are inconclusive regarding the degree to which subphonemic detail is maintained over time as categorical representations arise. It is also unknown whether this depends on the demands of the listening task. We addressed these questions by using neural decoding to quantify the (dis)similarity of brain response patterns evoked during two different tasks. We recorded magnetoencephalography (MEG) as adult participants heard isolated, randomized tokens from a /ba/-/da/ speech continuum. In the passive task, their attention was diverted. In the active task, they categorized each token as ba or da. We found that linear classifiers successfully decoded ba vs. da perception from the MEG data. Data from the left hemisphere were sufficient to decode the percept early in the trial, while the right hemisphere was necessary but not sufficient for decoding at later time points. We also decoded stimulus representations and found that they were maintained longer in the active task than in the passive task; however, these representations did not pattern more like discrete phonemes when an active categorical response was required. Instead, in both tasks, early phonemic patterns gave way to a representation of stimulus ambiguity that coincided in time with reliable percept decoding. Our results suggest that the categorization process does not require the loss of subphonemic detail, and that the neural representation of isolated speech sounds includes concurrent phonemic and subphonemic information.
Collapse
Affiliation(s)
- Sara D. Beach
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
| | - Ola Ozernov-Palchik
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Sidney C. May
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, USA
| | - Tracy M. Centanni
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology, Texas Christian University, Fort Worth, TX, USA
| | - John D. E. Gabrieli
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Dimitrios Pantazis
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
Collapse
|
28
|
Brown M, Tanenhaus MK, Dilley L. Syllable Inference as a Mechanism for Spoken Language Understanding. Top Cogn Sci 2021; 13:351-398. [PMID: 33780156 DOI: 10.1111/tops.12529] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 01/25/2023]
Abstract
A classic problem in spoken language comprehension is how listeners perceive speech as being composed of discrete words, given the variable time-course of information in continuous signals. We propose a syllable inference account of spoken word recognition and segmentation, according to which alternative hierarchical models of syllables, words, and phonemes are dynamically posited, which are expected to maximally predict incoming sensory input. Generative models are combined with current estimates of context speech rate drawn from neural oscillatory dynamics, which are sensitive to amplitude rises. Over time, models which result in local minima in error between predicted and recently experienced signals give rise to perceptions of hearing words. Three experiments using the visual world eye-tracking paradigm with a picture-selection task tested hypotheses motivated by this framework. Materials were sentences that were acoustically ambiguous in numbers of syllables, words, and phonemes they contained (cf. English plural constructions, such as "saw (a) raccoon(s) swimming," which have two loci of grammatical information). Time-compressing, or expanding, speech materials permitted determination of how temporal information at, or in the context of, each locus affected looks to, and selection of, pictures with a singular or plural referent (e.g., one or more than one raccoon). Supporting our account, listeners probabilistically interpreted identical chunks of speech as consistent with a singular or plural referent to a degree that was based on the chunk's gradient rate in relation to its context. We interpret these results as evidence that arriving temporal information, judged in relation to language model predictions generated from context speech rate evaluated on a continuous scale, informs inferences about syllables, thereby giving rise to perceptual experiences of understanding spoken language as words separated in time.
Collapse
Affiliation(s)
- Meredith Brown
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA; Department of Psychiatry and Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, USA; Department of Psychology, Tufts University, Medford, Massachusetts, USA
| | - Michael K Tanenhaus
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, USA; School of Psychology, Nanjing Normal University, Nanjing, China
| | - Laura Dilley
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan, USA
| |
Collapse
|
29
|
Effects of temporal order and intentionality on reflective attention to words in noise. PSYCHOLOGICAL RESEARCH 2021; 86:544-557. [PMID: 33683449 DOI: 10.1007/s00426-021-01494-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Accepted: 02/15/2021] [Indexed: 10/22/2022]
Abstract
Speech perception in noise is a cognitively demanding process that challenges not only the auditory sensory system, but also cognitive networks involved in attention. The predictive coding theory has been influential in characterizing the influence of prior context on processing incoming auditory stimuli, with comparatively less research dedicated to "postdictive" processes and subsequent context effects on speech perception. Effects of subsequent semantic context were evaluated while manipulating the relationship of three target words presented in noise and the temporal position of targets compared to the subsequent contextual cue, demonstrating that subsequent context benefits were present regardless of whether the targets were related to each other and did not depend on the position of the target. However, participants instructed to focus on the relation between target and cue performed worse than those who did not receive this instruction, suggesting a disruption of a natural process of continuous speech recognition. We discuss these findings in relation to lexical commitment and stimulus-driven attention to short-term memory as mechanisms of subsequent context integration.
Collapse
|
30
|
Caplan S, Hafri A, Trueswell JC. Now You Hear Me, Later You Don't: The Immediacy of Linguistic Computation and the Representation of Speech. Psychol Sci 2021; 32:410-423. [PMID: 33617735 DOI: 10.1177/0956797620968787] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
What happens to an acoustic signal after it enters the mind of a listener? Previous work has demonstrated that listeners maintain intermediate representations over time. However, the internal structure of such representations, be they the acoustic-phonetic signal or more general information about the probability of possible categories, remains underspecified. We present two experiments using a novel speaker-adaptation paradigm aimed at uncovering the format of speech representations. We exposed adult listeners (N = 297) to a speaker whose utterances contained acoustically ambiguous information concerning phones (and thus words), and we manipulated the temporal availability of disambiguating cues via visually presented text (presented before or after each utterance). Results from a traditional phoneme-categorization task showed that listeners adapted to a modified acoustic distribution when disambiguating text was provided before but not after the audio. These results support the position that speech representations consist of activation over categories and are inconsistent with direct maintenance of the acoustic-phonetic signal.
Collapse
Affiliation(s)
| | - Alon Hafri
- Department of Cognitive Science, Johns Hopkins University; Department of Psychological and Brain Sciences, Johns Hopkins University
| | | |
Collapse
|
31
|
Cummings AE, Wu YC, Ogiela DA. Phonological Underspecification: An Explanation for How a Rake Can Become Awake. Front Hum Neurosci 2021; 15:585817. [PMID: 33679342 PMCID: PMC7925882 DOI: 10.3389/fnhum.2021.585817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Accepted: 01/25/2021] [Indexed: 11/13/2022] Open
Abstract
Neural markers, such as the mismatch negativity (MMN), have been used to examine the phonological underspecification of English feature contrasts using the Featurally Underspecified Lexicon (FUL) model. However, neural indices have not been examined within the approximant phoneme class, even though there is evidence suggesting processing asymmetries between liquid (e.g., /ɹ/) and glide (e.g., /w/) phonemes. The goal of this study was to determine whether glide phonemes elicit electrophysiological asymmetries related to [consonantal] underspecification when contrasted with liquid phonemes in adult English speakers. Specifically, /ɹɑ/ is categorized as [+consonantal] while /wɑ/ is not specified [i.e., (-consonantal)]. Following the FUL framework, if /w/ is less specified than /ɹ/, the former phoneme should elicit a larger MMN response than the latter phoneme. Fifteen English-speaking adults were presented with two syllables, /ɹɑ/ and /wɑ/, in an event-related potential (ERP) oddball paradigm in which both syllables served as the standard and deviant stimulus in opposite stimulus sets. Three types of analyses were used: (1) traditional mean amplitude measurements; (2) cluster-based permutation analyses; and (3) event-related spectral perturbation (ERSP) analyses. The less specified /wɑ/ elicited a large MMN, while a much smaller MMN was elicited by the more specified /ɹɑ/. In the standard and deviant ERP waveforms, /wɑ/ elicited a significantly larger negative response than did /ɹɑ/. Theta activity elicited by /ɹɑ/ was significantly greater than that elicited by /wɑ/ in the 100-300 ms time window. Also, low gamma activation was significantly lower for /ɹɑ/ vs. /wɑ/ deviants over the left hemisphere, as compared to the right, in the 100-150 ms window. These outcomes suggest that the [consonantal] feature follows the underspecification predictions of FUL previously tested with the place of articulation and voicing features. Thus, this study provides new evidence for phonological underspecification. Moreover, as neural oscillation patterns have not previously been discussed in the underspecification literature, the ERSP analyses identified potential new indices of phonological underspecification.
Collapse
Affiliation(s)
- Alycia E. Cummings
- Department of Communication Sciences and Disorders, Idaho State University, Meridian, ID, United States
| | - Ying C. Wu
- Swartz Center for Computational Neuroscience, University of California, San Diego, San Diego, CA, United States
| | - Diane A. Ogiela
- Department of Communication Sciences and Disorders, Idaho State University, Meridian, ID, United States
| |
Collapse
|
32
|
Bidelman GM, Pearson C, Harrison A. Lexical Influences on Categorical Speech Perception Are Driven by a Temporoparietal Circuit. J Cogn Neurosci 2021; 33:840-852. [PMID: 33464162 DOI: 10.1162/jocn_a_01678] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Categorical judgments of otherwise identical phonemes are biased toward hearing words (i.e., the "Ganong effect"), suggesting that lexical context influences perception of even basic speech primitives. Lexical biasing could manifest via late-stage postperceptual mechanisms related to decision-making or, alternatively, top-down linguistic inference that acts on early perceptual coding. Here, we exploited the temporal sensitivity of EEG to resolve the spatiotemporal dynamics of these context-related influences on speech categorization. Listeners rapidly classified sounds from a /gɪ/-/kɪ/ gradient presented in opposing word-nonword contexts (GIFT-kift vs. giss-KISS), designed to bias perception toward lexical items. Phonetic perception shifted toward the direction of words, establishing a robust Ganong effect behaviorally. ERPs revealed a neural analog of lexical biasing emerging within ~200 msec. Source analyses uncovered a distributed neural network supporting the Ganong effect, including middle temporal gyrus, inferior parietal lobe, and middle frontal cortex. Yet, among Ganong-sensitive regions, only left middle temporal gyrus and inferior parietal lobe predicted behavioral susceptibility to lexical influence. Our findings confirm that lexical status rapidly constrains sublexical categorical representations for speech within several hundred milliseconds but likely does so outside the purview of canonical auditory-sensory brain areas.
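The behavioral side of this design can be sketched as follows (simulated responses and hypothetical boundary and slope values, not the study's data): a logistic identification function is fit per lexical context, and the Ganong effect is quantified as the boundary shift between contexts.

import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

steps = np.arange(1, 8, dtype=float)        # steps along the /g/-/k/ gradient
rng = np.random.default_rng(3)

def simulate(boundary, n=50):
    p = logistic(steps, boundary, slope=1.5)
    return np.array([rng.binomial(n, pi) / n for pi in p])   # proportion "k" responses

prop_gift_kift = simulate(boundary=4.6)     # GIFT-kift: bias toward /g/, so boundary falls later
prop_giss_kiss = simulate(boundary=3.4)     # giss-KISS: bias toward /k/, so boundary falls earlier

(b_gift, _), _ = curve_fit(logistic, steps, prop_gift_kift, p0=[4.0, 1.0])
(b_kiss, _), _ = curve_fit(logistic, steps, prop_giss_kiss, p0=[4.0, 1.0])
print(f"Ganong shift (boundary difference in steps): {b_gift - b_kiss:.2f}")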
Collapse
Affiliation(s)
- Gavin M Bidelman
- University of Memphis, TN; University of Tennessee Health Sciences Center, Memphis, TN
| | | | | |
Collapse
|
33
|
Adaptation to mis-pronounced speech: evidence for a prefrontal-cortex repair mechanism. Sci Rep 2021; 11:97. [PMID: 33420193 PMCID: PMC7794353 DOI: 10.1038/s41598-020-79640-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2020] [Accepted: 11/23/2020] [Indexed: 11/30/2022] Open
Abstract
Speech is a complex and ambiguous acoustic signal that varies significantly within and across speakers. Despite the processing challenge that such variability poses, humans adapt to systematic variations in pronunciation rapidly. The goal of this study is to uncover the neurobiological bases of the attunement process that enables such fluent comprehension. Twenty-four native English participants listened to words spoken by a “canonical” American speaker and two non-canonical speakers, and performed a word-picture matching task, while magnetoencephalography was recorded. Non-canonical speech was created by including systematic phonological substitutions within the word (e.g., [s] → [sh]). Activity in the auditory cortex (superior temporal gyrus) was greater in response to substituted phonemes, and, critically, this was not attenuated by exposure. By contrast, prefrontal regions showed an interaction between the presence of a substitution and the amount of exposure: activity decreased for canonical speech over time, whereas responses to non-canonical speech remained consistently elevated. Granger causality analyses further revealed that prefrontal responses serve to modulate activity in auditory regions, suggesting the recruitment of top-down processing to decode non-canonical pronunciations. In sum, our results suggest that the behavioural deficit in processing mispronounced phonemes may be due to a disruption to the typical exchange of information between the prefrontal and auditory cortices as observed for canonical speech.
Collapse
|
34
|
The many timescales of context in language processing. PSYCHOLOGY OF LEARNING AND MOTIVATION 2021. [DOI: 10.1016/bs.plm.2021.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
35
|
Wagley N, Lajiness-O'Neill R, Hay JSF, Ugolini M, Bowyer SM, Kovelman I, Brennan JR. Predictive Processing during a Naturalistic Statistical Learning Task in ASD. eNeuro 2020; 7:ENEURO.0069-19.2020. [PMID: 33199412 PMCID: PMC7729300 DOI: 10.1523/eneuro.0069-19.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 08/11/2020] [Accepted: 10/01/2020] [Indexed: 11/21/2022] Open
Abstract
Children's sensitivity to regularities within the linguistic stream, such as the likelihood that syllables co-occur, is foundational to speech segmentation and language acquisition. Yet, little is known about the neurocognitive mechanisms underlying speech segmentation in typical development and in neurodevelopmental disorders that impact language acquisition such as autism spectrum disorder (ASD). Here, we investigate the neural signals of statistical learning in 15 human participants (children ages 8-12) with a clinical diagnosis of ASD and 14 age-matched and gender-matched typically developing peers. We tracked the evoked neural responses to syllable sequences in a naturalistic statistical learning corpus using magnetoencephalography (MEG) in the left primary auditory cortex, posterior superior temporal gyrus (pSTG), and inferior frontal gyrus (IFG), across three repetitions of the passage. In typically developing children, we observed a neural index of learning in all three regions of interest (ROIs), measured by the change in evoked response amplitude as a function of syllable surprisal across passage repetitions. As surprisal increased, the amplitude of the neural response increased; this sensitivity emerged after repeated exposure to the corpus. Children with ASD did not show this pattern of learning in all three regions. We discuss two possible hypotheses related to children's sensitivity to bottom-up sensory deficits and difficulty with top-down incremental processing.
Collapse
Affiliation(s)
- Neelima Wagley
- Department of Psychology and Human Development, Vanderbilt University, Nashville, TN 37205
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
| | | | - Jessica S F Hay
- Department of Psychology, University of Tennessee, Knoxville, TN 37996
| | - Margaret Ugolini
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
| | - Susan M Bowyer
- Department of Neurology, Henry Ford Hospital, Detroit, MI 48202
| | - Ioulia Kovelman
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
| | | |
Collapse
|
36
|
Sarrett ME, McMurray B, Kapnoula EC. Dynamic EEG analysis during language comprehension reveals interactive cascades between perceptual processing and sentential expectations. BRAIN AND LANGUAGE 2020; 211:104875. [PMID: 33086178 PMCID: PMC7682806 DOI: 10.1016/j.bandl.2020.104875] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/16/2020] [Revised: 08/07/2020] [Accepted: 10/02/2020] [Indexed: 05/22/2023]
Abstract
Understanding spoken language requires analysis of the rapidly unfolding speech signal at multiple levels: acoustic, phonological, and semantic. However, there is not yet a comprehensive picture of how these levels relate. We recorded electroencephalography (EEG) while listeners (N = 31) heard sentences in which we manipulated acoustic ambiguity (e.g., a bees/peas continuum) and sentential expectations (e.g., Honey is made by bees). EEG was analyzed with a mixed effects model over time to quantify how language processing cascades proceed on a millisecond-by-millisecond basis. Our results indicate: (1) perceptual processing and memory for fine-grained acoustics is preserved in brain activity for up to 900 msec; (2) contextual analysis begins early and is graded with respect to the acoustic signal; and (3) top-down predictions influence perceptual processing in some cases, however, these predictions are available simultaneously with the veridical signal. These mechanistic insights provide a basis for a better understanding of the cortical language network.
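A compact sketch of this time-resolved mixed-effects approach (fully synthetic data; the predictor names, effect sizes, and onset times are invented, and the published analysis is considerably richer): fit a mixed-effects regression of single-trial EEG amplitude at each time point, with acoustic continuum step and sentence context as fixed effects and subject as a grouping factor, then inspect how the two coefficients unfold over time.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subj, n_trials, n_times = 10, 25, 40
rows = []
for s in range(n_subj):
    subj_offset = rng.normal(0, 0.5)                 # random subject baseline
    for _ in range(n_trials):
        vot = rng.uniform(0, 1)                      # step along an acoustic continuum (e.g., bees/peas)
        context = float(rng.integers(0, 2))          # sentential expectation toward one endpoint
        for t in range(n_times):
            amp = (subj_offset
                   + 0.6 * vot * (t > 15)            # acoustic effect emerges earlier
                   + 0.3 * context * (t > 25)        # contextual effect emerges later
                   + rng.normal(0, 1))
            rows.append(dict(subj=s, vot=vot, context=context, time=t, amp=amp))
df = pd.DataFrame(rows)

betas = []
for t in range(n_times):                             # one mixed model per time point
    fit = smf.mixedlm("amp ~ vot + context", df[df.time == t], groups="subj").fit()
    betas.append((t, fit.params["vot"], fit.params["context"]))

# The VOT coefficient should rise before the context coefficient, mirroring a
# cascade from perceptual encoding to sentential expectations.
for t, b_vot, b_ctx in betas[::10]:
    print(t, round(b_vot, 2), round(b_ctx, 2))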
Collapse
Affiliation(s)
- McCall E Sarrett
- Interdisciplinary Graduate Program in Neuroscience, 356 Medical Research Center, University of Iowa, Iowa City, IA, 52242, United States.
| | - Bob McMurray
- Department of Psychological & Brain Sciences, W311 Seashore Hall, University of Iowa, Iowa City, IA, 52242, United States
| | - Efthymia C Kapnoula
- Department of Psychological & Brain Sciences, W311 Seashore Hall, University of Iowa, Iowa City, IA, 52242, United States; Basque Center on Cognition, Brain, & Language, Mikeletegi Pasealekua, 69, 20009 Donostia, Gipuzkoa, Spain
| |
Collapse
|
37
|
Chan TMV, Alain C. Brain indices associated with semantic cues prior to and after a word in noise. Brain Res 2020; 1751:147206. [PMID: 33189693 DOI: 10.1016/j.brainres.2020.147206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2020] [Revised: 11/01/2020] [Accepted: 11/09/2020] [Indexed: 10/23/2022]
Abstract
It is well established that identification of words in noise improves when it is preceded by a semantically related word, but comparatively little is known about the effect of subsequent context in guiding word in noise identification. We build on the findings of a previous behavioural study (Chan & Alain, 2019) by measuring neuro-electric brain activity while manipulating the semantic content of a cue that either preceded or followed a word in noise. Participants were more accurate in identifying the word in noise when it was preceded or followed by a cue that was semantically related. This gain in accuracy coincided with a late positive component, which was time-locked to the word in noise when preceded by a cue and time-locked to the cue when it followed the word in noise. Distributed source analyses of this positive component revealed different patterns in source activity between the two temporal conditions. The effects of relatedness also generated an event-related potential modulation around 400 ms (N400) that was present at cue presentation when it followed the word in noise, but not for the word in noise when preceded by the cue, consistent with findings regarding its sensitivity to signal degradation. Exploratory analyses examined a subset of data based on participants' subjective perceived clarity, which revealed a posterior deflection over the left hemisphere that showed a relatedness effect. We discuss these findings in light of research on prediction as well as a reflective attention framework.
Collapse
Affiliation(s)
- T M Vanessa Chan
- Department of Psychology, University of Toronto, Sidney Smith Building, 100 St. George St., Toronto, Ontario M5S 3G3, Canada; Rotman Research Institute, Baycrest, 3560 Bathurst Street, Toronto, Ontario M6A 2E1, Canada
| | - Claude Alain
- Department of Psychology, University of Toronto, Sidney Smith Building, 100 St. George St., Toronto, Ontario M5S 3G3, Canada; Rotman Research Institute, Baycrest, 3560 Bathurst Street, Toronto, Ontario M6A 2E1, Canada; Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada; Faculty of Music, University of Toronto, Toronto, Ontario, Canada.
| |
Collapse
|
38
|
Gwilliams L, King JR. Recurrent processes support a cascade of hierarchical decisions. eLife 2020; 9:56603. [PMID: 32869746 PMCID: PMC7513462 DOI: 10.7554/elife.56603] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Accepted: 08/30/2020] [Indexed: 11/13/2022] Open
Abstract
Perception depends on a complex interplay between feedforward and recurrent processing. Yet, while the former has been extensively characterized, the computational organization of the latter remains largely unknown. Here, we use magneto-encephalography to localize, track and decode the feedforward and recurrent processes of reading, as elicited by letters and digits whose level of ambiguity was parametrically manipulated. We first confirm that a feedforward response propagates through the ventral and dorsal pathways within the first 200 ms. The subsequent activity is distributed across temporal, parietal and prefrontal cortices, which sequentially generate five levels of representations culminating in action-specific motor signals. Our decoding analyses reveal that both the content and the timing of these brain responses are best explained by a hierarchy of recurrent neural assemblies, which both maintain and broadcast increasingly rich representations. Together, these results show how recurrent processes generate, over extended time periods, a cascade of decisions that ultimately accounts for subjects’ perceptual reports and reaction times.
Collapse
Affiliation(s)
- Laura Gwilliams
- Department of Psychology, New York University, New York, United States; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
| | - Jean-Remi King
- Department of Psychology, New York University, New York, United States; Frankfurt Institute for Advanced Studies, Frankfurt, Germany; Laboratoire des Systèmes Perceptifs (CNRS UMR 8248), Département d'Études Cognitives, École Normale Supérieure, PSL University, Paris, France
| |
Collapse
|
39
|
Getz LM, Toscano JC. The time-course of speech perception revealed by temporally-sensitive neural measures. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2020; 12:e1541. [PMID: 32767836 DOI: 10.1002/wcs.1541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/30/2019] [Revised: 05/28/2020] [Accepted: 06/26/2020] [Indexed: 11/07/2022]
Abstract
Recent advances in cognitive neuroscience have provided a detailed picture of the early time-course of speech perception. In this review, we highlight this work, placing it within the broader context of research on the neurobiology of speech processing, and discuss how these data point us toward new models of speech perception and spoken language comprehension. We focus, in particular, on temporally-sensitive measures that allow us to directly measure early perceptual processes. Overall, the data provide support for two key principles: (a) speech perception is based on gradient representations of speech sounds and (b) speech perception is interactive and receives input from higher-level linguistic context at the earliest stages of cortical processing. Implications for models of speech processing and the neurobiology of language more broadly are discussed. This article is categorized under: Psychology > Language; Psychology > Perception and Psychophysics; Neuroscience > Cognition.
Collapse
Affiliation(s)
- Laura M Getz
- Department of Psychological Sciences, University of San Diego, San Diego, California, USA
| | - Joseph C Toscano
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
| |
Collapse
|
40
|
Abstract
Hierarchical structure and compositionality imbue human language with unparalleled expressive power and set it apart from other perception–action systems. However, neither formal nor neurobiological models account for how these defining computational properties might arise in a physiological system. I attempt to reconcile hierarchy and compositionality with principles from cell assembly computation in neuroscience; the result is an emerging theory of how the brain could convert distributed perceptual representations into hierarchical structures across multiple timescales while representing interpretable incremental stages of (de)compositional meaning. The model's architecture—a multidimensional coordinate system based on neurophysiological models of sensory processing—proposes that a manifold of neural trajectories encodes sensory, motor, and abstract linguistic states. Gain modulation, including inhibition, tunes the path in the manifold in accordance with behavior and is how latent structure is inferred. As a consequence, predictive information about upcoming sensory input during production and comprehension is available without a separate operation. The proposed processing mechanism is synthesized from current models of neural entrainment to speech, concepts from systems neuroscience and category theory, and a symbolic-connectionist computational model that uses time and rhythm to structure information. I build on evidence from cognitive neuroscience and computational modeling that suggests a formal and mechanistic alignment between structure building and neural oscillations, and moves toward unifying basic insights from linguistics and psycholinguistics with the currency of neural computation.
Collapse
Affiliation(s)
- Andrea E. Martin
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
| |
Collapse
|
41
|
Dikker S, Assaneo MF, Gwilliams L, Wang L, Kösem A. Magnetoencephalography and Language. Neuroimaging Clin N Am 2020; 30:229-238. [PMID: 32336409 DOI: 10.1016/j.nic.2020.01.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
This article provides an overview of research that uses magnetoencephalography to understand the brain basis of human language. The cognitive processes and brain networks that have been implicated in written and spoken language comprehension and production are discussed in relation to different methodologies: we review event-related brain responses, research on the coupling of neural oscillations to speech, oscillatory coupling between brain regions (e.g., auditory-motor coupling), and neural decoding approaches in naturalistic language comprehension.
Collapse
Affiliation(s)
- Suzanne Dikker
- Department of Psychology, New York University, 6 Washington Place #275, New York, NY 10003, USA.
| | - M Florencia Assaneo
- Department of Psychology, New York University, 6 Washington Place #275, New York, NY 10003, USA
| | - Laura Gwilliams
- Department of Psychology, New York University, 6 Washington Place #275, New York, NY 10003, USA; New York University Abu Dhabi Research Institute, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
| | - Lin Wang
- Department of Psychiatry, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, 149 Thirteenth Street, #2306, Charlestown, MA 02129, USA
| | - Anne Kösem
- Lyon Neuroscience Research Center (CRNL), CH Le Vinatier Bâtiment 452, 95, BD Pinel, Bron, Lyon 69675, France
| |
Collapse
|
42
|
King A, Wedel A. Greater Early Disambiguating Information for Less-Probable Words: The Lexicon Is Shaped by Incremental Processing. Open Mind (Camb) 2020; 4:1-12. [PMID: 32617441 PMCID: PMC7323847 DOI: 10.1162/opmi_a_00030] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2019] [Accepted: 12/19/2019] [Indexed: 11/12/2022] Open
Abstract
There has been much work over the last century on optimization of the lexicon for efficient communication, with a particular focus on the form of words as an evolving balance between production ease and communicative accuracy. Zipf's law of abbreviation, the cross-linguistic trend for less-probable words to be longer, represents some of the strongest evidence the lexicon is shaped by a pressure for communicative efficiency. However, the various sounds that make up words do not all contribute the same amount of disambiguating information to a listener. Rather, the information a sound contributes depends in part on what specific lexical competitors exist in the lexicon. In addition, because the speech stream is perceived incrementally, early sounds in a word contribute on average more information than later sounds. Using a dataset of diverse languages, we demonstrate that, above and beyond containing more sounds, less-probable words contain sounds that convey more disambiguating information overall. We show further that this pattern tends to be strongest at word-beginnings, where sounds can contribute the most information.
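The idea that segments contribute different amounts of disambiguating information, depending on which lexical competitors survive the prefix heard so far, can be made concrete with a small sketch (toy lexicon with invented frequencies, not the cross-linguistic dataset used in the study): each segment's contribution is its surprisal given the cohort still consistent with the preceding segments.

import numpy as np

lexicon = {"cat": 120, "cap": 80, "captain": 10, "dog": 100, "dot": 15}  # invented frequencies

def segment_information(word):
    """Surprisal (bits) contributed by each segment, given the preceding prefix."""
    info = []
    cohort = dict(lexicon)
    for i, seg in enumerate(word):
        mass_before = sum(cohort.values())
        cohort = {w: f for w, f in cohort.items() if len(w) > i and w[i] == seg}
        mass_after = sum(cohort.values())
        info.append(-np.log2(mass_after / mass_before))
    return info

for w in lexicon:
    print(w, [round(bits, 2) for bits in segment_information(w)])
# Under the efficiency account sketched above, less-probable words should pack
# relatively more information into their segments, especially word-initially.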
Collapse
Affiliation(s)
- Adam King
- Department of Linguistics, University of Arizona
| | - Andrew Wedel
- Department of Linguistics, University of Arizona
| |
Collapse
|
43
|
Abstract
Morphemes (e.g. [tune], [-ful], [-ly]) are the basic blocks with which complex meaning is built. Here, I explore the critical role that morpho-syntactic rules play in forming the meaning of morphologically complex words, from two primary standpoints: (i) how semantically rich stem morphemes (e.g. explode, bake, post) combine with syntactic operators (e.g. -ion, -er, -age) to output a semantically predictable result; (ii) how this process can be understood in terms of mathematical operations, easily allowing the brain to generate representations of novel morphemes and comprehend novel words. With these ideas in mind, I offer a model of morphological processing that incorporates semantic and morpho-syntactic operations in service to meaning composition, and discuss how such a model could be implemented in the human brain. This article is part of the theme issue 'Towards mechanistic models of meaning composition'.
Collapse
Affiliation(s)
- Laura Gwilliams
- Psychology Department, New York University, New York, NY 10003, USA
| |
Collapse
|
44
|
Roque L, Karawani H, Gordon-Salant S, Anderson S. Effects of Age, Cognition, and Neural Encoding on the Perception of Temporal Speech Cues. Front Neurosci 2019; 13:749. [PMID: 31379494 PMCID: PMC6659127 DOI: 10.3389/fnins.2019.00749] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2018] [Accepted: 07/05/2019] [Indexed: 12/11/2022] Open
Abstract
Older adults commonly report difficulty understanding speech, particularly in adverse listening environments. These communication difficulties may exist in the absence of peripheral hearing loss. Older adults, both with normal hearing and with hearing loss, demonstrate temporal processing deficits that affect speech perception. The purpose of the present study is to investigate aging, cognition, and neural processing factors that may lead to deficits on perceptual tasks that rely on phoneme identification based on a temporal cue - vowel duration. A better understanding of the neural and cognitive impairments underlying temporal processing deficits could lead to more focused aural rehabilitation for improved speech understanding for older adults. This investigation was conducted in younger (YNH) and older normal-hearing (ONH) participants who completed three measures of cognitive functioning known to decline with age: working memory, processing speed, and inhibitory control. To evaluate perceptual and neural processing of auditory temporal contrasts, identification functions for the contrasting word-pair WHEAT and WEED were obtained on a nine-step continuum of vowel duration, and frequency-following responses (FFRs) and cortical auditory-evoked potentials (CAEPs) were recorded to the two endpoints of the continuum. Multiple linear regression analyses were conducted to determine the cognitive, peripheral, and/or central mechanisms that may contribute to perceptual performance. YNH participants demonstrated higher cognitive functioning on all three measures compared to ONH participants. The slope of the identification function was steeper in YNH than in ONH participants, suggesting a clearer distinction between the contrasting words in the YNH participants. FFRs revealed better response waveform morphology and more robust phase-locking in YNH compared to ONH participants. ONH participants also exhibited earlier latencies for CAEP components compared to the YNH participants. Linear regression analyses revealed that cortical processing significantly contributed to the variance in perceptual performance in the WHEAT/WEED identification functions. These results suggest that reduced neural precision contributes to age-related speech perception difficulties that arise from temporal processing deficits.
Collapse
Affiliation(s)
- Lindsey Roque
- Department of Hearing and Speech Sciences, University of Maryland, College Park, College Park, MD, United States
| | - Hanin Karawani
- Department of Hearing and Speech Sciences, University of Maryland, College Park, College Park, MD, United States; Department of Communication Sciences and Disorders, University of Haifa, Haifa, Israel
| | - Sandra Gordon-Salant
- Department of Hearing and Speech Sciences, University of Maryland, College Park, College Park, MD, United States
| | - Samira Anderson
- Department of Hearing and Speech Sciences, University of Maryland, College Park, College Park, MD, United States
| |
Collapse
|
45
|
Yi HG, Leonard MK, Chang EF. The Encoding of Speech Sounds in the Superior Temporal Gyrus. Neuron 2019; 102:1096-1110. [PMID: 31220442 PMCID: PMC6602075 DOI: 10.1016/j.neuron.2019.04.023] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Revised: 04/08/2019] [Accepted: 04/16/2019] [Indexed: 01/02/2023]
Abstract
The human superior temporal gyrus (STG) is critical for extracting meaningful linguistic features from speech input. Local neural populations are tuned to acoustic-phonetic features of all consonants and vowels and to dynamic cues for intonational pitch. These populations are embedded throughout broader functional zones that are sensitive to amplitude-based temporal cues. Beyond speech features, STG representations are strongly modulated by learned knowledge and perceptual goals. Currently, a major challenge is to understand how these features are integrated across space and time in the brain during natural speech comprehension. We present a theory that temporally recurrent connections within STG generate context-dependent phonological representations, spanning longer temporal sequences relevant for coherent percepts of syllables, words, and phrases.
Collapse
Affiliation(s)
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA.
| |
Collapse
|
46
|
Maintaining information about speech input during accent adaptation. PLoS One 2018; 13:e0199358. [PMID: 30086140 PMCID: PMC6080756 DOI: 10.1371/journal.pone.0199358] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2018] [Accepted: 06/06/2018] [Indexed: 11/19/2022] Open
Abstract
Speech understanding can be thought of as inferring progressively more abstract representations from a rapidly unfolding signal. One common view of this process holds that lower-level information is discarded as soon as higher-level units have been inferred. However, there is evidence that subcategorical information about speech percepts is not immediately discarded, but is maintained past word boundaries and integrated with subsequent input. Previous evidence for such subcategorical information maintenance has come from paradigms that lack many of the demands typical of everyday language use. We ask whether information maintenance is also possible under more typical constraints, and in particular whether it can facilitate accent adaptation. In a web-based paradigm, participants listened to isolated foreign-accented words in one of three conditions: subtitles were displayed concurrently with the speech, after speech offset, or not displayed at all. The delays between speech offset and subtitle presentation were manipulated. In a subsequent test phase, participants then transcribed novel words in the same accent without the aid of subtitles. We find that subtitles facilitate accent adaptation, even when displayed with a 6-second delay. Listeners thus maintained subcategorical information for sufficiently long to allow it to benefit adaptation. We close by discussing what type of information listeners maintain: subcategorical phonetic information, or just uncertainty about speech categories.
Collapse
|