1. Norman-Haignere SV, Keshishian MK, Devinsky O, Doyle W, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration. bioRxiv 2024:2024.09.23.614358. PMID: 39386565; PMCID: PMC11463558; DOI: 10.1101/2024.09.23.614358.
Abstract
Sound structures such as phonemes and words have highly variable durations. Thus, there is a fundamental difference between integrating across absolute time (e.g., 100 ms) and integrating across sound structure (e.g., phonemes). Auditory and cognitive models have traditionally cast neural integration in terms of time and structure, respectively, but the extent to which cortical computations reflect time or structure remains unknown. To answer this question, we rescaled the duration of all speech structures using time stretching/compression and measured integration windows in the human auditory cortex using a new experimental/computational method applied to spatiotemporally precise intracranial recordings. We observed significantly longer integration windows for stretched speech, but this lengthening was very small (∼5%) relative to the change in structure durations, even in non-primary regions strongly implicated in speech-specific processing. These findings demonstrate that time-yoked computations dominate throughout the human auditory cortex, placing important constraints on neurocomputational models of structure processing.
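
The stimulus manipulation can be illustrated with uniform time stretching/compression of a speech waveform; the sketch below is an illustration under assumed filenames and factors, not the authors' pipeline.

```python
# A minimal sketch (not the authors' code) of duration rescaling via
# time stretching/compression; "speech.wav" is a placeholder filename.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=None)  # keep the native sampling rate

for rate in (2.0, 1.0, 0.5):  # 2x compressed, original, 2x stretched
    # librosa's `rate` is a speed factor: duration scales by 1/rate,
    # so rate=0.5 doubles every structure's duration (pitch preserved).
    y_rescaled = librosa.effects.time_stretch(y, rate=rate)
    sf.write(f"speech_rate{rate}.wav", y_rescaled, sr)
```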

2. Choudhari V, Han C, Bickel S, Mehta AD, Schevon C, McKhann GM, Mesgarani N. Brain-Controlled Augmented Hearing for Spatially Moving Conversations in Multi-Talker Environments. Adv Sci (Weinh) 2024:e2401379. PMID: 39248654; DOI: 10.1002/advs.202401379.
Abstract
Focusing on a specific conversation amidst multiple interfering talkers is challenging, especially for those with hearing loss. Brain-controlled assistive hearing devices aim to alleviate this problem by enhancing the attended speech based on the listener's neural signals using auditory attention decoding (AAD). Departing from conventional AAD studies that relied on oversimplified scenarios with stationary talkers, a realistic AAD task is presented in which multiple talkers take turns while continuously moving in space in background noise. Invasive electroencephalography (iEEG) data are collected from three neurosurgical patients as they focus on one of two moving conversations. An enhanced brain-controlled assistive hearing system that combines AAD with a binaural, speaker-independent speech separation model is presented. The separation model unmixes talkers while preserving their spatial locations and provides talker trajectories to the neural decoder to improve AAD accuracy. Subjective and objective evaluations show that the proposed system enhances speech intelligibility and facilitates conversation tracking while maintaining spatial cues and voice quality in challenging acoustic environments. This research demonstrates the potential of this approach in real-world scenarios and marks a significant step toward developing assistive hearing technologies that adapt to the intricate dynamics of everyday auditory experiences.
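
The core AAD step can be illustrated with a standard stimulus-reconstruction decoder; this is a minimal sketch with placeholder data, not the paper's full system (which additionally performs binaural speech separation and trajectory tracking).

```python
# Stimulus-reconstruction AAD sketch: reconstruct the speech envelope from
# neural data with a linear decoder, then label the talker whose envelope
# correlates best as "attended". All shapes and rates are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

MAX_LAG = 32  # e.g., 250 ms of lags at an assumed 128 Hz sampling rate

def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies of each channel (lags 0..max_lag-1)."""
    n_t, n_ch = eeg.shape
    X = np.zeros((n_t, n_ch * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * n_ch:(lag + 1) * n_ch] = eeg[:n_t - lag]
    return X

def decode_attention(eeg, env_a, env_b, decoder):
    """Return 0 if talker A's envelope is better reconstructed, else 1."""
    recon = decoder.predict(lag_matrix(eeg, MAX_LAG))
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return 0 if r_a > r_b else 1

rng = np.random.default_rng(0)
eeg_train = rng.standard_normal((4096, 64))   # placeholder neural data
env_attended = rng.standard_normal(4096)      # placeholder attended envelope
decoder = Ridge(alpha=1.0).fit(lag_matrix(eeg_train, MAX_LAG), env_attended)

# Label a segment (here the training data, purely for illustration).
env_other = rng.standard_normal(4096)
print(decode_attention(eeg_train, env_attended, env_other, decoder))
```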
Affiliation(s)
- Vishal Choudhari: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, New York, NY 10027, USA
- Cong Han: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, New York, NY 10027, USA
- Stephan Bickel: Hofstra Northwell School of Medicine, Uniondale, NY 11549, USA; The Feinstein Institutes for Medical Research, Manhasset, NY 11030, USA
- Ashesh D Mehta: Hofstra Northwell School of Medicine, Uniondale, NY 11549, USA; The Feinstein Institutes for Medical Research, Manhasset, NY 11030, USA
- Catherine Schevon: Department of Neurology, Columbia University, New York, NY 10027, USA
- Guy M McKhann: Department of Neurological Surgery, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10027, USA
- Nima Mesgarani: Department of Electrical Engineering, Columbia University, New York, NY 10027, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, New York, NY 10027, USA

3. Desai M, Field AM, Hamilton LS. A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts. PLoS Comput Biol 2024;20:e1012433. PMID: 39250485; PMCID: PMC11412666; DOI: 10.1371/journal.pcbi.1012433.
Abstract
Communication in the real world is inherently multimodal. When having a conversation, typically sighted and hearing people use both auditory and visual cues to understand one another. For example, objects may make sounds as they move in space, or we may use the movement of a person's mouth to better understand what they are saying in a noisy environment. Still, many neuroscience experiments rely on unimodal stimuli to understand encoding of sensory features in the brain. The extent to which visual information may influence encoding of auditory information and vice versa in natural environments is thus unclear. Here, we addressed this question by recording scalp electroencephalography (EEG) in 11 subjects as they listened to and watched movie trailers in audiovisual (AV), visual-only (V), and audio-only (A) conditions. We then fit linear encoding models that described the relationship between the brain responses and the acoustic, phonetic, and visual information in the stimuli. We also compared whether auditory and visual feature tuning was the same when stimuli were presented in the original AV format versus when visual or auditory information was removed. In these stimuli, visual and auditory information were relatively uncorrelated, and included spoken narration over a scene as well as animated or live-action characters talking with and without their face visible. For these stimuli, we found that auditory feature tuning was similar in the AV and A-only conditions, and, similarly, tuning for visual information was similar when stimuli were presented with the audio present (AV) and when the audio was removed (V-only). In a cross-prediction analysis, we investigated whether models trained on AV data predicted responses to A-only or V-only test data as well as models trained on unimodal data. Overall, prediction performance using AV training and V-only test sets was similar to using V-only training and V-only test sets, suggesting that the auditory information had a relatively small effect on the EEG. In contrast, prediction performance using AV training and A-only test sets was slightly worse than using matching A-only training and A-only test sets. This suggests that the visual information had a stronger influence on the EEG, though this made no qualitative difference in the derived feature tuning. In effect, our results show that researchers may benefit from the richness of multimodal datasets, which can then be used to answer more than one research question.
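
The cross-prediction logic can be sketched with ridge-regression encoding models on placeholder data (assumed shapes, not the authors' pipeline): train on AV or unimodal features, then compare prediction of held-out unimodal EEG.

```python
# Cross-prediction sketch: does an encoding model trained on AV data
# predict unimodal EEG as well as a model trained on that unimodal data?
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_train, n_test, n_feat, n_ch = 4000, 1000, 20, 64
conds = ("AV", "A", "V")
feats = {c: rng.standard_normal((n_train + n_test, n_feat)) for c in conds}
eeg = {c: rng.standard_normal((n_train + n_test, n_ch)) for c in conds}

def fit(cond):
    return Ridge(alpha=10.0).fit(feats[cond][:n_train], eeg[cond][:n_train])

def score(model, cond):
    """Mean correlation between predicted and actual held-out EEG."""
    pred = model.predict(feats[cond][n_train:])
    true = eeg[cond][n_train:]
    return np.mean([np.corrcoef(pred[:, i], true[:, i])[0, 1]
                    for i in range(n_ch)])

print("AV->A:", score(fit("AV"), "A"), "vs A->A:", score(fit("A"), "A"))
print("AV->V:", score(fit("AV"), "V"), "vs V->V:", score(fit("V"), "V"))
```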
Affiliation(s)
- Maansi Desai: Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Alyssa M Field: Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Liberty S Hamilton: Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA; Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA

4. Teng X, Larrouy-Maestri P, Poeppel D. Segmenting and Predicting Musical Phrase Structure Exploits Neural Gain Modulation and Phase Precession. J Neurosci 2024;44:e1331232024. PMID: 38926087; PMCID: PMC11270514; DOI: 10.1523/JNEUROSCI.1331-23.2024.
Abstract
Music, like spoken language, is often characterized by hierarchically organized structure. Previous experiments have shown neural tracking of notes and beats, but little work touches on the more abstract question: how does the brain establish high-level musical structures in real time? We presented Bach chorales to participants (20 females and 9 males) undergoing electroencephalogram (EEG) recording to investigate how the brain tracks musical phrases. We removed the main temporal cues to phrasal structures, so that listeners could only rely on harmonic information to parse a continuous musical stream. Phrasal structures were disrupted by locally or globally reversing the harmonic progression, so that our observations on the original music could be controlled and compared. We first replicated the findings on neural tracking of musical notes and beats, substantiating the positive correlation between musical training and neural tracking. Critically, we discovered a neural signature in the frequency range ∼0.1 Hz (modulations of EEG power) that reliably tracks musical phrasal structure. Next, we developed an approach to quantify the phrasal phase precession of the EEG power, revealing that phrase tracking is indeed an operation of active segmentation involving predictive processes. We demonstrate that the brain establishes complex musical structures online over long timescales (>5 s) and actively segments continuous music streams in a manner comparable to language processing. These two neural signatures, phrase tracking and phrasal phase precession, provide new conceptual and technical tools to study the processes underpinning high-level structure building using noninvasive recording techniques.
Affiliation(s)
- Xiangbin Teng: Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Pauline Larrouy-Maestri: Music Department, Max Planck Institute for Empirical Aesthetics, Frankfurt 60322, Germany; Center for Language, Music, and Emotion (CLaME), New York, NY 10003, USA
- David Poeppel: Center for Language, Music, and Emotion (CLaME), New York, NY 10003, USA; Department of Psychology, New York University, New York, NY 10003, USA; Ernst Struengmann Institute for Neuroscience, Frankfurt 60528, Germany; Music and Audio Research Laboratory (MARL), New York, NY 11201, USA

5. Lavan N, Rinke P, Scharinger M. The time course of person perception from voices in the brain. Proc Natl Acad Sci U S A 2024;121:e2318361121. PMID: 38889147; PMCID: PMC11214051; DOI: 10.1073/pnas.2318361121.
Abstract
When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
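
A time-resolved representational similarity analysis of the kind described can be sketched as follows; the data are placeholders and the model RDM below is built from a hypothetical "perceived age" rating, not the study's materials.

```python
# Time-resolved RSA sketch: at each time point, correlate the neural
# representational dissimilarity matrix (RDM) with a trait-model RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_voices, n_sensors, n_times = 30, 64, 200
erps = rng.standard_normal((n_voices, n_sensors, n_times))  # placeholder ERPs
trait_ratings = rng.standard_normal((n_voices, 1))          # e.g., perceived age

model_rdm = pdist(trait_ratings, metric="euclidean")  # pairwise dissimilarity
rsa_timecourse = np.empty(n_times)
for t in range(n_times):
    neural_rdm = pdist(erps[:, :, t], metric="correlation")
    rsa_timecourse[t] = spearmanr(neural_rdm, model_rdm).correlation
# Peaks in `rsa_timecourse` indicate when the trait structure emerges.
```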
Affiliation(s)
- Nadine Lavan: Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Paula Rinke: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany
- Mathias Scharinger: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany; Research Center “Deutscher Sprachatlas”, Philipps-University Marburg, Marburg 35037, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Marburg 35032, Germany

6. Morgan AM, Devinsky O, Doyle WK, Dugan P, Friedman D, Flinker A. A low-activity cortical network selectively encodes syntax. bioRxiv 2024:2024.06.20.599931. PMID: 38948730; PMCID: PMC11212956; DOI: 10.1101/2024.06.20.599931.
Abstract
Syntax, the abstract structure of language, is a hallmark of human cognition. Despite its importance, its neural underpinnings remain obscured by inherent limitations of non-invasive brain measures and a near total focus on comprehension paradigms. Here, we address these limitations with high-resolution neurosurgical recordings (electrocorticography) and a controlled sentence production experiment. We uncover three syntactic networks that are broadly distributed across traditional language regions, but with focal concentrations in middle and inferior frontal gyri. In contrast to previous findings from comprehension studies, these networks process syntax mostly to the exclusion of words and meaning, supporting a cognitive architecture with a distinct syntactic system. Most strikingly, our data reveal an unexpected property of syntax: it is encoded independent of neural activity levels. We propose that this "low-activity coding" scheme represents a novel mechanism for encoding information, reserved for higher-order cognition more broadly.
Affiliation(s)
- Adam M. Morgan: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Orrin Devinsky: Neurosurgery Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Werner K. Doyle: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Patricia Dugan: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Daniel Friedman: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Adeen Flinker: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA; Biomedical Engineering Department, NYU Tandon School of Engineering, 6 MetroTech Center Ave, Brooklyn, NY 11201, USA

7. Sorensen DO, Avcu E, Lynch S, Ahlfors SP, Gow DW. Neural representation of phonological wordform in temporal cortex. Psychon Bull Rev 2024. PMID: 38689188; DOI: 10.3758/s13423-024-02511-6.
Abstract
While the neural bases of the earliest stages of speech categorization have been widely explored using neural decoding methods, there is still a lack of consensus on questions as basic as how wordforms are represented and in what way this word-level representation influences downstream processing in the brain. Isolating and localizing the neural representations of wordform is challenging because spoken words activate a variety of representations (e.g., segmental, semantic, articulatory) in addition to form-based representations. We addressed these challenges through a novel integrated neural decoding and effective connectivity design using region of interest (ROI)-based, source-reconstructed magnetoencephalography/electroencephalography (MEG/EEG) data collected during a lexical decision task. To identify wordform representations, we trained classifiers on words and nonwords from different phonological neighborhoods and then tested the classifiers' ability to discriminate between untrained target words that overlapped phonologically with the trained items. Training with word neighbors supported significantly better decoding than training with nonword neighbors in the period immediately following target presentation. Decoding regions included mostly right hemisphere regions in the posterior temporal lobe implicated in phonetic and lexical representation. Additionally, neighbors that aligned with target word beginnings (critical for word recognition) supported decoding, but equivalent phonological overlap with word codas did not, suggesting lexical mediation. Effective connectivity analyses showed a rich pattern of interaction between ROIs that support decoding based on training with lexical neighbors, especially driven by right posterior middle temporal gyrus. Collectively, these results evidence functional representation of wordforms in temporal lobes isolated from phonemic or semantic representations.
Affiliation(s)
- David O Sorensen: Division of Medical Sciences, Harvard Medical School, Cambridge, MA, USA
- Enes Avcu: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Skyla Lynch: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Seppo P Ahlfors: Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA; Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- David W Gow: Division of Medical Sciences, Harvard Medical School, Cambridge, MA, USA; Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA; Department of Psychology, Salem State University, Salem, MA, USA; Neurodynamics and Neural Decoding Group, Massachusetts General Hospital, 65 Landsdowne Street, Rm 219, Cambridge, MA 02139, USA

8. Gwilliams L, Marantz A, Poeppel D, King JR. Hierarchical dynamic coding coordinates speech comprehension in the brain. bioRxiv 2024:2024.04.19.590280. PMID: 38659750; PMCID: PMC11042271; DOI: 10.1101/2024.04.19.590280.
Abstract
Speech comprehension requires the human brain to transform an acoustic waveform into meaning. To do so, the brain generates a hierarchy of features that converts the sensory input into increasingly abstract language properties. However, little is known about how these hierarchical features are generated and continuously coordinated. Here, we propose that each linguistic feature is dynamically represented in the brain to simultaneously represent successive events. To test this 'Hierarchical Dynamic Coding' (HDC) hypothesis, we use time-resolved decoding of brain activity to track the construction, maintenance, and integration of a comprehensive hierarchy of language features spanning acoustic, phonetic, sub-lexical, lexical, syntactic and semantic representations. For this, we recorded 21 participants with magnetoencephalography (MEG) while they listened to two hours of short stories. Our analyses reveal three main findings. First, the brain incrementally represents and simultaneously maintains successive features. Second, the duration of these representations depends on their level in the language hierarchy. Third, each representation is maintained by a dynamic neural code, which evolves at a speed commensurate with its corresponding linguistic level. This HDC preserves information over time while limiting interference between successive features. Overall, HDC reveals how the human brain continuously builds and maintains a language hierarchy during natural speech comprehension, thereby anchoring linguistic theories to their biological implementations.
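
Time-resolved decoding of this kind is commonly assessed with temporal generalization (train a decoder at each time point, test at every other one); below is a minimal sketch on simulated data, not the authors' code.

```python
# Temporal generalization sketch: a "dynamic" neural code yields high
# accuracy only near the diagonal of the train-time x test-time matrix,
# whereas a static code generalizes broadly off-diagonal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n_epochs, n_sensors, n_times = 200, 208, 60        # assumed dimensions
X = rng.standard_normal((n_epochs, n_sensors, n_times))  # MEG epochs
y = rng.integers(0, 2, n_epochs)                    # e.g., a phonetic feature

half = n_epochs // 2  # simple train/test split
gen = np.empty((n_times, n_times))
for t_train in range(n_times):
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X[:half, :, t_train], y[:half])
    for t_test in range(n_times):
        gen[t_train, t_test] = clf.score(X[half:, :, t_test], y[half:])
```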
Affiliation(s)
- Laura Gwilliams: Department of Psychology, Stanford University; Department of Psychology, New York University
- Alec Marantz: Department of Psychology, New York University; Department of Linguistics, New York University
- David Poeppel: Department of Psychology, New York University; Ernst Struengmann Institute

9. Casilio M, Kasdan AV, Schneck SM, Entrup JL, Levy DF, Crouch K, Wilson SM. Situating word deafness within aphasia recovery: A case report. Cortex 2024;173:96-119. PMID: 38387377; PMCID: PMC11073474; DOI: 10.1016/j.cortex.2023.12.012.
Abstract
Word deafness is a rare neurological disorder often observed following bilateral damage to superior temporal cortex and canonically defined as an auditory modality-specific deficit in word comprehension. The extent to which word deafness is dissociable from aphasia remains unclear given its heterogeneous presentation, and some have consequently posited that word deafness instead represents a stage in recovery from aphasia, where auditory and linguistic processing are affected to varying degrees and improve at differing rates. Here, we report a case of an individual (Mr. C) with bilateral temporal lobe lesions whose presentation evolved from a severe aphasia to an atypical form of word deafness, where auditory linguistic processing was impaired at the sentence level and beyond. We first reconstructed in detail Mr. C's stroke recovery through medical record review and supplemental interviewing. Then, using behavioral testing and multimodal neuroimaging, we documented a predominant auditory linguistic deficit in sentence and narrative comprehension-with markedly reduced behavioral performance and absent brain activation in the language network in the spoken modality exclusively. In contrast, Mr. C displayed near-unimpaired behavioral performance and robust brain activations in the language network for the linguistic processing of words, irrespective of modality. We argue that these findings not only support the view of word deafness as a stage in aphasia recovery but also further instantiate the important role of left superior temporal cortex in auditory linguistic processing.
Affiliation(s)
- Anna V Kasdan: Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Brain Institute, TN, USA
- Deborah F Levy: Vanderbilt University Medical Center, Nashville, TN, USA
- Kelly Crouch: Vanderbilt University Medical Center, Nashville, TN, USA
- Stephen M Wilson: Vanderbilt University Medical Center, Nashville, TN, USA; School of Health and Rehabilitation Sciences, University of Queensland, Brisbane, QLD, Australia

10. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024;34:bhae155. PMID: 38687241; PMCID: PMC11059272; DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show (i) that the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that, for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim: Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, USA; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino: Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath: Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, USA; Duke Institute for Brain Sciences, Duke University, Durham, NC 27708, USA; Center for Cognitive Neuroscience, Duke University, Durham, NC 27708, USA

11. Hjortdal A, Frid J, Novén M, Roll M. Swift Prosodic Modulation of Lexical Access: Brain Potentials From Three North Germanic Language Varieties. J Speech Lang Hear Res 2024;67:400-414. PMID: 38306498; DOI: 10.1044/2023_JSLHR-23-00193.
Abstract
Purpose: According to most models of spoken word recognition, listeners probabilistically activate a set of lexical candidates, which is incrementally updated as the speech signal unfolds. Speech carries segmental (speech sound) as well as suprasegmental (prosodic) information. The role of the latter in spoken word recognition is less clear. We investigated how suprasegments (tone and voice quality) in three North Germanic language varieties affected lexical access by scrutinizing temporally fine-grained neurophysiological effects of lexical uncertainty and information gain.
Method: Three event-related potential (ERP) studies were reanalyzed. In all varieties investigated, suprasegments are associated with specific word endings. Swedish has two lexical "word accents" realized as pitch falls with different timings across dialects. In Danish, the distinction is in voice quality. We combined pronunciation lexica and frequency lists to calculate estimates of lexical uncertainty about an unfolding word and information gain upon hearing a suprasegmental cue and the segment upon which it manifests. We used single-trial mixed-effects regression models run every 4 ms.
Results: Only lexical uncertainty showed solid results: a frontal effect at 150-400 ms after suprasegmental cue onset and a later posterior effect after 200 ms. While a model including only segmental information mostly performed better, it was outperformed by the suprasegmental model at 200-330 ms at frontal sites.
Conclusions: The study points to suprasegmental cues contributing to lexical access over and beyond segments after around 200 ms in the North Germanic varieties investigated. Furthermore, the findings indicate that a previously reported "pre-activation negativity" predominantly reflects forward-looking processing.
Supplemental Material: https://doi.org/10.23641/asha.25016486
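
The two information-theoretic predictors can be illustrated with a toy lexicon; the words and frequencies below are hypothetical, not the study's materials.

```python
# Toy illustration of lexical uncertainty (entropy over the remaining
# cohort) and information gain (entropy reduction) as segments arrive.
import math

lexicon = {"anda": 120, "ande": 80, "anka": 40, "antal": 60}  # word -> freq

def cohort(prefix):
    return {w: f for w, f in lexicon.items() if w.startswith(prefix)}

def entropy(words):
    total = sum(words.values())
    return -sum((f / total) * math.log2(f / total) for f in words.values())

word = "anka"
prev_h = entropy(cohort(""))  # uncertainty before any segment
for i in range(1, len(word) + 1):
    h = entropy(cohort(word[:i]))
    print(f"after '{word[:i]}': uncertainty={h:.2f} bits, "
          f"information gain={prev_h - h:.2f} bits")
    prev_h = h
```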
Affiliation(s)
- Anna Hjortdal: Centre for Languages and Literature, Lund University, Sweden
- Johan Frid: Lund University Humanities Lab, Lund University, Sweden
- Mikael Novén: Department of Nutrition, Exercise and Sports, University of Copenhagen, Denmark
- Mikael Roll: Centre for Languages and Literature, Lund University, Sweden

12. Young MJ, Fecchio M, Bodien YG, Edlow BL. Covert cortical processing: a diagnosis in search of a definition. Neurosci Conscious 2024;2024:niad026. PMID: 38327828; PMCID: PMC10849751; DOI: 10.1093/nc/niad026.
Abstract
Historically, clinical evaluation of unresponsive patients following brain injury has relied principally on serial behavioral examination to search for emerging signs of consciousness and track recovery. Advances in neuroimaging and electrophysiologic techniques now enable clinicians to peer into residual brain functions even in the absence of overt behavioral signs. These advances have expanded clinicians' ability to sub-stratify behaviorally unresponsive and seemingly unaware patients following brain injury by querying and classifying covert brain activity made evident through active or passive neuroimaging or electrophysiologic techniques, including functional MRI, electroencephalography (EEG), transcranial magnetic stimulation-EEG, and positron emission tomography. Clinical research has thus reciprocally influenced clinical practice, giving rise to new diagnostic categories including cognitive-motor dissociation (i.e. 'covert consciousness') and covert cortical processing (CCP). While covert consciousness has received extensive attention and study, CCP is relatively less understood. We describe CCP as an emerging and clinically relevant state of consciousness marked by the presence of intact association cortex responses to environmental stimuli in the absence of behavioral evidence of stimulus processing. CCP is not a monotonic state but rather encapsulates a spectrum of possible association cortex responses, from rudimentary to complex, to a range of possible stimuli. In constructing a roadmap for this evolving field, we emphasize that efforts to inform clinicians, philosophers, and researchers of this condition are crucial. Along with strategies to sensitize diagnostic criteria and disorders-of-consciousness nosology to these vital discoveries, democratizing access to the resources necessary for clinical identification of CCP is an emerging clinical and ethical imperative.
Affiliation(s)
- Michael J Young: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA
- Matteo Fecchio: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA
- Yelena G Bodien: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA; Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, Harvard Medical School, 300 1st Ave, Charlestown, MA 02129, USA
- Brian L Edlow: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA 02114, USA; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, 149 13th St, Charlestown, MA 02129, USA

13. Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. bioRxiv 2024:2024.02.02.578603. PMID: 38352332; PMCID: PMC10862830; DOI: 10.1101/2024.02.02.578603.
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope-modulated noise, English-like non-words, scrambled words, and narrative passages. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of increasingly higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right-lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
Affiliation(s)
- Christian Brodbeck: Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali: Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik: Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Jonathan Z Simon: Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA; Department of Biology, University of Maryland, College Park, MD, USA; Institute for Systems Research, University of Maryland, College Park, MD, USA

14. Hedrick M, Thornton K. Reaction time for correct identification of vowels in consonant-vowel syllables and of vowel segments. JASA Express Lett 2024;4:015205. PMID: 38214609; DOI: 10.1121/10.0024334.
Abstract
Reaction times for correct vowel identification were measured to determine the effects of intertrial intervals, vowel, and cue type. Thirteen adults with normal hearing, aged 20-38 years, participated. Stimuli included three naturally produced syllables (/ba/ /bi/ /bu/) presented whole or segmented to isolate the formant transition or static formant center. Participants identified the vowel presented via loudspeaker by mouse click. Results showed a significant effect of intertrial intervals, no significant effect of cue type, and a significant vowel effect, suggesting that feedback occurs, that vowel identification may depend on cue duration, and that vowel bias may stem from focal structure.
Affiliation(s)
- Mark Hedrick: Department of Audiology and Speech Pathology, The University of Tennessee Health Science Center, Knoxville, TN 37996, USA
- Kristen Thornton: Department of Hearing, Speech, and Language Sciences, Gallaudet University, Washington, DC 20002, USA

15. Gwilliams L, Flick G, Marantz A, Pylkkänen L, Poeppel D, King JR. Introducing MEG-MASC a high-quality magneto-encephalography dataset for evaluating natural speech processing. Sci Data 2023;10:862. PMID: 38049487; PMCID: PMC10695966; DOI: 10.1038/s41597-023-02752-5.
Abstract
The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers who listened to two hours of naturalistic stories. Each participant performed two identical sessions, involving listening to four fictional stories from the Manually Annotated Sub-Corpus (MASC) intermixed with random word lists and comprehension questions. We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS). This data collection provides a suitable benchmark to large-scale encoding and decoding analyses of temporally-resolved brain responses to speech. We provide the Python code to replicate several validations analyses of the MEG evoked responses such as the temporal decoding of phonetic features and word frequency. All code and MEG, audio and text data are publicly available to keep with best practices in transparent and reproducible research.
Affiliation(s)
- Laura Gwilliams: Department of Psychology, Stanford University, Stanford, USA; Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Graham Flick: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates; Department of Linguistics, New York University, New York, USA; Rotman Research Institute, Baycrest Hospital, Toronto, Canada
- Alec Marantz: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates; Department of Linguistics, New York University, New York, USA
- Liina Pylkkänen: Department of Psychology, New York University, New York, USA; NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates; Department of Linguistics, New York University, New York, USA
- David Poeppel: Department of Psychology, New York University, New York, USA; Ernst Struengmann Institute for Neuroscience, Frankfurt, Germany
- Jean-Rémi King: Department of Psychology, New York University, New York, USA; LSP, École normale supérieure, PSL University, CNRS, 75005 Paris, France

16. Duraivel S, Rahimpour S, Chiang CH, Trumpis M, Wang C, Barth K, Harward SC, Lad SP, Friedman AH, Southwell DG, Sinha SR, Viventi J, Cogan GB. High-resolution neural recordings improve the accuracy of speech decoding. Nat Commun 2023;14:6938. PMID: 37932250; PMCID: PMC10628285; DOI: 10.1038/s41467-023-42555-1.
Abstract
Patients suffering from debilitating neurodegenerative diseases often lose the ability to communicate, detrimentally affecting their quality of life. One solution to restore communication is to decode signals directly from the brain to enable neural speech prostheses. However, decoding has been limited by coarse neural recordings which inadequately capture the rich spatio-temporal structure of human brain signals. To resolve this limitation, we performed high-resolution, micro-electrocorticographic (µECoG) neural recordings during intra-operative speech production. We obtained neural signals with 57× higher spatial resolution and 48% higher signal-to-noise ratio compared to macro-ECoG and SEEG. This increased signal quality improved decoding by 35% compared to standard intracranial signals. Accurate decoding was dependent on the high-spatial resolution of the neural interface. Non-linear decoding models designed to utilize enhanced spatio-temporal neural information produced better results than linear techniques. We show that high-density µECoG can enable high-quality speech decoding for future neural speech prostheses.
Affiliation(s)
- Shervin Rahimpour: Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA; Department of Neurosurgery, Clinical Neuroscience Center, University of Utah, Salt Lake City, UT, USA
- Chia-Han Chiang: Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Michael Trumpis: Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Charles Wang: Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Katrina Barth: Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Stephen C Harward: Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA; Duke Comprehensive Epilepsy Center, Duke School of Medicine, Durham, NC, USA
- Shivanand P Lad: Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA
- Allan H Friedman: Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA
- Derek G Southwell: Department of Biomedical Engineering, Duke University, Durham, NC, USA; Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA; Duke Comprehensive Epilepsy Center, Duke School of Medicine, Durham, NC, USA; Department of Neurobiology, Duke School of Medicine, Durham, NC, USA
- Saurabh R Sinha: Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jonathan Viventi: Department of Biomedical Engineering, Duke University, Durham, NC, USA; Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA; Duke Comprehensive Epilepsy Center, Duke School of Medicine, Durham, NC, USA; Department of Neurobiology, Duke School of Medicine, Durham, NC, USA
- Gregory B Cogan: Department of Biomedical Engineering, Duke University, Durham, NC, USA; Department of Neurosurgery, Duke School of Medicine, Durham, NC, USA; Duke Comprehensive Epilepsy Center, Duke School of Medicine, Durham, NC, USA; Department of Neurology, Duke School of Medicine, Durham, NC, USA; Department of Psychology and Neuroscience, Duke University, Durham, NC, USA; Center for Cognitive Neuroscience, Duke University, Durham, NC, USA

17. Puffay C, Vanthornhout J, Gillis M, Accou B, Van Hamme H, Francart T. Robust neural tracking of linguistic speech representations using a convolutional neural network. J Neural Eng 2023;20:046040. PMID: 37595606; DOI: 10.1088/1741-2552/acf1ce.
Abstract
Objective. When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalography (EEG) and the speech signal. Recent studies have shown a significant contribution of linguistic features over acoustic neural tracking using linear models. However, linear models cannot model the nonlinear dynamics of the brain. To overcome this, we use a convolutional neural network (CNN) that relates EEG to linguistic features using phoneme or word onsets as a control and has the capacity to model nonlinear relations.
Approach. We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy (CE), word surprisal (WS) and word frequency (WF)) in our nonlinear CNN model and investigate if they carry additional information on top of lexical features (phoneme and word onsets). We then compare the performance of our nonlinear CNN with that of a linear encoder and a linearized CNN.
Main results. For the nonlinear CNN, we found a significant contribution of CE over phoneme onsets and of WS and WF over word onsets. Moreover, the nonlinear CNN outperformed the linear baselines.
Significance. Measuring coding of linguistic features in the brain is important for auditory neuroscience research and applications that involve objectively measuring speech understanding. With linear models, this is measurable, but the effects are very small. The proposed nonlinear CNN model yields larger differences between linguistic and lexical models and, therefore, could show effects that would otherwise be unmeasurable and may, in the future, lead to improved within-subject measures and shorter recordings.
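
As a rough illustration of relating EEG to a linguistic feature time series with a 1-D CNN, consider the PyTorch sketch below; the paper's actual architecture, control features, and training objective differ, so treat this purely as a schematic.

```python
# Schematic 1-D CNN mapping multichannel EEG to a feature time series
# (e.g., a word-surprisal impulse train); all shapes are assumptions.
import torch
import torch.nn as nn

class EEGFeatureCNN(nn.Module):
    def __init__(self, n_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(32, 16, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=1),  # collapse to one feature channel
        )

    def forward(self, eeg):                   # eeg: (batch, channels, time)
        return self.net(eeg).squeeze(1)       # -> (batch, time)

model = EEGFeatureCNN()
eeg = torch.randn(8, 64, 512)                 # placeholder EEG batch
feature = torch.randn(8, 512)                 # placeholder feature target
loss = nn.functional.mse_loss(model(eeg), feature)
loss.backward()                               # one illustrative training step
```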
Affiliation(s)
- Corentin Puffay: Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium; Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Marlies Gillis: Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Bernd Accou: Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium; Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Hugo Van Hamme: Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Tom Francart: Department of Neurosciences, ExpORL, KU Leuven, Leuven, Belgium

18. Sorensen DO, Avcu E, Lynch S, Ahlfors SP, Gow DW. Neural representation of phonological wordform in bilateral posterior temporal cortex. bioRxiv 2023:2023.07.19.549751. PMID: 37503242; PMCID: PMC10370090; DOI: 10.1101/2023.07.19.549751.
Abstract
While the neural bases of the earliest stages of speech categorization have been widely explored using neural decoding methods, there is still a lack of consensus on questions as basic as how wordforms are represented and in what way this word-level representation influences downstream processing in the brain. Isolating and localizing the neural representations of wordform is challenging because spoken words evoke activation of a variety of representations (e.g., segmental, semantic, articulatory) in addition to form-based representations. We addressed these challenges through a novel integrated neural decoding and effective connectivity design using region of interest (ROI)-based, source reconstructed magnetoencephalography/electroencephalography (MEG/EEG) data collected during a lexical decision task. To localize wordform representations, we trained classifiers on words and nonwords from different phonological neighborhoods and then tested the classifiers' ability to discriminate between untrained target words that overlapped phonologically with the trained items. Training with either word or nonword neighbors supported decoding in many brain regions during an early analysis window (100-400 ms) reflecting primarily incremental phonological processing. Training with word neighbors, but not nonword neighbors, supported decoding in a bilateral set of temporal lobe ROIs, in a later time window (400-600 ms) reflecting activation related to word recognition. These ROIs included bilateral posterior temporal regions implicated in wordform representation. Effective connectivity analyses among regions within this subset indicated that word-evoked activity influenced the decoding accuracy more than nonword-evoked activity did. Taken together, these results evidence functional representation of wordforms in bilateral temporal lobes isolated from phonemic or semantic representations.

19. Tezcan F, Weissbart H, Martin AE. A tradeoff between acoustic and linguistic feature encoding in spoken language comprehension. eLife 2023;12:e82386. PMID: 37417736; PMCID: PMC10328533; DOI: 10.7554/eLife.82386.
Abstract
When we comprehend language from speech, the phase of the neural response aligns with particular features of the speech input, resulting in a phenomenon referred to as neural tracking. In recent years, a large body of work has demonstrated the tracking of the acoustic envelope and abstract linguistic units at the phoneme and word levels, and beyond. However, the degree to which speech tracking is driven by acoustic edges of the signal, or by internally generated linguistic units, or by the interplay of both, remains contentious. In this study, we used naturalistic story-listening to investigate (1) whether phoneme-level features are tracked over and above acoustic edges, (2) whether word entropy, which can reflect sentence- and discourse-level constraints, impacted the encoding of acoustic and phoneme-level features, and (3) whether the tracking of acoustic edges was enhanced or suppressed during comprehension of a first language (Dutch) compared to a statistically familiar but uncomprehended language (French). We first show that encoding models with phoneme-level linguistic features, in addition to acoustic features, uncovered an increased neural tracking response; this signal was further amplified in a comprehended language, putatively reflecting the transformation of acoustic features into internally generated phoneme-level representations. Phonemes were tracked more strongly in a comprehended language, suggesting that language comprehension functions as a neural filter over acoustic edges of the speech signal as it transforms sensory signals into abstract linguistic units. We then show that word entropy enhances neural tracking of both acoustic and phonemic features when sentence- and discourse-level context is less constraining. When language was not comprehended, acoustic features, but not phonemic ones, were more strongly modulated; in contrast, when a native language was comprehended, phonemic features were more strongly modulated. Taken together, our findings highlight the flexible modulation of acoustic and phonemic features by sentence- and discourse-level constraints in language comprehension, and document the neural transformation from speech perception to language comprehension, consistent with an account of language processing as a neural filter from sensory to abstract representations.
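
One common operationalization of "acoustic edges", assumed here for illustration (the paper's exact definition may differ), is the half-wave-rectified derivative of the broadband envelope, which peaks at sharp onsets.

```python
# Acoustic-edge sketch: envelope via the Hilbert transform, crude
# downsampling to an analysis rate, then the rectified derivative.
import numpy as np
from scipy.signal import hilbert

def acoustic_edges(audio, fs, env_fs=100):
    envelope = np.abs(hilbert(audio))          # broadband amplitude envelope
    hop = int(fs / env_fs)
    envelope = envelope[::hop]                 # one value per analysis frame
    # Half-wave rectification keeps rises (onsets) and zeros out falls.
    return np.clip(np.diff(envelope, prepend=envelope[0]), 0, None)

fs = 16_000
audio = np.random.default_rng(6).standard_normal(fs * 2)  # placeholder audio
edges = acoustic_edges(audio, fs)              # one value per 10 ms frame
```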
Affiliation(s)
- Filiz Tezcan: Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Hugo Weissbart: Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
- Andrea E Martin: Language and Computation in Neural Systems Group, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands; Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands

20. Raghavan VS, O’Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol 2023;21:e3002128. PMID: 37279203; PMCID: PMC10243639; DOI: 10.1371/journal.pbio.3002128.
Abstract
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
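
The glimpsing criterion can be made concrete with a small sketch (the threshold and array shapes are assumptions, not the paper's exact parameters): a spectrotemporal bin is glimpsed when the target talker carries more energy than the background, and masked otherwise.

```python
# Glimpse-mask sketch over (frequency, time) spectrogram bins, plus the
# split of target features into "glimpsed" and "masked" streams that
# could each feed a separate encoding model.
import numpy as np

def glimpse_mask(target_spec, background_spec, margin_db=0.0):
    """True where the target exceeds the background by margin_db."""
    target_db = 20 * np.log10(target_spec + 1e-12)
    background_db = 20 * np.log10(background_spec + 1e-12)
    return target_db > background_db + margin_db

rng = np.random.default_rng(4)
tgt = rng.random((128, 500))   # placeholder target spectrogram
bg = rng.random((128, 500))    # placeholder background spectrogram
mask = glimpse_mask(tgt, bg, margin_db=3.0)
glimpsed_features = np.where(mask, tgt, 0.0)
masked_features = np.where(mask, 0.0, tgt)
```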
Affiliation(s)
- Vinay S Raghavan: Department of Electrical Engineering, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- James O’Sullivan: Department of Electrical Engineering, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Stephan Bickel: The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA; Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; Department of Neurology, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
- Ashesh D Mehta: The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA; Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA
- Nima Mesgarani: Department of Electrical Engineering, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA

21. Wahbeh H, Cannard C, Kriegsman M, Delorme A. Evaluating brain spectral and connectivity differences between silent mind-wandering and trance states. Prog Brain Res 2023;277:29-61. PMID: 37301570; DOI: 10.1016/bs.pbr.2022.12.011.
Abstract
Trance is an altered state of consciousness characterized by alterations in cognition. In general, trance states induce mental silence (i.e., cognitive thought reduction), and mental silence can induce trance states. Conversely, mind-wandering is the mind's propensity to stray its attention away from the task at hand and toward content irrelevant to the current moment, and its main component is inner speech. Building on the previous literature on mental silence and trance states and incorporating inverse source reconstruction advances, the study's objectives were to evaluate differences between trance and mind-wandering states using: (1) electroencephalography (EEG) power spectra at the electrode level, (2) power spectra at the area level (source reconstructed signal), and (3) EEG functional connectivity between these areas (i.e., how they interact). The relationship between subjective trance depths ratings and whole-brain connectivity during trance was also evaluated. Spectral analyses revealed increased delta and theta power in the frontal region and increased gamma in the centro-parietal region during mind-wandering, whereas trance showed increased beta and gamma power in the frontal region. Power spectra at the area level and pairwise comparisons of the connectivity between these areas demonstrated no significant difference between the two states. However, subjective trance depth ratings were inversely correlated with whole-brain connectivity in all frequency bands (i.e., deeper trance is associated with less large-scale connectivity). Trance allows one to enter mentally silent states and explore their neurophenomenological processes. Limitations and future directions are discussed.
Affiliation(s)
- Helané Wahbeh
  - Research Department, Institute of Noetic Sciences, Petaluma, CA, United States
- Cedric Cannard
  - Research Department, Institute of Noetic Sciences, Petaluma, CA, United States
- Michael Kriegsman
  - Research Department, Institute of Noetic Sciences, Petaluma, CA, United States
- Arnaud Delorme
  - Research Department, Institute of Noetic Sciences, Petaluma, CA, United States
  - University of California, San Diego, CA, United States
|
22
|
De Clercq P, Vanthornhout J, Vandermosten M, Francart T. Beyond linear neural envelope tracking: a mutual information approach. J Neural Eng 2023; 20. [PMID: 36812597 DOI: 10.1088/1741-2552/acbe1d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 02/22/2023] [Indexed: 02/24/2023]
Abstract
Objective. The human brain tracks the temporal envelope of speech, which contains essential cues for speech understanding. Linear models are the most common tool to study neural envelope tracking. However, information on how speech is processed can be lost since nonlinear relations are precluded. Analysis based on mutual information (MI), on the other hand, can detect both linear and nonlinear relations and is gradually becoming more popular in the field of neural envelope tracking. Yet, several different approaches to calculating MI are applied, with no consensus on which approach to use. Furthermore, the added value of nonlinear techniques remains a subject of debate in the field. The present paper aims to resolve these open questions. Approach. We analyzed electroencephalography (EEG) data of participants listening to continuous speech and applied MI analyses and linear models. Main results. Comparing the different MI approaches, we conclude that results are most reliable and robust using the Gaussian copula approach, which first transforms the data to standard Gaussians. With this approach, the MI analysis is a valid technique for studying neural envelope tracking. Like linear models, it allows spatial and temporal interpretations of speech processing, peak latency analyses, and applications to multiple EEG channels combined. In a final analysis, we tested whether nonlinear components were present in the neural response to the envelope by first removing all linear components in the data. We robustly detected nonlinear components on the single-subject level using the MI analysis. Significance. We demonstrate that the human brain processes speech in a nonlinear way. Unlike linear models, the MI analysis detects such nonlinear relations, proving its added value for neural envelope tracking. In addition, the MI analysis retains spatial and temporal characteristics of speech processing, an advantage lost when using more complex (nonlinear) deep neural networks.
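The Gaussian copula approach the authors favor has a compact closed form in the bivariate case, shown in the sketch below (following the framework popularized by Ince and colleagues): each variable is rank-transformed to a standard Gaussian, after which MI follows from the correlation alone. This illustrative version omits the multivariate extensions and bias corrections used in practice.

```python
# Bivariate Gaussian copula MI (illustrative; no bias correction).
import numpy as np
from scipy.stats import norm

def copula_gaussian(x):
    """Map samples to standard-normal quantiles of their empirical ranks."""
    ranks = np.argsort(np.argsort(x)) + 1.0       # ranks 1..n
    return norm.ppf(ranks / (len(x) + 1.0))       # stay strictly in (0, 1)

def gcmi_1d(x, y):
    """MI (in bits) between two 1-D variables after the copula transform."""
    gx, gy = copula_gaussian(x), copula_gaussian(y)
    rho = np.corrcoef(gx, gy)[0, 1]
    # For jointly Gaussian variables, MI depends only on the correlation.
    return -0.5 * np.log2(1.0 - rho ** 2)
```

Because the rank transform discards marginal distributions, the estimate is robust to outliers while still capturing monotonic nonlinear dependencies.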
Affiliation(s)
- Pieter De Clercq
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Maaike Vandermosten
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
  - Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
|
23
|
Su Y, MacGregor LJ, Olasagasti I, Giraud AL. A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension. PLoS Biol 2023; 21:e3002046. [PMID: 36947552 PMCID: PMC10079236 DOI: 10.1371/journal.pbio.3002046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Revised: 04/06/2023] [Accepted: 02/22/2023] [Indexed: 03/23/2023] Open
Abstract
Understanding speech requires mapping fleeting and often ambiguous soundwaves to meaning. While humans are known to exploit their capacity to contextualize in order to facilitate this process, how internal knowledge is deployed online remains an open question. Here, we present a model that extracts multiple levels of information from continuous speech online. The model applies linguistic and nonlinguistic knowledge to speech processing by periodically generating top-down predictions and incorporating bottom-up incoming evidence in a nested temporal hierarchy. We show that a nonlinguistic context level provides semantic predictions informed by sensory inputs, which are crucial for disambiguating among multiple meanings of the same word. The explicit knowledge hierarchy of the model enables a more holistic account of the neurophysiological responses to speech compared to using lexical predictions generated by a neural network language model (GPT-2). We also show that hierarchical predictions reduce peripheral processing by minimizing uncertainty and prediction error. With this proof-of-concept model, we demonstrate that the deployment of hierarchical predictions is a possible strategy for the brain to dynamically utilize structured knowledge and make sense of the speech input.
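The predict-then-update loop described above can be caricatured in a few lines: a higher level sends a top-down prediction of the input, and the bottom-up prediction error nudges the higher-level belief toward explaining the input. The toy below is a deliberate simplification under arbitrary dimensions and learning rate, not the authors' model.

```python
# Toy predict-then-update loop (a simplification, not the authors' model).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))    # generative map: hidden cause -> input features
hidden = np.zeros(3)           # higher-level belief about the cause
x = rng.normal(size=8)         # observed input (e.g., acoustic features)

for _ in range(300):
    prediction = W @ hidden       # top-down prediction of the input
    error = x - prediction        # bottom-up prediction error
    hidden += 0.02 * W.T @ error  # revise the belief to explain the error

# Residual error that the 3-dimensional cause cannot explain away.
print(np.linalg.norm(x - W @ hidden))
```

The full model nests several such levels at different timescales (phonemes, words, context), but the same explain-the-error logic drives each one.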
Affiliation(s)
- Yaqing Su
  - Department of Fundamental Neuroscience, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Swiss National Centre of Competence in Research “Evolving Language” (NCCR EvolvingLanguage), Geneva, Switzerland
- Lucy J. MacGregor
  - Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Itsaso Olasagasti
  - Department of Fundamental Neuroscience, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Swiss National Centre of Competence in Research “Evolving Language” (NCCR EvolvingLanguage), Geneva, Switzerland
- Anne-Lise Giraud
  - Department of Fundamental Neuroscience, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Swiss National Centre of Competence in Research “Evolving Language” (NCCR EvolvingLanguage), Geneva, Switzerland
  - Institut Pasteur, Université Paris Cité, Inserm, Institut de l’Audition, Paris, France
|
24
|
Willeford K. The Luminescence Hypothesis of Olfaction. SENSORS (BASEL, SWITZERLAND) 2023; 23:1333. [PMID: 36772376 PMCID: PMC9919928 DOI: 10.3390/s23031333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Revised: 01/12/2023] [Accepted: 01/20/2023] [Indexed: 06/18/2023]
Abstract
A new hypothesis for the mechanism of olfaction is presented. It begins with an odorant molecule binding to an olfactory receptor. This is followed by the quantum-biology event of inelastic electron tunneling, as has been suggested by both the vibration and swipe-card theories. It is novel in that it is not concerned with the possible effects of the tunneled electrons, as discussed in the previous theories. Instead, the high-energy state of the odorant molecule in the receptor following inelastic electron tunneling is considered. The hypothesis is that, as the high-energy state decays, there is fluorescent luminescence with radiative emission of multiple photons. These photons pass through the supporting sustentacular cells and activate a set of olfactory neurons with near-simultaneous timing, which provides the temporal basis for the brain to interpret the required complex combinatorial coding as an odor. The Luminescence Hypothesis of Olfaction is the first to present the necessity of, or a mechanism for, a 1:3 correspondence of odorant molecule to olfactory nerve activations. The mechanism provides for a consistent and reproducible time-based activation of sets of olfactory nerves correlated with an odor. The hypothesis has a biological precedent, includes an energy feasibility assessment, explains the anosmia seen with COVID-19, and can be confirmed with existing laboratory techniques.
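For readers unfamiliar with what an energy feasibility assessment compares, the back-of-envelope sketch below computes the energy of a single photon at a given wavelength via E = hc/λ; the example wavelengths are illustrative assumptions, not values taken from the paper.

```python
# Back-of-envelope photon energies, E = hc / wavelength (the wavelengths
# below are illustrative assumptions, not values from the paper).
H = 6.626e-34    # Planck constant, J*s
C = 2.998e8      # speed of light, m/s
EV = 1.602e-19   # joules per electronvolt

def photon_energy_ev(wavelength_nm):
    """Energy of a single photon at the given wavelength, in eV."""
    return H * C / (wavelength_nm * 1e-9) / EV

for nm in (350, 500, 700):   # assumed near-UV, green, and red examples
    print(f"{nm} nm -> {photon_energy_ev(nm):.2f} eV")
```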
Affiliation(s)
- Kenneth Willeford
  - Coastal Carolinas Integrated Medicine, 10 Doctors Circle, STE 2, Supply, NC 28462, USA
|