1
Anderson AJ, Davis C, Lalor EC. Deep-learning models reveal how context and listener attention shape electrophysiological correlates of speech-to-language transformation. PLoS Comput Biol 2024; 20:e1012537. [PMID: 39527649 DOI: 10.1371/journal.pcbi.1012537]
Abstract
To transform continuous speech into words, the human brain must resolve variability across utterances in intonation, speech rate, volume, accents and so on. A promising approach to explaining this process has been to model electroencephalogram (EEG) recordings of brain responses to speech. Contemporary models typically invoke context-invariant speech categories (e.g. phonemes) as an intermediary representational stage between sounds and words. However, such models may not capture the complete picture because they do not model the brain mechanism that categorizes sounds and consequently may overlook associated neural representations. By providing end-to-end accounts of speech-to-text transformation, new deep-learning systems could enable more complete brain models. We model EEG recordings of audiobook comprehension with the deep-learning speech recognition system Whisper. We find that (1) Whisper provides a self-contained EEG model of an intermediary representational stage that reflects elements of prelexical and lexical representation and prediction; (2) EEG modeling is more accurate when informed by 5-10 s of speech context, which traditional context-invariant categorical models do not encode; (3) deep Whisper layers encoding linguistic structure were more accurate EEG models of selectively attended speech in two-speaker "cocktail party" listening conditions than early layers encoding acoustics. No such layer-depth advantage was observed for unattended speech, consistent with a more superficial level of linguistic processing in the brain.
Affiliation(s)
- Andrew J Anderson
- Department of Neurology, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
- Chris Davis
- Western Sydney University, The MARCS Institute for Brain, Behaviour and Development, Westmead Innovation Quarter, Westmead, New South Wales, Australia
- Edmund C Lalor
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
- Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
- Center for Visual Science, University of Rochester, Rochester, New York, United States of America
2
Fan Y, Wang M, Fang F, Ding N, Luo H. Two-dimensional neural geometry underpins hierarchical organization of sequence in human working memory. Nat Hum Behav 2024. [PMID: 39511344 DOI: 10.1038/s41562-024-02047-8]
Abstract
Working memory (WM) is constructive in nature. Instead of passively retaining information, WM reorganizes complex sequences into hierarchically embedded chunks to overcome capacity limits and facilitate flexible behaviour. Here, to investigate the neural mechanisms underlying hierarchical reorganization in WM, we performed two electroencephalography experiments and one magnetoencephalography experiment in which humans retain in WM a temporal sequence of items (syllables) that are organized into chunks (multisyllabic words). We demonstrate that the one-dimensional sequence is represented by a two-dimensional neural representational geometry in WM, arising from left prefrontal and temporoparietal regions, with separate dimensions encoding item position within a chunk and chunk position in the sequence. Critically, this two-dimensional geometry is observed consistently in different experimental settings, even during tasks that do not encourage hierarchical reorganization in WM, and correlates with WM behaviour. Overall, these findings strongly support the view that complex sequences are reorganized into a factorized, multidimensional neural representational geometry in WM, which also speaks to general structure-based organizational principles given WM's involvement in many cognitive functions.
Affiliation(s)
- Ying Fan
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Muzhi Wang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Fang Fang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China
- Huan Luo
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing, China
3
Banaraki AK, Toghi A, Mohammadzadeh A. RDoC Framework Through the Lens of Predictive Processing: Focusing on Cognitive Systems Domain. Comput Psychiatry 2024; 8:178-201. [PMID: 39478691 PMCID: PMC11523845 DOI: 10.5334/cpsy.119]
Abstract
In response to shortcomings of the current classification system in translating discoveries from basic science into clinical applications, the NIMH offers a new framework for studying mental health disorders called the Research Domain Criteria (RDoC). This framework takes a multidimensional view of psychopathologies, focusing on functional domains of behavior and the neural circuits that implement them. In parallel, the Predictive Processing (PP) framework stands as a leading theory of human brain function, offering a unified explanation for various types of information processing in the brain. While both frameworks share an interest in studying psychopathologies on the basis of pathophysiology, their integration remains to be explored. Here, we argue that the explanatory power of PP provides a groundwork for the RDoC matrix, validating its constructs and generating testable hypotheses about mechanistic interactions between molecular biomarkers and clinical traits. Taken together, predictive processing may serve as a foundation for achieving the goals of the RDoC framework.
Affiliation(s)
- Armin Toghi
- Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
- Azar Mohammadzadeh
- Research Center for Cognitive and Behavioral Studies, Tehran University of Medical Sciences, Tehran, Iran
4
Sooter NM, Seragnoli F, Picard F. Insights from Ecstatic Epilepsy: From Uncertainty to Metacognitive Feelings. Curr Top Behav Neurosci 2024. [PMID: 39436631 DOI: 10.1007/7854_2024_528]
Abstract
Ecstatic epilepsy is a rare form of focal epilepsy linked to the anterior insula in which patients experience a blissful state with a unique set of symptoms, including a feeling of physical well-being, mental clarity, a sense of oneness with the universe, and time dilation. In this chapter, we reflect on how these symptoms coincide with our current knowledge of the insula's functions and explore how this stunning natural model can further inform our understanding of the insula's role in the sentient self, uncertainty and surprise monitoring, and metacognitive feelings.
Affiliation(s)
- Nina M Sooter
- Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland
- Federico Seragnoli
- Institute of Psychology, Lausanne University, Lausanne, Switzerland
- Psychiatry Department, Geneva University Hospital, Geneva, Switzerland
- Fabienne Picard
- Department of Clinical Neurosciences, Geneva University Hospitals and Medical School, Geneva, Switzerland
5
Song M, Wang J, Cai Q. The unique contribution of uncertainty reduction during naturalistic language comprehension. Cortex 2024; 181:12-25. [PMID: 39447486 DOI: 10.1016/j.cortex.2024.09.007]
Abstract
Language comprehension is an incremental process with prediction. Delineating various mental states during such a process is critical to understanding the relationship between human cognition and the properties of language. Entropy reduction, which indicates the dynamic decrease of uncertainty as language input unfolds, has been recognized as effective in predicting neural responses during comprehension. According to the entropy reduction hypothesis (Hale, 2006), entropy reduction is related to the processing difficulty of a word, the effect of which may overlap with other well-documented information-theoretical metrics such as surprisal or next-word entropy. However, processing difficulty has often been conflated with the information conveyed by a word, especially given the lack of neural differentiation between the two. We propose that entropy reduction represents the cognitive neural process of information gain that can be dissociated from processing difficulty. This study characterized various information-theoretical metrics using GPT-2 and identified the unique effects of entropy reduction in predicting fMRI time series acquired during language comprehension. In addition to the effects of surprisal and entropy, entropy reduction was associated with activations in the left inferior frontal gyrus, bilateral ventromedial prefrontal cortex, insula, thalamus, basal ganglia, and middle cingulate cortex. The reduction of uncertainty, rather than its fluctuation, proved to be an effective factor in modeling neural responses. The neural substrates underlying the reduction in uncertainty might imply the brain's desire for information regardless of processing difficulty.
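The three information-theoretic metrics named in this abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' GPT-2 pipeline: the distributions below are invented stand-ins for a language model's next-word probabilities, and the entropy-reduction formula is one common operationalization of Hale's (2006) hypothesis.

```python
import math

def surprisal(p_word):
    # Surprisal of the observed word: -log2 p(w_t | context); large for unexpected words
    return -math.log2(p_word)

def entropy(dist):
    # Shannon entropy (in bits) of a next-word distribution: uncertainty about what comes next
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def entropy_reduction(prev_dist, curr_dist):
    # Hale (2006)-style entropy reduction: uncertainty resolved by the current word,
    # floored at zero so that increases in uncertainty do not count as information gain
    return max(0.0, entropy(prev_dist) - entropy(curr_dist))

# Hypothetical next-word distributions before and after a disambiguating word
before = {"bank": 0.25, "river": 0.25, "money": 0.25, "loan": 0.25}
after = {"money": 0.5, "loan": 0.5}

print(surprisal(0.25))                   # 2.0 bits
print(entropy(before))                   # 2.0 bits
print(entropy_reduction(before, after))  # 1.0 bit of uncertainty resolved
```

Note how the metrics dissociate: a word can be equally surprising in two contexts yet resolve very different amounts of uncertainty, which is the distinction the study exploits.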
Affiliation(s)
- Ming Song
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Changning Mental Health Center, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
- Jing Wang
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Changning Mental Health Center, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
- Qing Cai
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China; Shanghai Changning Mental Health Center, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
6
Regev TI, Casto C, Hosseini EA, Adamek M, Ritaccio AL, Willie JT, Brunner P, Fedorenko E. Neural populations in the language network differ in the size of their temporal receptive windows. Nat Hum Behav 2024; 8:1924-1942. [PMID: 39187713 DOI: 10.1038/s41562-024-01944-2]
Abstract
Although the brain areas that support language comprehension have long been known, our knowledge of the neural computations that these frontal and temporal regions implement remains limited. One important unresolved question concerns functional differences among the neural populations that comprise the language network. Here we leveraged the high spatiotemporal resolution of human intracranial recordings (n = 22) to examine responses to sentences and linguistically degraded conditions. We discovered three response profiles that differ in their temporal dynamics. These profiles appear to reflect different temporal receptive windows, with average windows of about 1, 4 and 6 words, respectively. Neural populations exhibiting these profiles are interleaved across the language network, which suggests that all language regions have direct access to distinct, multiscale representations of linguistic input, a property that may be critical for the efficiency and robustness of language processing.
Affiliation(s)
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Colton Casto
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Eghbal A Hosseini
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Markus Adamek
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Jon T Willie
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Peter Brunner
- National Center for Adaptive Neurotechnologies, Albany, NY, USA
- Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Department of Neurology, Albany Medical College, Albany, NY, USA
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
7
Tarder-Stoll H, Baldassano C, Aly M. Consolidation Enhances Sequential Multistep Anticipation but Diminishes Access to Perceptual Features. Psychol Sci 2024; 35:1178-1199. [PMID: 39110746 PMCID: PMC11532645 DOI: 10.1177/09567976241256617]
Abstract
Many experiences unfold predictably over time. Memory for these temporal regularities enables anticipation of events multiple steps into the future. Because temporally predictable events repeat over days, weeks, and years, we must maintain, and potentially transform, memories of temporal structure to support adaptive behavior. We explored how individuals build durable models of temporal regularities to guide multistep anticipation. Healthy young adults (Experiment 1: N = 99, age range = 18-40 years; Experiment 2: N = 204, age range = 19-40 years) learned sequences of scene images that were predictable at the category level and contained incidental perceptual details. Individuals then anticipated upcoming scene categories multiple steps into the future, immediately and at a delay. Consolidation increased the efficiency of anticipation, particularly for events further in the future, but diminished access to perceptual features. Further, maintaining a link-based model of the sequence after consolidation improved anticipation accuracy. Consolidation may therefore promote efficient and durable models of temporal structure, thus facilitating anticipation of future events.
Affiliation(s)
- Hannah Tarder-Stoll
- Department of Psychology, Columbia University
- Baycrest Health Sciences, Rotman Research Institute, Toronto, Canada
- Mariam Aly
- Department of Psychology, Columbia University
8
Zada Z, Goldstein A, Michelmann S, Simony E, Price A, Hasenfratz L, Barham E, Zadbood A, Doyle W, Friedman D, Dugan P, Melloni L, Devore S, Flinker A, Devinsky O, Nastase SA, Hasson U. A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations. Neuron 2024; 112:3211-3222.e5. [PMID: 39096896 PMCID: PMC11427153 DOI: 10.1016/j.neuron.2024.06.025]
Abstract
Effective communication hinges on a mutual understanding of word meaning in different contexts. We recorded brain activity using electrocorticography during spontaneous, face-to-face conversations in five pairs of epilepsy patients. We developed a model-based coupling framework that aligns brain activity in both speaker and listener to a shared embedding space from a large language model (LLM). The context-sensitive LLM embeddings allow us to track the exchange of linguistic information, word by word, from one brain to another in natural conversations. Linguistic content emerges in the speaker's brain before word articulation and rapidly re-emerges in the listener's brain after word articulation. The contextual embeddings better capture word-by-word neural alignment between speaker and listener than syntactic and articulatory models. Our findings indicate that the contextual embeddings learned by LLMs can serve as an explicit numerical model of the shared, context-rich meaning space humans use to communicate their thoughts to one another.
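The core analysis gestured at here, predicting word-by-word neural activity from a language model's contextual embeddings, is a linear encoding model. The following is a hedged caricature, not the authors' pipeline: the data are synthetic, the dimensions and the ridge penalty `alpha` are arbitrary placeholders, and real analyses would use cross-validation and lagged neural responses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: contextual embeddings for 200 words (as an LLM might produce)
# and activity at 10 electrodes generated by a fixed linear map plus a little noise.
n_words, emb_dim, n_electrodes = 200, 50, 10
embeddings = rng.standard_normal((n_words, emb_dim))
true_weights = rng.standard_normal((emb_dim, n_electrodes))
neural = embeddings @ true_weights + 0.1 * rng.standard_normal((n_words, n_electrodes))

# Ridge regression: fit on the first 150 words, evaluate on the held-out 50.
train, test = slice(0, 150), slice(150, 200)
alpha = 1.0
X, Y = embeddings[train], neural[train]
W = np.linalg.solve(X.T @ X + alpha * np.eye(emb_dim), X.T @ Y)
pred = embeddings[test] @ W

# Encoding accuracy: per-electrode correlation between predicted and held-out activity.
r = [np.corrcoef(pred[:, e], neural[test][:, e])[0, 1] for e in range(n_electrodes)]
print(np.mean(r))
```

In the speaker-listener coupling framework, the same embedding space would be regressed against both brains, so that held-out prediction accuracy tracks how much shared linguistic information flows between them.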
Affiliation(s)
- Zaid Zada
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Ariel Goldstein
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA; Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190501, Israel
- Sebastian Michelmann
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Erez Simony
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA; Faculty of Engineering, Holon Institute of Technology, Holon 5810201, Israel
- Amy Price
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Liat Hasenfratz
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Emily Barham
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Asieh Zadbood
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA; Department of Psychology, Columbia University, New York, NY 10027, USA
- Werner Doyle
- Grossman School of Medicine, New York University, New York, NY 10016, USA
- Daniel Friedman
- Grossman School of Medicine, New York University, New York, NY 10016, USA
- Patricia Dugan
- Grossman School of Medicine, New York University, New York, NY 10016, USA
- Lucia Melloni
- Grossman School of Medicine, New York University, New York, NY 10016, USA
- Sasha Devore
- Grossman School of Medicine, New York University, New York, NY 10016, USA
- Adeen Flinker
- Grossman School of Medicine, New York University, New York, NY 10016, USA; Tandon School of Engineering, New York University, New York, NY 10016, USA
- Orrin Devinsky
- Grossman School of Medicine, New York University, New York, NY 10016, USA
- Samuel A Nastase
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Uri Hasson
- Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
9
Norman-Haignere SV, Keshishian MK, Devinsky O, Doyle W, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Temporal integration in human auditory cortex is predominantly yoked to absolute time, not structure duration. bioRxiv 2024:2024.09.23.614358. [PMID: 39386565 PMCID: PMC11463558 DOI: 10.1101/2024.09.23.614358]
Abstract
Sound structures such as phonemes and words have highly variable durations. Thus, there is a fundamental difference between integrating across absolute time (e.g., 100 ms) vs. sound structure (e.g., phonemes). Auditory and cognitive models have traditionally cast neural integration in terms of time and structure, respectively, but the extent to which cortical computations reflect time or structure remains unknown. To answer this question, we rescaled the duration of all speech structures using time stretching/compression and measured integration windows in the human auditory cortex using a new experimental/computational method applied to spatiotemporally precise intracranial recordings. We observed significantly longer integration windows for stretched speech, but this lengthening was very small (~5%) relative to the change in structure durations, even in non-primary regions strongly implicated in speech-specific processing. These findings demonstrate that time-yoked computations dominate throughout the human auditory cortex, placing important constraints on neurocomputational models of structure processing.
Affiliation(s)
- Sam V Norman-Haignere
- University of Rochester Medical Center, Department of Biostatistics and Computational Biology
- University of Rochester Medical Center, Department of Neuroscience
- University of Rochester, Department of Brain and Cognitive Sciences
- University of Rochester, Department of Biomedical Engineering
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Menoua K Keshishian
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
- Orrin Devinsky
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Werner Doyle
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Neurosurgery, NYU Langone Medical Center
- Guy M McKhann
- Department of Neurological Surgery, Columbia University Irving Medical Center
- Adeen Flinker
- Department of Neurology, NYU Langone Medical Center
- Comprehensive Epilepsy Center, NYU Langone Medical Center
- Department of Biomedical Engineering, NYU Tandon School of Engineering
- Nima Mesgarani
- Zuckerman Institute for Mind Brain and Behavior, Columbia University
- Department of Electrical Engineering, Columbia University
10
Chalas N, Meyer L, Lo CW, Park H, Kluger DS, Abbasi O, Kayser C, Nitsch R, Gross J. Dissociating prosodic from syntactic delta activity during natural speech comprehension. Curr Biol 2024; 34:3537-3549.e5. [PMID: 39047734 DOI: 10.1016/j.cub.2024.06.072]
Abstract
Decoding human speech requires the brain to segment the incoming acoustic signal into meaningful linguistic units, ranging from syllables and words to phrases. Integrating these linguistic constituents into a coherent percept forms the basis of compositional meaning and hence understanding. Prosodic cues, such as pauses, are important for segmentation in natural speech, but their interplay with higher-level linguistic processing is still unknown. Here, we dissociate the neural tracking of prosodic pauses from the segmentation of multi-word chunks using magnetoencephalography (MEG). We find that manipulating the regularity of pauses disrupts slow speech-brain tracking bilaterally in auditory areas (below 2 Hz) and in turn increases left-lateralized coherence of higher-frequency auditory activity at speech onsets (around 25-45 Hz). Critically, we also find that multi-word chunks, defined as short, coherent bundles of inter-word dependencies, are processed through the rhythmic fluctuations of low-frequency activity (below 2 Hz) bilaterally and independently of prosodic cues. Importantly, low-frequency alignment at chunk onsets increases the accuracy of an encoding model in bilateral auditory and frontal areas while controlling for the effect of acoustics. Our findings provide novel insights into the neural basis of speech perception, demonstrating that both acoustic features (prosodic cues) and abstract linguistic processing at the multi-word timescale are independently underpinned by low-frequency electrophysiological brain activity in the delta frequency range.
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany; Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Lars Meyer
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Chia-Wen Lo
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Hyojin Park
- Centre for Human Brain Health (CHBH), School of Psychology, University of Birmingham, Birmingham, UK
- Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Omid Abbasi
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Robert Nitsch
- Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Joachim Gross
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
11
Yakura H. Evaluating Large Language Models' Ability Using a Psychiatric Screening Tool Based on Metaphor and Sarcasm Scenarios. J Intell 2024; 12:70. [PMID: 39057190 PMCID: PMC11278383 DOI: 10.3390/jintelligence12070070]
Abstract
Metaphors and sarcasm are precious fruits of our highly evolved social communication skills. However, children with the condition then known as Asperger syndrome are known to have difficulties comprehending sarcasm, even when they possess verbal IQs adequate for understanding metaphors. Accordingly, researchers have employed a screening test that assesses metaphor and sarcasm comprehension to distinguish Asperger syndrome from other conditions with similar external behaviors (e.g., attention-deficit/hyperactivity disorder). This study employs that standardized test to evaluate recent large language models' (LLMs) understanding of nuanced human communication. The results indicate improved metaphor comprehension with increased model parameters; however, no similar improvement was observed for sarcasm comprehension. Considering that a human's ability to grasp sarcasm has been associated with the amygdala, a cerebral region pivotal for emotional learning, a distinctive training strategy may be needed to imbue LLMs with this ability in a cognitively grounded manner.
Affiliation(s)
- Hiromu Yakura
- Max Planck Institute for Human Development, 14195 Berlin, Germany
12
Santamaría-García H, Migeot J, Medel V, Hazelton JL, Teckentrup V, Romero-Ortuno R, Piguet O, Lawor B, Northoff G, Ibanez A. Allostatic Interoceptive Overload Across Psychiatric and Neurological Conditions. Biol Psychiatry 2024:S0006-3223(24)01428-8. [PMID: 38964530 DOI: 10.1016/j.biopsych.2024.06.024]
Abstract
Emerging theories emphasize the crucial role of allostasis (anticipatory and adaptive regulation of the body's biological processes) and interoception (integration, anticipation, and regulation of internal bodily states) in adjusting physiological responses to environmental and bodily demands. In this review, we explore the disruptions in integrated allostatic interoceptive mechanisms in psychiatric and neurological disorders, including anxiety, depression, Alzheimer's disease, and frontotemporal dementia. We assess the biological mechanisms associated with allostatic interoception, including whole-body cascades, brain structure and function of the allostatic interoceptive network, heart-brain interactions, respiratory-brain interactions, the gut-brain-microbiota axis, peripheral biological processes (inflammatory, immune), and epigenetic pathways. These processes span psychiatric and neurological conditions and call for developing dimensional and transnosological frameworks. We synthesize new pathways to understand how allostatic interoceptive processes modulate interactions between environmental demands and biological functions in brain disorders. We discuss current limitations of the framework and future transdisciplinary developments. This review opens a new research agenda for understanding how allostatic interoception involves brain predictive coding in psychiatry and neurology, allowing for better clinical application and the development of new therapeutic interventions.
Affiliation(s)
- Hernando Santamaría-García
- Pontificia Universidad Javeriana, PhD program of Neuroscience, Bogotá, Colombia; Hospital Universitario San Ignacio, Centro de Memoria y Cognición Intellectus, Bogotá, Colombia
- Joaquin Migeot
- Global Brain Health Institute, University of California San Francisco, San Francisco, California; Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland; Latin American Brain Health Institute, Universidad Adolfo Ibanez, Santiago, Chile
- Vicente Medel
- Latin American Brain Health Institute, Universidad Adolfo Ibanez, Santiago, Chile
- Jessica L Hazelton
- Latin American Brain Health Institute, Universidad Adolfo Ibanez, Santiago, Chile; School of Psychology and Brain & Mind Centre, The University of Sydney, Sydney, New South Wales, Australia
- Vanessa Teckentrup
- School of Psychology and Trinity Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
- Roman Romero-Ortuno
- Pontificia Universidad Javeriana, PhD program of Neuroscience, Bogotá, Colombia; Discipline of Medical Gerontology, School of Medicine, Trinity College Dublin, Dublin, Ireland
- Olivier Piguet
- School of Psychology and Brain & Mind Centre, The University of Sydney, Sydney, New South Wales, Australia
- Brian Lawlor
- Pontificia Universidad Javeriana, PhD program of Neuroscience, Bogotá, Colombia
- George Northoff
- Institute of Mental Health Research, Mind, Brain Imaging and Neuroethics Research Unit, University of Ottawa, Ottawa, Ontario, Canada
- Agustin Ibanez
- Global Brain Health Institute, University of California San Francisco, San Francisco, California; Global Brain Health Institute, Trinity College Dublin, Dublin, Ireland; Latin American Brain Health Institute, Universidad Adolfo Ibanez, Santiago, Chile; School of Psychology and Trinity Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
13
Kumar S, Sumers TR, Yamakoshi T, Goldstein A, Hasson U, Norman KA, Griffiths TL, Hawkins RD, Nastase SA. Shared functional specialization in transformer-based language models and the human brain. Nat Commun 2024; 15:5523. [PMID: 38951520 PMCID: PMC11217339 DOI: 10.1038/s41467-024-49173-5] [Received: 07/21/2023] [Accepted: 05/24/2024] [Indexed: 07/03/2024] Open
Abstract
When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations ("embeddings") generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally-specialized "transformations" that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally-specialized "attention heads" differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space.
Affiliation(s)
- Sreejan Kumar
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Theodore R Sumers
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
- Takateru Yamakoshi
- Faculty of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Ariel Goldstein
- Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem, 9190401, Israel
- Uri Hasson
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Kenneth A Norman
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Thomas L Griffiths
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Robert D Hawkins
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Samuel A Nastase
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
14
Wang R, Chen ZS. Large-scale foundation models and generative AI for BigData neuroscience. Neurosci Res 2024:S0168-0102(24)00075-0. [PMID: 38897235 DOI: 10.1016/j.neures.2024.06.003] [Received: 10/21/2023] [Revised: 04/15/2024] [Accepted: 05/15/2024] [Indexed: 06/21/2024]
Abstract
Recent advances in machine learning have led to revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscapes of neuroscience research and make a significant impact on the future. Here we present a mini-review on recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.
Affiliation(s)
- Ran Wang
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Zhe Sage Chen
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Neuroscience and Physiology, Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA
15
Alcalá-López D, Mei N, Margolles P, Soto D. Brain-wide representation of social knowledge. Soc Cogn Affect Neurosci 2024; 19:nsae032. [PMID: 38804694 PMCID: PMC11173195 DOI: 10.1093/scan/nsae032] [Received: 10/11/2023] [Revised: 02/28/2024] [Accepted: 05/30/2024] [Indexed: 05/29/2024] Open
Abstract
Understanding how the human brain maps different dimensions of social conceptualizations remains a key unresolved issue. We performed a functional magnetic resonance imaging (fMRI) study in which participants were exposed to audio definitions of personality traits and asked to simulate experiences associated with the concepts. Half of the concepts were affective (e.g. empathetic), and the other half were non-affective (e.g. intelligent). Orthogonally, half of the concepts were highly likable (e.g. sincere) and half were socially undesirable (e.g. liar). Behaviourally, we observed that the dimension of social desirability reflected the participants' subjective ratings better than affect. fMRI decoding results showed that both social desirability and affect could be decoded from local patterns of activity in distributed brain regions, including the superior temporal and inferior frontal cortices, the precuneus, and key nodes of the default mode network in posterior/anterior cingulate and ventromedial prefrontal cortex. Decoding accuracy was better for social desirability than affect. A representational similarity analysis further demonstrated that a deep language model significantly predicted brain activity associated with the concepts in bilateral regions of the superior and anterior temporal lobes. The results demonstrate a brain-wide representation of social knowledge, involving default mode network systems that support the multimodal simulation of social experience, with a further reliance on language-related preprocessing.
Affiliation(s)
- Daniel Alcalá-López
- Consciousness group, Basque Center on Cognition, Brain and Language, San Sebastian 20009, Spain
- Ning Mei
- Psychology Department, Shenzhen University, Nanshan district, Guangdong province 3688, China
- Pedro Margolles
- Consciousness group, Basque Center on Cognition, Brain and Language, San Sebastian 20009, Spain
- David Soto
- Consciousness group, Basque Center on Cognition, Brain and Language, San Sebastian 20009, Spain
16
Yu S, Gu C, Huang K, Li P. Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension. Sci Adv 2024; 10:eadn7744. [PMID: 38781343 PMCID: PMC11114233 DOI: 10.1126/sciadv.adn7744] [Received: 01/01/2024] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
Current large language models (LLMs) rely on word prediction as their backbone pretraining task. Although word prediction is an important mechanism underlying language processing, human language comprehension occurs at multiple levels, involving the integration of words and sentences to achieve a full understanding of discourse. This study models language comprehension by using the next sentence prediction (NSP) task to investigate mechanisms of discourse-level comprehension. We show that NSP pretraining enhanced a model's alignment with brain data especially in the right hemisphere and in the multiple demand network, highlighting the contributions of nonclassical language regions to high-level language understanding. Our results also suggest that NSP can enable the model to better capture human comprehension performance and to better encode contextual information. Our study demonstrates that the inclusion of diverse learning objectives in a model leads to more human-like representations, and investigating the neurocognitive plausibility of pretraining tasks in LLMs can shed light on outstanding questions in language neuroscience.
Affiliation(s)
- Shaoyun Yu
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Chanyuan Gu
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Kexin Huang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Ping Li
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Centre for Immersive Learning and Metaverse in Education, The Hong Kong Polytechnic University, Hong Kong SAR, China
17
Kurteff GL, Field AM, Asghar S, Tyler-Kabara EC, Clarke D, Weiner HL, Anderson AE, Watrous AJ, Buchanan RJ, Modur PN, Hamilton LS. Processing of auditory feedback in perisylvian and insular cortex. bioRxiv [Preprint] 2024:2024.05.14.593257. [PMID: 38798574 PMCID: PMC11118286 DOI: 10.1101/2024.05.14.593257] [Indexed: 05/29/2024]
Abstract
When we speak, we not only make movements with our mouth, lips, and tongue, but we also hear the sound of our own voice. Thus, speech production in the brain involves not only controlling the movements we make, but also auditory and sensory feedback. Auditory responses are typically suppressed during speech production compared to perception, but how this manifests across space and time is unclear. Here we recorded intracranial EEG in seventeen pediatric, adolescent, and adult patients with medication-resistant epilepsy who performed a reading/listening task to investigate how other auditory responses are modulated during speech production. We identified onset and sustained responses to speech in bilateral auditory cortex, with a selective suppression of onset responses during speech production. Onset responses provide a temporal landmark during speech perception that is redundant with forward prediction during speech production. Phonological feature tuning in these "onset suppression" electrodes remained stable between perception and production. Notably, the posterior insula responded at sentence onset for both perception and production, suggesting a role in multisensory integration during feedback control.
Affiliation(s)
- Garret Lynn Kurteff
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Alyssa M. Field
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Saman Asghar
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Elizabeth C. Tyler-Kabara
- Department of Neurosurgery, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Department of Pediatrics, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Dave Clarke
- Department of Neurosurgery, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Department of Pediatrics, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Howard L. Weiner
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Anne E. Anderson
- Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Andrew J. Watrous
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Robert J. Buchanan
- Department of Neurosurgery, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Pradeep N. Modur
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Liberty S. Hamilton
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Lead contact
18
Fedorenko E, Ivanova AA, Regev TI. The language network as a natural kind within the broader landscape of the human brain. Nat Rev Neurosci 2024; 25:289-312. [PMID: 38609551 DOI: 10.1038/s41583-024-00802-4] [Accepted: 02/23/2024] [Indexed: 04/14/2024]
Abstract
Language behaviour is complex, but neuroscientific evidence disentangles it into distinct components supported by dedicated brain areas or networks. In this Review, we describe the 'core' language network, which includes left-hemisphere frontal and temporal areas, and show that it is strongly interconnected, independent of input and output modalities, causally important for language and language-selective. We discuss evidence that this language network plausibly stores language knowledge and supports core linguistic computations related to accessing words and constructions from memory and combining them to interpret (decode) or generate (encode) linguistic messages. We emphasize that the language network works closely with, but is distinct from, both lower-level - perceptual and motor - mechanisms and higher-level systems of knowledge and reasoning. The perceptual and motor mechanisms process linguistic signals, but, in contrast to the language network, are sensitive only to these signals' surface properties, not their meanings; the systems of knowledge and reasoning (such as the system that supports social reasoning) are sometimes engaged during language use but are not language-selective. This Review lays a foundation both for in-depth investigations of these different components of the language processing pipeline and for probing inter-component interactions.
Affiliation(s)
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Program in Speech and Hearing in Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Anna A Ivanova
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
19
Bortolotti A, Conti A, Romagnoli A, Sacco PL. Imagination vs. routines: festive time, weekly time, and the predictive brain. Front Hum Neurosci 2024; 18:1357354. [PMID: 38736532 PMCID: PMC11082368 DOI: 10.3389/fnhum.2024.1357354] [Received: 12/17/2023] [Accepted: 04/05/2024] [Indexed: 05/14/2024] Open
Abstract
This paper examines the relationship between societal structures shaped by traditions, norms, laws, and customs, and creative expressions in arts and media through the lens of the predictive coding framework in cognitive science. The article proposes that both dimensions of culture can be viewed as adaptations designed to enhance and train the brain's predictive abilities in the social domain. Traditions, norms, laws, and customs foster shared predictions and expectations among individuals, thereby reducing uncertainty in social environments. On the other hand, arts and media expose us to simulated experiences that explore alternative social realities, allowing the predictive machinery of the brain to hone its skills through exposure to a wider array of potentially relevant social circumstances and scenarios. We first review key principles of predictive coding and active inference, and then explore the rationale of cultural traditions and artistic culture in this perspective. Finally, we draw parallels between institutionalized normative habits that stabilize social worlds and creative and imaginative acts that temporarily subvert established conventions to inject variability.
Affiliation(s)
- Alessandro Bortolotti
- Department of Neuroscience, Imaging, and Clinical Sciences, University “G. D'Annunzio” of Chieti-Pescara, Chieti, Italy
- Alice Conti
- Department of Neuroscience, Imaging, and Clinical Sciences, University “G. D'Annunzio” of Chieti-Pescara, Chieti, Italy
- Pier Luigi Sacco
- Department of Neuroscience, Imaging, and Clinical Sciences, University “G. D'Annunzio” of Chieti-Pescara, Chieti, Italy
- metaLAB (at) Harvard, Cambridge, MA, United States
20
Heins C, Millidge B, Da Costa L, Mann RP, Friston KJ, Couzin ID. Collective behavior from surprise minimization. Proc Natl Acad Sci U S A 2024; 121:e2320239121. [PMID: 38630721 PMCID: PMC11046639 DOI: 10.1073/pnas.2320239121] [Received: 11/27/2023] [Accepted: 03/08/2024] [Indexed: 04/19/2024] Open
Abstract
Collective motion is ubiquitous in nature; groups of animals, such as fish, birds, and ungulates appear to move as a whole, exhibiting a rich behavioral repertoire that ranges from directed movement to milling to disordered swarming. Typically, such macroscopic patterns arise from decentralized, local interactions among constituent components (e.g., individual fish in a school). Preeminent models of this process describe individuals as self-propelled particles, subject to self-generated motion and "social forces" such as short-range repulsion and long-range attraction or alignment. However, organisms are not particles; they are probabilistic decision-makers. Here, we introduce an approach to modeling collective behavior based on active inference. This cognitive framework casts behavior as the consequence of a single imperative: to minimize surprise. We demonstrate that many empirically observed collective phenomena, including cohesion, milling, and directed motion, emerge naturally when considering behavior as driven by active Bayesian inference-without explicitly building behavioral rules or goals into individual agents. Furthermore, we show that active inference can recover and generalize the classical notion of social forces as agents attempt to suppress prediction errors that conflict with their expectations. By exploring the parameter space of the belief-based model, we reveal nontrivial relationships between the individual beliefs and group properties like polarization and the tendency to visit different collective states. We also explore how individual beliefs about uncertainty determine collective decision-making accuracy. Finally, we show how agents can update their generative model over time, resulting in groups that are collectively more sensitive to external fluctuations and encode information more robustly.
Affiliation(s)
- Conor Heins
- Department of Collective Behaviour, Max Planck Institute of Animal Behavior, Konstanz D-78457, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz D-78457, Germany
- Department of Biology, University of Konstanz, Konstanz D-78457, Germany
- VERSES Research Lab, Los Angeles, CA 90016
- Beren Millidge
- Medical Research Council Brain Networks Dynamics Unit, University of Oxford, Oxford OX1 3TH, United Kingdom
- Lancelot Da Costa
- VERSES Research Lab, Los Angeles, CA 90016
- Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, United Kingdom
- Richard P. Mann
- Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, United Kingdom
- Karl J. Friston
- VERSES Research Lab, Los Angeles, CA 90016
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, United Kingdom
- Iain D. Couzin
- Department of Collective Behaviour, Max Planck Institute of Animal Behavior, Konstanz D-78457, Germany
- Centre for the Advanced Study of Collective Behaviour, University of Konstanz, Konstanz D-78457, Germany
- Department of Biology, University of Konstanz, Konstanz D-78457, Germany
21
Gwilliams L, Marantz A, Poeppel D, King JR. Hierarchical dynamic coding coordinates speech comprehension in the brain. bioRxiv [Preprint] 2024:2024.04.19.590280. [PMID: 38659750 PMCID: PMC11042271 DOI: 10.1101/2024.04.19.590280] [Indexed: 04/26/2024]
Abstract
Speech comprehension requires the human brain to transform an acoustic waveform into meaning. To do so, the brain generates a hierarchy of features that converts the sensory input into increasingly abstract language properties. However, little is known about how these hierarchical features are generated and continuously coordinated. Here, we propose that each linguistic feature is dynamically represented in the brain to simultaneously represent successive events. To test this 'Hierarchical Dynamic Coding' (HDC) hypothesis, we use time-resolved decoding of brain activity to track the construction, maintenance, and integration of a comprehensive hierarchy of language features spanning acoustic, phonetic, sub-lexical, lexical, syntactic and semantic representations. For this, we recorded 21 participants with magnetoencephalography (MEG) while they listened to two hours of short stories. Our analyses reveal three main findings. First, the brain incrementally represents and simultaneously maintains successive features. Second, the duration of these representations depends on their level in the language hierarchy. Third, each representation is maintained by a dynamic neural code, which evolves at a speed commensurate with its corresponding linguistic level. This HDC preserves the maintenance of information over time while limiting the interference between successive features. Overall, HDC reveals how the human brain continuously builds and maintains a language hierarchy during natural speech comprehension, thereby anchoring linguistic theories to their biological implementations.
Affiliation(s)
- Laura Gwilliams
- Department of Psychology, Stanford University
- Department of Psychology, New York University
- Alec Marantz
- Department of Psychology, New York University
- Department of Linguistics, New York University
- David Poeppel
- Department of Psychology, New York University
- Ernst Strüngmann Institute
22
Blache P. A neuro-cognitive model of comprehension based on prediction and unification. Front Hum Neurosci 2024; 18:1356541. [PMID: 38655372 PMCID: PMC11035797 DOI: 10.3389/fnhum.2024.1356541] [Received: 12/15/2023] [Accepted: 03/20/2024] [Indexed: 04/26/2024] Open
Abstract
Most architectures and models of language processing have been built upon a restricted view of language, which is limited to sentence processing. These approaches fail to capture one primordial characteristic: efficiency. Many facilitation effects are known to be at play in natural situations such as conversation (shallow processing, no real access to the lexicon, etc.) without any impact on comprehension. In this study, we present a new model that integrates these facilitation effects for accessing meaning into the classical compositional architecture. This model relies on two mechanisms, prediction and unification, and provides a unified architecture for describing language processing in its natural environment.
Affiliation(s)
- Philippe Blache
- Laboratoire Parole et Langage (LPL-CNRS), Aix-en-Provence, France
- Institute of Language, Communication and the Brain (ILCB), Marseille, France
23
Lyu B, Marslen-Wilson WD, Fang Y, Tyler LK. Finding structure during incremental speech comprehension. eLife 2024; 12:RP89311. [PMID: 38577982 PMCID: PMC10997333 DOI: 10.7554/elife.89311] [Indexed: 04/06/2024] Open
Abstract
A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extract word-by-word measures of sentential structure from BERT, a deep language model, which effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography when participants were listening to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, which engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
Affiliation(s)
- William D Marslen-Wilson
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Yuxing Fang
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Lorraine K Tyler
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
24
Bzdok D, Thieme A, Levkovskyy O, Wren P, Ray T, Reddy S. Data science opportunities of large language models for neuroscience and biomedicine. Neuron 2024; 112:698-717. [PMID: 38340718 DOI: 10.1016/j.neuron.2024.01.016] [Received: 11/16/2023] [Revised: 01/03/2024] [Accepted: 01/17/2024] [Indexed: 02/12/2024]
Abstract
Large language models (LLMs) are a new asset class in the machine-learning landscape. Here we offer a primer on defining properties of these modeling techniques. We then reflect on new modes of investigation in which LLMs can be used to reframe classic neuroscience questions to deliver fresh answers. We reason that LLMs have the potential to (1) enrich neuroscience datasets by adding valuable meta-information, such as advanced text sentiment, (2) summarize vast information sources to overcome divides between siloed neuroscience communities, (3) enable previously unthinkable fusion of disparate information sources relevant to the brain, (4) help deconvolve which cognitive concepts most usefully grasp phenomena in the brain, and much more.
Collapse
Affiliation(s)
- Danilo Bzdok
- Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; TheNeuro - Montreal Neurological Institute (MNI), Department of Biomedical Engineering, McGill University, Montreal, QC, Canada.
| | | | | | - Paul Wren
- Mindstate Design Labs, San Francisco, CA, USA
| | - Thomas Ray
- Mindstate Design Labs, San Francisco, CA, USA
| | - Siva Reddy
- Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; Facebook CIFAR AI Chair; ServiceNow Research
25
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. Nat Hum Behav 2024; 8:544-561. [PMID: 38172630 DOI: 10.1038/s41562-023-01783-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Accepted: 11/10/2023] [Indexed: 01/05/2024]
Abstract
Transformer models such as GPT generate human-like language and are predictive of human brain responses to language. Here, using functional-MRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. We then use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also non-invasively control neural activity in higher-level cortical areas, such as the language network.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- MIT-IBM Watson AI Lab, Cambridge, MA, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA, USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
26
Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.02.578603. [PMID: 38352332 PMCID: PMC10862830 DOI: 10.1101/2024.02.02.578603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/22/2024]
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and narrative passages. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of increasingly higher-order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role for predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right-lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
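The TRF analysis named in this abstract is, in common implementations, a regularized lagged regression from stimulus features to the neural signal. The sketch below is an illustrative reconstruction under that assumption, using simulated data; it is not the authors' pipeline, and the function and variable names are our own:

```python
import numpy as np

def fit_trf(stimulus, response, n_lags, lam=1.0):
    """Estimate a temporal response function by ridge-regularized
    regression of the neural response on lagged stimulus samples."""
    T = len(stimulus)
    X = np.zeros((T, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:T - k]  # column k = stimulus delayed by k samples
    # Ridge solution: w = (X'X + lam*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ response)

# Simulated check: recover a known TRF from noisy convolved data.
rng = np.random.default_rng(0)
stim = rng.standard_normal(2000)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
resp = np.convolve(stim, true_trf)[:2000] + 0.1 * rng.standard_normal(2000)
w_hat = fit_trf(stim, resp, n_lags=5)
```

With enough data relative to the noise level, `w_hat` closely approximates the true kernel, which is the sense in which a TRF summarizes how the neural response "tracks" a stimulus feature over time.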
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA
27
Ohmae K, Ohmae S. Emergence of syntax and word prediction in an artificial neural circuit of the cerebellum. Nat Commun 2024; 15:927. [PMID: 38296954 PMCID: PMC10831061 DOI: 10.1038/s41467-024-44801-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2022] [Accepted: 01/03/2024] [Indexed: 02/02/2024] Open
Abstract
The cerebellum, interconnected with the cerebral neocortex, plays a vital role in human-characteristic cognition such as language processing; however, knowledge about the underlying circuit computation of the cerebellum remains very limited. To gain a better understanding of the computation underlying cerebellar language processing, we developed a biologically constrained cerebellar artificial neural network (cANN) model, which implements the recently identified cerebello-cerebellar recurrent pathway. We found that while the cANN acquires prediction of future words, a second function of syntactic recognition emerges in the middle layer of the prediction circuit. The recurrent pathway of the cANN was essential for the two language functions, and cANN variants with further biological constraints preserved these functions. Considering the uniform structure of cerebellar circuitry across all functional domains, the single-circuit computation that is the common basis of the two language functions can be generalized to fundamental cerebellar functions of prediction and grammar-like rule extraction from sequences, which underpin a wide range of cerebellar motor and cognitive functions. This is a pioneering study toward understanding the circuit computation of human-characteristic cognition using biologically constrained ANNs.
Affiliation(s)
- Keiko Ohmae
- Neuroscience Department, Baylor College of Medicine, Houston, TX, USA
- Chinese Institute for Brain Research (CIBR), Beijing, China
- Shogo Ohmae
- Neuroscience Department, Baylor College of Medicine, Houston, TX, USA.
- Chinese Institute for Brain Research (CIBR), Beijing, China.
28
Skrill D, Norman-Haignere SV. Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2023; 36:638-654. [PMID: 38434255 PMCID: PMC10907028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 03/05/2024]
Abstract
Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
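As an illustration of the window family this abstract describes, a convex combination of an exponential and a power-law decay can be written as follows. The function and parameter names (`w`, `tau`, `alpha`) are our own, purely illustrative choices, not the authors' parameterization:

```python
import numpy as np

def integration_window(t, w, tau, alpha):
    """Convex mixture of exponential decay and power-law decay.
    w in [0, 1] interpolates between the two regimes."""
    exp_part = np.exp(-t / tau)
    pow_part = (1.0 + t) ** (-alpha)
    return w * exp_part + (1.0 - w) * pow_part

t = np.linspace(0.0, 50.0, 201)
early = integration_window(t, w=0.9, tau=5.0, alpha=1.5)  # exponential-dominated
late = integration_window(t, w=0.1, tau=5.0, alpha=1.5)   # power-law-dominated
```

At large `t` the power-law term dominates the mixture's tail, so a layer with smaller `w` retains influence from more distant tokens — one way to picture the reported transition from exponential to power-law dynamics across layers.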
Affiliation(s)
- David Skrill
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
- Sam V Norman-Haignere
- Departments of Biostatistics and Computational Biology, and Neuroscience, University of Rochester Medical Center, Rochester, NY 14642
- Departments of Brain and Cognitive Sciences, and Biomedical Engineering, University of Rochester, Rochester, NY 14642
29
Ryskin R, Nieuwland MS. Prediction during language comprehension: what is next? Trends Cogn Sci 2023; 27:1032-1052. [PMID: 37704456 DOI: 10.1016/j.tics.2023.08.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Revised: 08/03/2023] [Accepted: 08/04/2023] [Indexed: 09/15/2023]
Abstract
Prediction is often regarded as an integral aspect of incremental language comprehension, but little is known about the cognitive architectures and mechanisms that support it. We review studies showing that listeners and readers use all manner of contextual information to generate multifaceted predictions about upcoming input. The nature of these predictions may vary between individuals owing to differences in language experience, among other factors. We then turn to unresolved questions which may guide the search for the underlying mechanisms. (i) Is prediction essential to language processing or an optional strategy? (ii) Are predictions generated from within the language system or by domain-general processes? (iii) What is the relationship between prediction and memory? (iv) Does prediction in comprehension require simulation via the production system? We discuss promising directions for making progress in answering these questions and for developing a mechanistic understanding of prediction in language.
Affiliation(s)
- Rachel Ryskin
- Department of Cognitive and Information Sciences, University of California Merced, 5200 Lake Road, Merced, CA 95343, USA.
- Mante S Nieuwland
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands
30
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.16.537080. [PMID: 37090673 PMCID: PMC10120732 DOI: 10.1101/2023.04.16.537080] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/25/2023]
Abstract
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also noninvasively control neural activity in higher-level cortical areas, like the language network.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- MIT-IBM Watson AI Lab, Cambridge, MA 02142, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455 USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138 USA
31
Nour MM, McNamee DC, Liu Y, Dolan RJ. Trajectories through semantic spaces in schizophrenia and the relationship to ripple bursts. Proc Natl Acad Sci U S A 2023; 120:e2305290120. [PMID: 37816054 PMCID: PMC10589662 DOI: 10.1073/pnas.2305290120] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Accepted: 07/31/2023] [Indexed: 10/12/2023] Open
Abstract
Human cognition is underpinned by structured internal representations that encode relationships between entities in the world (cognitive maps). Clinical features of schizophrenia-from thought disorder to delusions-are proposed to reflect disorganization in such conceptual representations. Schizophrenia is also linked to abnormalities in neural processes that support cognitive map representations, including hippocampal replay and high-frequency ripple oscillations. Here, we report a computational assay of semantically guided conceptual sampling and exploit this to test a hypothesis that people with schizophrenia (PScz) exhibit abnormalities in semantically guided cognition that relate to hippocampal replay and ripples. Fifty-two participants [26 PScz (13 unmedicated) and 26 age-, gender-, and intelligence quotient (IQ)-matched nonclinical controls] completed a category- and letter-verbal fluency task, followed by a magnetoencephalography (MEG) scan involving a separate sequence-learning task. We used a pretrained word embedding model of semantic similarity, coupled to a computational model of word selection, to quantify the degree to which each participant's verbal behavior was guided by semantic similarity. Using MEG, we indexed neural replay and ripple power in a post-task rest session. Across all participants, word selection was strongly influenced by semantic similarity. The strength of this influence showed sensitivity to task demands (category > letter fluency) and predicted performance. In line with our hypothesis, the influence of semantic similarity on behavior was reduced in schizophrenia relative to controls, predicted negative psychotic symptoms, and correlated with an MEG signature of hippocampal ripple power (but not replay). The findings bridge a gap between phenomenological and neurocomputational accounts of schizophrenia.
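The "semantically guided" measure in this abstract rests on embedding-space similarity between successively produced words. A toy sketch of that quantity, using hand-made two-dimensional vectors rather than the pretrained embedding model the study used:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings: semantically related words point in similar directions.
emb = {
    "cat": np.array([1.0, 0.2]),
    "dog": np.array([0.9, 0.3]),
    "piano": np.array([0.1, 1.0]),
}
sequence = ["cat", "dog", "piano"]
# Similarity of each word to the previous one, as a proxy for how
# strongly semantic similarity guided the next selection.
steps = [cosine(emb[a], emb[b]) for a, b in zip(sequence, sequence[1:])]
```

In a verbal fluency transcript, the average of such step similarities (fed into a word-selection model) is one simple way to quantify how much semantic proximity drives behavior; the "cat" → "dog" step scores far higher than "dog" → "piano".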
Affiliation(s)
- Matthew M. Nour
- Department of Psychiatry, University of Oxford, Oxford OX3 7JX, United Kingdom
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, United Kingdom
- Daniel C. McNamee
- Champalimaud Research, Centre for the Unknown, 1400-038 Lisbon, Portugal
- Yunzhe Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Chinese Institute for Brain Research, Beijing 102206, China
- Raymond J. Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, United Kingdom
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, United Kingdom
32
Nour MM, Huys QJM. Natural Language Processing in Psychiatry: A Field at an Inflection Point. BIOLOGICAL PSYCHIATRY. COGNITIVE NEUROSCIENCE AND NEUROIMAGING 2023; 8:979-981. [PMID: 37805191 DOI: 10.1016/j.bpsc.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 10/09/2023]
Affiliation(s)
- Matthew M Nour
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, Queen Square Institute of Neurology, University College London, London, United Kingdom.
- Quentin J M Huys
- Applied Computational Psychiatry Lab, Mental Health Neuroscience Department, Division of Psychiatry and Max Planck Centre for Computational Psychiatry and Ageing Research, Queen Square Institute of Neurology, University College London, London, United Kingdom
33
Overgaard M, Preston C, Aspell J. The self, its body and its brain. Sci Rep 2023; 13:12761. [PMID: 37550400 PMCID: PMC10406811 DOI: 10.1038/s41598-023-39959-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Accepted: 08/02/2023] [Indexed: 08/09/2023] Open
Affiliation(s)
- Morten Overgaard
- CFIN, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
- Jane Aspell
- School of Psychology and Sport Science, Anglia Ruskin University, Cambridge, UK
34
Stanojević M, Brennan JR, Dunagan D, Steedman M, Hale JT. Modeling Structure-Building in the Brain With CCG Parsing and Large Language Models. Cogn Sci 2023; 47:e13312. [PMID: 37417470 DOI: 10.1111/cogs.13312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Revised: 06/07/2023] [Accepted: 06/17/2023] [Indexed: 07/08/2023]
Abstract
To model behavioral and neural correlates of language comprehension in naturalistic environments, researchers have turned to broad-coverage tools from natural-language processing and machine learning. Where syntactic structure is explicitly modeled, prior work has relied predominantly on context-free grammars (CFGs), yet such formalisms are not sufficiently expressive for human languages. Combinatory categorial grammars (CCGs) are sufficiently expressive, directly compositional models of grammar with flexible constituency that affords incremental interpretation. In this work, we evaluate whether a more expressive CCG provides a better model than a CFG for human neural signals collected with functional magnetic resonance imaging (fMRI) while participants listen to an audiobook story. We further test between variants of CCG that differ in how they handle optional adjuncts. These evaluations are carried out against a baseline that includes estimates of next-word predictability from a transformer neural network language model. Such a comparison reveals unique contributions of CCG structure-building predominantly in the left posterior temporal lobe: CCG-derived measures offer a superior fit to neural signals compared to those derived from a CFG. These effects are spatially distinct from bilateral superior temporal effects that are unique to predictability. Neural effects for structure-building are thus separable from predictability during naturalistic listening, and those effects are best characterized by a grammar whose expressive power is motivated on independent linguistic grounds.
Affiliation(s)
- John T Hale
- Google DeepMind
- Department of Linguistics, University of Georgia
35
Thoret E, Ystad S, Kronland-Martinet R. Hearing as adaptive cascaded envelope interpolation. Commun Biol 2023; 6:671. [PMID: 37355702 PMCID: PMC10290642 DOI: 10.1038/s42003-023-05040-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 06/12/2023] [Indexed: 06/26/2023] Open
Abstract
The human auditory system is designed to capture and encode sounds from our surroundings and conspecifics. However, the precise mechanisms by which it adaptively extracts the most important spectro-temporal information from sounds are still not fully understood. Previous auditory models have explained sound encoding at the cochlear level using static filter banks, but this vision is incompatible with the nonlinear and adaptive properties of the auditory system. Here we propose an approach, inspired by cochlear physiology, that considers cochlear processes as envelope interpolations. It unifies linear and nonlinear adaptive behaviors into a single comprehensive framework that provides a data-driven understanding of auditory coding. It allows simulating a broad range of psychophysical phenomena, from virtual pitches and combination tones to consonance and dissonance of harmonic sounds. It further predicts properties of the cochlear filters such as frequency selectivity. We also propose a possible link between the parameters of the model and the density of hair cells on the basilar membrane. Cascaded envelope interpolation may lead to improvements in sound processing for hearing aids by providing a nonlinear, data-driven way of preprocessing acoustic signals that is consistent with peripheral processes.
Affiliation(s)
- Etienne Thoret
- Aix Marseille Univ, CNRS, UMR7061 PRISM, UMR7020 LIS, Marseille, France.
- Institute of Language, Communication, and the Brain (ILCB), Marseille, France.
- Sølvi Ystad
- CNRS, Aix Marseille Univ, UMR 7061 PRISM, Marseille, France
36
Antonello RJ, Vaidya AR, Huth AG. Scaling laws for language encoding models in fMRI. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2023; 36:21895-21907. [PMID: 39035676 PMCID: PMC11258918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 07/23/2024]
Abstract
Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
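The logarithmic scaling reported here can be pictured as fitting encoding performance against log parameter count. The data points below are invented placeholders for illustration only, not values from the paper; only the shape of the trend is the point:

```python
import numpy as np

# Hypothetical (made-up) data: model size in parameters vs. encoding
# correlation on held-out data, spanning the 125M-30B range the study used.
sizes = np.array([125e6, 1.3e9, 6.7e9, 30e9])
perf = np.array([0.200, 0.225, 0.245, 0.260])

# "Scales logarithmically" means performance is roughly linear in log(size).
slope, intercept = np.polyfit(np.log10(sizes), perf, deg=1)
pred = slope * np.log10(sizes) + intercept
```

A positive slope with small residuals relative to the linear-in-log fit is what "brain prediction performance scales logarithmically with model size" means operationally.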
Affiliation(s)
- Aditya R Vaidya
- Department of Computer Science, The University of Texas at Austin
- Alexander G Huth
- Departments of Computer Science and Neuroscience, The University of Texas at Austin