1
Anderson AJ, Davis C, Lalor EC. Deep-learning models reveal how context and listener attention shape electrophysiological correlates of speech-to-language transformation. PLoS Comput Biol 2024; 20:e1012537. [PMID: 39527649 DOI: 10.1371/journal.pcbi.1012537] [Received: 11/10/2023] [Accepted: 10/04/2024] [Indexed: 11/16/2024]
Abstract
To transform continuous speech into words, the human brain must resolve variability across utterances in intonation, speech rate, volume, accents and so on. A promising approach to explaining this process has been to model electroencephalogram (EEG) recordings of brain responses to speech. Contemporary models typically invoke context-invariant speech categories (e.g. phonemes) as an intermediary representational stage between sounds and words. However, such models may not capture the complete picture because they do not model the brain mechanism that categorizes sounds and consequently may overlook associated neural representations. By providing end-to-end accounts of speech-to-text transformation, new deep-learning systems could enable more complete brain models. We model EEG recordings of audiobook comprehension with the deep-learning speech recognition system Whisper. We find that (1) Whisper provides a self-contained EEG model of an intermediary representational stage that reflects elements of prelexical and lexical representation and prediction; (2) EEG modeling is more accurate when informed by 5-10 s of speech context, which traditional context-invariant categorical models do not encode; (3) deep Whisper layers encoding linguistic structure were more accurate EEG models of selectively attended speech in two-speaker "cocktail party" listening conditions than early layers encoding acoustics. No such layer-depth advantage was observed for unattended speech, consistent with a more superficial level of linguistic processing in the brain.
Affiliation(s)
- Andrew J Anderson
  - Department of Neurology, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
  - Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
  - Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
  - Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
- Chris Davis
  - Western Sydney University, The MARCS Institute for Brain, Behaviour and Development, Westmead Innovation Quarter, Westmead, New South Wales, Australia
- Edmund C Lalor
  - Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York, United States of America
  - Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
  - Center for Visual Science, University of Rochester, Rochester, New York, United States of America
2
Botch TL, Finn ES. Neural Representations of Concreteness and Concrete Concepts Are Specific to the Individual. J Neurosci 2024; 44:e0288242024. [PMID: 39349055 PMCID: PMC11551891 DOI: 10.1523/jneurosci.0288-24.2024] [Received: 02/12/2024] [Revised: 08/29/2024] [Accepted: 09/09/2024] [Indexed: 10/02/2024]
Abstract
Different people listening to the same story may converge upon a largely shared interpretation while still developing idiosyncratic experiences atop that shared foundation. What linguistic properties support this individualized experience of natural language? Here, we investigate how the "concrete-abstract" axis (the extent to which a word is grounded in sensory experience) relates to within- and across-subject variability in the neural representations of language. Leveraging a dataset of human participants of both sexes who each listened to four auditory stories while undergoing functional magnetic resonance imaging, we demonstrate that neural representations of "concreteness" are both reliable across stories and relatively unique to individuals, while neural representations of "abstractness" are variable both within individuals and across the population. Using natural language processing tools, we show that concrete words exhibit similar neural representations despite spanning larger distances within a high-dimensional semantic space, which potentially reflects an underlying representational signature of sensory experience (namely, imageability) that is shared by concrete words but absent from abstract words. Our findings situate the concrete-abstract axis as a core dimension that supports both shared and individualized representations of natural language.
Affiliation(s)
- Thomas L Botch
  - Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
- Emily S Finn
  - Department of Psychological & Brain Sciences, Dartmouth College, Hanover, New Hampshire 03755
3
Hong Z, Wang H, Zada Z, Gazula H, Turner D, Aubrey B, Niekerken L, Doyle W, Devore S, Dugan P, Friedman D, Devinsky O, Flinker A, Hasson U, Nastase SA, Goldstein A. Scale matters: Large language models with billions (rather than millions) of parameters better match neural representations of natural language. bioRxiv 2024:2024.06.12.598513. [PMID: 39005394 PMCID: PMC11244877 DOI: 10.1101/2024.06.12.598513] [Indexed: 07/16/2024]
Abstract
Recent research has used large language models (LLMs) to study the neural basis of naturalistic language processing in the human brain. LLMs have rapidly grown in complexity, leading to improved language processing capabilities. However, neuroscience researchers haven't kept up with the quick progress in LLM development. Here, we utilized several families of transformer-based LLMs to investigate the relationship between model size and their ability to capture linguistic information in the human brain. Crucially, a subset of LLMs were trained on a fixed training set, enabling us to dissociate model size from architecture and training set size. We used electrocorticography (ECoG) to measure neural activity in epilepsy patients while they listened to a 30-minute naturalistic audio story. We fit electrode-wise encoding models using contextual embeddings extracted from each hidden layer of the LLMs to predict word-level neural signals. In line with prior work, we found that larger LLMs better capture the structure of natural language and better predict neural activity. We also found a log-linear relationship where the encoding performance peaks in relatively earlier layers as model size increases. We also observed variations in the best-performing layer across different brain regions, corresponding to an organized language processing hierarchy.
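The electrode-wise, layer-by-layer encoding analysis summarized above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes ridge regression with a single fixed penalty, contiguous cross-validation folds, and the Pearson correlation between predicted and held-out signals as the encoding score. The function names (`ridge_fit`, `encoding_score`, `best_layer`) and the words-by-features array shapes are hypothetical.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def encoding_score(X, Y, alpha=1.0, n_folds=5):
    """Cross-validated Pearson r per electrode.

    X: (n_words, n_features) embeddings from one hidden layer (hypothetical).
    Y: (n_words, n_electrodes) word-level neural signals (hypothetical).
    """
    n = X.shape[0]
    # Contiguous folds keep held-out words in continuous stretches of the story.
    folds = np.array_split(np.arange(n), n_folds)
    preds = np.zeros(Y.shape)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Standardize features with training-set statistics only.
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0) + 1e-8
        X_train = (X[train_idx] - mu) / sd
        X_test = (X[test_idx] - mu) / sd
        y_mean = Y[train_idx].mean(axis=0)
        W = ridge_fit(X_train, Y[train_idx] - y_mean, alpha)
        preds[test_idx] = X_test @ W + y_mean
    # Column-wise Pearson correlation between predictions and actual signals.
    pc = preds - preds.mean(axis=0)
    yc = Y - Y.mean(axis=0)
    return (pc * yc).sum(axis=0) / np.sqrt((pc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))


def best_layer(layer_embeddings, Y, alpha=1.0):
    """Return the index of the layer with the highest mean score, plus all scores."""
    scores = [float(encoding_score(X, Y, alpha).mean()) for X in layer_embeddings]
    return int(np.argmax(scores)), scores
```

Comparing the best-performing index across models of different sizes, or across electrodes grouped by brain region, would correspond to the layer-depth analyses the abstract describes; a real analysis would typically also tune the penalty and account for temporal lags.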
Affiliation(s)
- Zhuoqiao Hong
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Haocheng Wang
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Zaid Zada
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Harshvardhan Gazula
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA
- David Turner
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Bobbi Aubrey
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Leonard Niekerken
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Werner Doyle
  - New York University Grossman School of Medicine, New York, NY
- Sasha Devore
  - New York University Grossman School of Medicine, New York, NY
- Patricia Dugan
  - New York University Grossman School of Medicine, New York, NY
- Daniel Friedman
  - New York University Grossman School of Medicine, New York, NY
- Orrin Devinsky
  - New York University Grossman School of Medicine, New York, NY
- Adeen Flinker
  - New York University Grossman School of Medicine, New York, NY
- Uri Hasson
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Samuel A Nastase
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Ariel Goldstein
  - Business School, Data Science Department and Cognitive Science Department, Hebrew University, Jerusalem, Israel
4
McCoy RT, Yao S, Friedman D, Hardy MD, Griffiths TL. Embers of autoregression show how large language models are shaped by the problem they are trained to solve. Proc Natl Acad Sci U S A 2024; 121:e2322420121. [PMID: 39365822 PMCID: PMC11474099 DOI: 10.1073/pnas.2322420121] [Received: 12/19/2023] [Accepted: 08/09/2024] [Indexed: 10/06/2024]
Abstract
The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach, which we call the teleological approach, we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this task is a deterministic one for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.
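The claim that cipher decoding is deterministic can be made concrete. Assuming, for illustration, a shift (Caesar-style) cipher such as rot-13, the sketch below decodes any ciphertext with the same fixed rule whether the hidden sentence is common or unusual. The function names are hypothetical and this is not the paper's evaluation code.

```python
import string


def shift_decode(text: str, shift: int) -> str:
    """Undo a shift cipher by moving each letter back `shift` places (mod 26)."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces, digits and punctuation pass through unchanged
    return "".join(out)


def shift_encode(text: str, shift: int) -> str:
    """Encoding is decoding with the opposite shift."""
    return shift_decode(text, -shift)


print(shift_encode("the cat sat", 13))  # gur png fng
print(shift_decode("gur png fng", 13))  # the cat sat
```

A program applying this rule is equally accurate on high- and low-probability outputs, which is why the 51% versus 13% gap reported for GPT-4 is read as an effect of output probability rather than of the task itself.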
Affiliation(s)
- R. Thomas McCoy
  - Department of Computer Science, Princeton University, Princeton, NJ 08542
- Shunyu Yao
  - Department of Computer Science, Princeton University, Princeton, NJ 08542
- Dan Friedman
  - Department of Computer Science, Princeton University, Princeton, NJ 08542
- Mathew D. Hardy
  - Department of Psychology, Princeton University, Princeton, NJ 08542
- Thomas L. Griffiths
  - Department of Computer Science, Princeton University, Princeton, NJ 08542
  - Department of Psychology, Princeton University, Princeton, NJ 08542
5
Debray S, Dehaene S. Mapping and modeling the semantic space of math concepts. Cognition 2024; 254:105971. [PMID: 39369595 DOI: 10.1016/j.cognition.2024.105971] [Received: 06/04/2024] [Revised: 09/26/2024] [Accepted: 09/27/2024] [Indexed: 10/08/2024]
Abstract
Mathematics is an underexplored domain of human cognition. While many studies have focused on subsets of math concepts such as numbers, fractions, or geometric shapes, few have ventured beyond these elementary domains. Here, we attempted to map out the full space of math concepts and to answer two specific questions: can distributed semantic models, such as GloVe, provide a satisfactory fit to human semantic judgements in mathematics? And how does this fit vary with education? We first analyzed all of the French and English Wikipedia pages with math content, and used a semi-automatic procedure to extract the 1000 most frequent math terms in both languages. In a second step, we collected extensive behavioral judgements of familiarity and semantic similarity between them. About half of the variance in human similarity judgements was explained by vector embeddings that attempt to capture latent semantic structures based on co-occurrence statistics. Participants' self-reported level of education modulated familiarity and similarity, allowing us to create a partial hierarchy among high-level math concepts. Our results converge onto the proposal of a map of math space, organized as a database of math terms with information about their frequency, familiarity, grade of acquisition, and entanglement with other concepts.
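As a rough illustration of what "about half of the variance" means, the sketch below scores term pairs by the cosine similarity of their embedding vectors and reports the squared Pearson correlation with mean human ratings; a value near 0.5 would correspond to the fit described. The abstract does not specify the fitting procedure, so cosine similarity, the squared-correlation metric, and the names `cosine` and `variance_explained` are assumptions; in practice the vectors would come from a model such as GloVe.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def variance_explained(embeddings, ratings):
    """Squared Pearson r between model similarity and human similarity.

    embeddings: dict mapping term -> vector (hypothetical input format).
    ratings: dict mapping (term_a, term_b) -> mean human similarity rating.
    """
    pairs = list(ratings)
    model = np.array([cosine(embeddings[a], embeddings[b]) for a, b in pairs])
    human = np.array([ratings[p] for p in pairs])
    r = np.corrcoef(model, human)[0, 1]
    return float(r ** 2)
```

Squaring the correlation, rather than reporting r itself, is what makes the result interpretable as a proportion of variance in the human judgements.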
Affiliation(s)
- Samuel Debray
  - Cognitive Neuroimaging Unit, Institut National de la Santé et de la Recherche Médicale, Commissariat à l'Energie Atomique et aux énergies alternatives, Centre National de la Recherche Scientifique, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, France
- Stanislas Dehaene
  - Cognitive Neuroimaging Unit, Institut National de la Santé et de la Recherche Médicale, Commissariat à l'Energie Atomique et aux énergies alternatives, Centre National de la Recherche Scientifique, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, France
  - Collège de France, Université Paris Sciences & Lettres, Paris, France
6
Regev TI, Casto C, Hosseini EA, Adamek M, Ritaccio AL, Willie JT, Brunner P, Fedorenko E. Neural populations in the language network differ in the size of their temporal receptive windows. Nat Hum Behav 2024; 8:1924-1942. [PMID: 39187713 DOI: 10.1038/s41562-024-01944-2] [Received: 03/16/2023] [Accepted: 07/03/2024] [Indexed: 08/28/2024]
Abstract
Although we have long known which brain areas support language comprehension, our knowledge of the neural computations that these frontal and temporal regions implement remains limited. One important unresolved question concerns functional differences among the neural populations that comprise the language network. Here we leveraged the high spatiotemporal resolution of human intracranial recordings (n = 22) to examine responses to sentences and linguistically degraded conditions. We discovered three response profiles that differ in their temporal dynamics. These profiles appear to reflect different temporal receptive windows, with average windows of about 1, 4 and 6 words, respectively. Neural populations exhibiting these profiles are interleaved across the language network, which suggests that all language regions have direct access to distinct, multiscale representations of linguistic input, a property that may be critical for the efficiency and robustness of language processing.
Affiliation(s)
- Tamar I Regev
  - Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Colton Casto
  - Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA, USA
- Eghbal A Hosseini
  - Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Markus Adamek
  - National Center for Adaptive Neurotechnologies, Albany, NY, USA
  - Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Jon T Willie
  - National Center for Adaptive Neurotechnologies, Albany, NY, USA
  - Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
- Peter Brunner
  - National Center for Adaptive Neurotechnologies, Albany, NY, USA
  - Department of Neurosurgery, Washington University School of Medicine, St Louis, MO, USA
  - Department of Neurology, Albany Medical College, Albany, NY, USA
- Evelina Fedorenko
  - Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Program in Speech and Hearing Bioscience and Technology (SHBT), Harvard University, Boston, MA, USA
7
Zada Z, Goldstein A, Michelmann S, Simony E, Price A, Hasenfratz L, Barham E, Zadbood A, Doyle W, Friedman D, Dugan P, Melloni L, Devore S, Flinker A, Devinsky O, Nastase SA, Hasson U. A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations. Neuron 2024; 112:3211-3222.e5. [PMID: 39096896 PMCID: PMC11427153 DOI: 10.1016/j.neuron.2024.06.025] [Received: 07/03/2023] [Revised: 03/26/2024] [Accepted: 06/25/2024] [Indexed: 08/05/2024]
Abstract
Effective communication hinges on a mutual understanding of word meaning in different contexts. We recorded brain activity using electrocorticography during spontaneous, face-to-face conversations in five pairs of epilepsy patients. We developed a model-based coupling framework that aligns brain activity in both speaker and listener to a shared embedding space from a large language model (LLM). The context-sensitive LLM embeddings allow us to track the exchange of linguistic information, word by word, from one brain to another in natural conversations. Linguistic content emerges in the speaker's brain before word articulation and rapidly re-emerges in the listener's brain after word articulation. The contextual embeddings better capture word-by-word neural alignment between speaker and listener than syntactic and articulatory models. Our findings indicate that the contextual embeddings learned by LLMs can serve as an explicit numerical model of the shared, context-rich meaning space humans use to communicate their thoughts to one another.
Affiliation(s)
- Zaid Zada
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Ariel Goldstein
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
  - Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190501, Israel
- Sebastian Michelmann
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Erez Simony
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
  - Faculty of Engineering, Holon Institute of Technology, Holon 5810201, Israel
- Amy Price
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Liat Hasenfratz
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Emily Barham
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Asieh Zadbood
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
  - Department of Psychology, Columbia University, New York, NY 10027, USA
- Werner Doyle
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Daniel Friedman
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Patricia Dugan
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Lucia Melloni
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Sasha Devore
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Adeen Flinker
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
  - Tandon School of Engineering, New York University, New York, NY 10016, USA
- Orrin Devinsky
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Samuel A Nastase
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Uri Hasson
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
8
Pacheco-Estefan D, Fellner MC, Kunz L, Zhang H, Reinacher P, Roy C, Brandt A, Schulze-Bonhage A, Yang L, Wang S, Liu J, Xue G, Axmacher N. Maintenance and transformation of representational formats during working memory prioritization. Nat Commun 2024; 15:8234. [PMID: 39300141 DOI: 10.1038/s41467-024-52541-w] [Received: 09/28/2023] [Accepted: 09/11/2024] [Indexed: 09/22/2024]
Abstract
Visual working memory (VWM) depends both on material-specific brain areas in the ventral visual stream (VVS) that support the maintenance of stimulus representations and on regions in the prefrontal cortex (PFC) that control these representations. How executive control prioritizes working memory contents and whether this affects their representational formats remains an open question, however. Here, we analyzed intracranial EEG (iEEG) recordings in epilepsy patients with electrodes in VVS and PFC who performed a multi-item working memory task involving a retro-cue. We employed Representational Similarity Analysis (RSA) with various Deep Neural Network (DNN) architectures to investigate the representational format of prioritized VWM content. While recurrent DNN representations matched PFC representations in the beta band (15-29 Hz) following the retro-cue, they corresponded to VVS representations in a lower frequency range (3-14 Hz) towards the end of the maintenance period. Our findings highlight the distinct coding schemes and representational formats of prioritized content in VVS and PFC.
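The core RSA comparison can be sketched in a few lines: build a representational dissimilarity matrix (RDM) for each system over the same set of items, then rank-correlate their upper triangles. This minimal sketch assumes correlation distance for the RDMs and Spearman correlation for the comparison, which are common RSA choices but not necessarily those used in the study; the items-by-features inputs (for example, iEEG power patterns in one frequency band, or activations from one DNN layer) and the function names are hypothetical.

```python
import numpy as np


def rdm(patterns):
    """Correlation-distance RDM: 1 - Pearson r between rows (items x features)."""
    return 1.0 - np.corrcoef(patterns)


def _ranks(x):
    """Rank-transform a vector; assumes no ties, as with continuous distances."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between the upper triangles of two RDMs.

    Both inputs must describe the same items in the same row order.
    """
    iu = np.triu_indices(patterns_a.shape[0], k=1)  # off-diagonal pairs only
    a = rdm(patterns_a)[iu]
    b = rdm(patterns_b)[iu]
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])
```

Repeating this score at each time point after the retro-cue and for each frequency band would yield the kind of time-resolved comparison between DNN and VVS or PFC representations that the abstract reports.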
Affiliation(s)
- Daniel Pacheco-Estefan
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Marie-Christin Fellner
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Lukas Kunz
  - Department of Epileptology, University Hospital Bonn, Bonn, Germany
- Hui Zhang
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
- Peter Reinacher
  - Department of Stereotactic and Functional Neurosurgery, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
  - Fraunhofer Institute for Laser Technology, Aachen, Germany
- Charlotte Roy
  - Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Armin Brandt
  - Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Andreas Schulze-Bonhage
  - Epilepsy Center, Medical Center - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Linglin Yang
  - Department of Psychiatry, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Shuang Wang
  - Department of Neurology, Epilepsy Center, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Jing Liu
  - Department of Applied Social Sciences, The Hong Kong Polytechnic University, Hong Kong, Hong Kong SAR
- Gui Xue
  - State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, PR China
- Nikolai Axmacher
  - Department of Neuropsychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, 44801, Bochum, Germany
  - State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, PR China
9
Dima DC, Janarthanan S, Culham JC, Mohsenzadeh Y. Shared representations of human actions across vision and language. Neuropsychologia 2024; 202:108962. [PMID: 39047974 DOI: 10.1016/j.neuropsychologia.2024.108962] [Received: 04/12/2024] [Revised: 06/26/2024] [Accepted: 07/20/2024] [Indexed: 07/27/2024]
Abstract
Humans can recognize and communicate about many actions performed by others. How are actions organized in the mind, and is this organization shared across vision and language? We collected similarity judgments of human actions depicted through naturalistic videos and sentences, and tested four models of action categorization, defining actions at different levels of abstraction ranging from specific (action verb) to broad (action target: whether an action is directed towards an object, another person, or the self). The similarity judgments reflected a shared organization of action representations across videos and sentences, determined mainly by the target of actions, even after accounting for other semantic features. Furthermore, language model embeddings predicted the behavioral similarity of action videos and sentences, and captured information about the target of actions alongside unique semantic information. Together, our results show that action concepts are similarly organized in the mind across vision and language, and that this organization reflects socially relevant goals.
Affiliation(s)
- Diana C Dima
  - Dept of Computer Science, Western University, London, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Jody C Culham
  - Dept of Psychology, Western University, London, Ontario, Canada
- Yalda Mohsenzadeh
  - Dept of Computer Science, Western University, London, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
10
Wood JN, Pandey L, Wood SMW. Digital Twin Studies for Reverse Engineering the Origins of Visual Intelligence. Annu Rev Vis Sci 2024; 10:145-170. [PMID: 39292554 DOI: 10.1146/annurev-vision-101322-103628] [Indexed: 09/20/2024]
Abstract
What are the core learning algorithms in brains? Nativists propose that intelligence emerges from innate domain-specific knowledge systems, whereas empiricists propose that intelligence emerges from domain-general systems that learn domain-specific knowledge from experience. We address this debate by reviewing digital twin studies designed to reverse engineer the learning algorithms in newborn brains. In digital twin studies, newborn animals and artificial agents are raised in the same environments and tested with the same tasks, permitting direct comparison of their learning abilities. Supporting empiricism, digital twin studies show that domain-general algorithms learn animal-like object perception when trained on the first-person visual experiences of newborn animals. Supporting nativism, digital twin studies show that domain-general algorithms produce innate domain-specific knowledge when trained on prenatal experiences (retinal waves). We argue that learning across humans, animals, and machines can be explained by a universal principle, which we call space-time fitting. Space-time fitting explains both empiricist and nativist phenomena, providing a unified framework for understanding the origins of intelligence.
Affiliation(s)
- Justin N Wood
  - Informatics Department, Indiana University Bloomington, Bloomington, Indiana, USA
  - Cognitive Science Program, Indiana University Bloomington, Bloomington, Indiana, USA
  - Neuroscience Department, Indiana University Bloomington, Bloomington, Indiana, USA
- Lalit Pandey
  - Informatics Department, Indiana University Bloomington, Bloomington, Indiana, USA
- Samantha M W Wood
  - Informatics Department, Indiana University Bloomington, Bloomington, Indiana, USA
  - Cognitive Science Program, Indiana University Bloomington, Bloomington, Indiana, USA
  - Neuroscience Department, Indiana University Bloomington, Bloomington, Indiana, USA
11
Cuskley C, Woods R, Flaherty M. The Limitations of Large Language Models for Understanding Human Language and Cognition. Open Mind (Camb) 2024; 8:1058-1083. [PMID: 39229609 PMCID: PMC11370970 DOI: 10.1162/opmi_a_00160] [Received: 09/25/2023] [Accepted: 07/19/2024] [Indexed: 09/05/2024]
Abstract
Researchers have recently argued that the capabilities of Large Language Models (LLMs) can provide new insights into longstanding debates about the role of learning and/or innateness in the development and evolution of human language. Here, we argue on two grounds that LLMs alone tell us very little about human language and cognition in terms of acquisition and evolution. First, any similarities between human language and the output of LLMs are purely functional. Borrowing the "four questions" framework from ethology, we argue that what LLMs do is superficially similar, but how they do it is not. In contrast to the rich multimodal data humans leverage in interactive language learning, LLMs rely on immersive exposure to vastly greater quantities of unimodal text data, with recent multimodal efforts built upon mappings between images and text. Second, turning to functional similarities between human language and LLM output, we show that human linguistic behavior is much broader. LLMs were designed to imitate the very specific behavior of human writing; while they do this impressively, the underlying mechanisms of these models limit their capacities for meaning and naturalistic interaction, and their potential for dealing with the diversity in human language. We conclude by emphasising that LLMs are not theories of language, but tools that may be used to study language, and that can only be effectively applied with specific hypotheses to motivate research.
Affiliation(s)
- Christine Cuskley
- Language Evolution, Acquisition and Development Group, Newcastle University, Newcastle upon Tyne, UK
| | - Rebecca Woods
- Language Evolution, Acquisition and Development Group, Newcastle University, Newcastle upon Tyne, UK
- Molly Flaherty
- Department of Psychology, Davidson College, Davidson, NC, USA
12
Čeko M, Hirshfield L, Doherty E, Southwell R, D'Mello SK. Cortical cognitive processing during reading captured using functional-near infrared spectroscopy. Sci Rep 2024; 14:19483. [PMID: 39174562 PMCID: PMC11341567 DOI: 10.1038/s41598-024-69630-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 08/07/2024] [Indexed: 08/24/2024] Open
Abstract
Neuroimaging studies using functional magnetic resonance imaging (fMRI) have provided unparalleled insights into the fundamental neural mechanisms underlying human cognitive processing, such as high-level linguistic processes during reading. Here, we build upon this prior work to capture sentence reading comprehension outside the MRI scanner using functional near-infrared spectroscopy (fNIRS) in a large sample of participants (n = 82). We observed increased task-related hemodynamic responses in prefrontal and temporal cortical regions during sentence-level reading relative to the control condition (a list of non-words), replicating prior fMRI work on cortical recruitment associated with high-level linguistic processing during reading comprehension. These results lay the groundwork towards developing adaptive systems to support novice readers and language learners by targeting the underlying cognitive processes. This work also contributes to bridging the gap between laboratory findings and more real-world applications in the realm of cognitive neuroscience.
Affiliation(s)
- Marta Čeko
- Institute of Cognitive Science, University of Colorado Boulder, 1777 Exposition Drive, Boulder, CO, 80305, USA.
- Leanne Hirshfield
- Institute of Cognitive Science, University of Colorado Boulder, 1777 Exposition Drive, Boulder, CO, 80305, USA.
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, USA.
- Emily Doherty
- Institute of Cognitive Science, University of Colorado Boulder, 1777 Exposition Drive, Boulder, CO, 80305, USA
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, USA
- Rosy Southwell
- Institute of Cognitive Science, University of Colorado Boulder, 1777 Exposition Drive, Boulder, CO, 80305, USA
- Sidney K D'Mello
- Institute of Cognitive Science, University of Colorado Boulder, 1777 Exposition Drive, Boulder, CO, 80305, USA
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, USA
13
Tuckute G, Kanwisher N, Fedorenko E. Language in Brains, Minds, and Machines. Annu Rev Neurosci 2024; 47:277-301. [PMID: 38669478 DOI: 10.1146/annurev-neuro-120623-101142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
It has long been argued that only humans could produce and understand language. But now, for the first time, artificial language models (LMs) achieve this feat. Here we survey the new purchase LMs are providing on the question of how language is implemented in the brain. We discuss why, a priori, LMs might be expected to share similarities with the human language system. We then summarize evidence that LMs represent linguistic information similarly enough to humans to enable relatively accurate brain encoding and decoding during language processing. Finally, we examine which LM properties (their architecture, task performance, or training) are critical for capturing human neural responses to language and review studies using LMs as in silico model organisms for testing hypotheses about language. These ongoing investigations bring us closer to understanding the representations and processes that underlie our ability to comprehend sentences and express thoughts in language.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA;
14
Margalit E, Lee H, Finzi D, DiCarlo JJ, Grill-Spector K, Yamins DLK. A unifying framework for functional organization in early and higher ventral visual cortex. Neuron 2024; 112:2435-2451.e7. [PMID: 38733985 PMCID: PMC11257790 DOI: 10.1016/j.neuron.2024.04.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 12/08/2023] [Accepted: 04/15/2024] [Indexed: 05/13/2024]
Abstract
A key feature of cortical systems is functional organization: the arrangement of functionally distinct neurons in characteristic spatial patterns. However, the principles underlying the emergence of functional organization in the cortex are poorly understood. Here, we develop the topographic deep artificial neural network (TDANN), the first model to predict several aspects of the functional organization of multiple cortical areas in the primate visual system. We analyze the factors driving the TDANN's success and find that it balances two objectives: learning a task-general sensory representation and maximizing the spatial smoothness of responses according to a metric that scales with cortical surface area. In turn, the representations learned by the TDANN are more brain-like than in spatially unconstrained models. Finally, we provide evidence that the TDANN's functional organization balances performance with between-area connection length. Our results offer a unified principle for understanding the functional organization of the primate ventral visual system.
Affiliation(s)
- Eshed Margalit
- Neurosciences Graduate Program, Stanford University, Stanford, CA 94305, USA.
- Hyodong Lee
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Dawn Finzi
- Department of Psychology, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- James J DiCarlo
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Kalanit Grill-Spector
- Department of Psychology, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305, USA
- Daniel L K Yamins
- Department of Psychology, Stanford University, Stanford, CA 94305, USA; Department of Computer Science, Stanford University, Stanford, CA 94305, USA; Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305, USA
15
Alonso-Sanchez MF, Z-Rivera L, Otero M, Portal J, Cavieres Á, Alfaro-Faccio P. Aberrant brain language network in schizophrenia spectrum disorder: a systematic review of its relation to language signs beyond symptoms. Front Psychiatry 2024; 15:1244694. [PMID: 39026525 PMCID: PMC11254709 DOI: 10.3389/fpsyt.2024.1244694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 06/07/2024] [Indexed: 07/20/2024] Open
Abstract
Background: Language disturbances are a core feature of schizophrenia, often studied as a formal thought disorder. The neurobiology of language in schizophrenia has been addressed within the same framework, which treats language and thought as equivalent and considers symptoms rather than signs. This review aims to systematically examine published peer-reviewed studies that employed neuroimaging techniques to investigate aberrant brain-language networks in individuals with schizophrenia in relation to linguistic signs. Methods: We employed a language model for automatic data extraction. We selected our studies according to the PRISMA recommendations and assessed the quality of the selected studies according to the STROBE guidance. Results: We analyzed the findings from 37 studies, categorizing them based on patient characteristics, brain measures, and language task types. The inferior frontal gyrus (IFG) and superior temporal gyrus (STG) exhibited the most significant differences among these studies and paradigms. Conclusions: Based on our analysis, we propose guidelines for future research in this field. It is crucial to investigate the larger networks involved in language processing, and language models must be integrated with brain metrics to enhance our understanding of the relationship between language and brain abnormalities in schizophrenia.
Affiliation(s)
- María F. Alonso-Sanchez
- Escuela de Fonoaudiología, Centro de Investigación del Desarrollo en Cognición y Lenguaje (CIDCL), Facultad de Medicina, Universidad de Valparaíso, Viña del Mar, Chile
- Lucía Z-Rivera
- Advanced Center for Electrical and Electronic Engineering (AC3E), Universidad Técnica Federico Santa María, Valparaíso, Chile
- Mónica Otero
- Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago de Chile, Chile
- Centro BASAL Ciencia & Vida, Universidad San Sebastián, Santiago de Chile, Chile
- Jorge Portal
- Advanced Center for Electrical and Electronic Engineering (AC3E), Universidad Técnica Federico Santa María, Valparaíso, Chile
- Departamento de Electrónica, Universidad Técnica Federico Santa María (USM), Valparaíso, Chile
- Álvaro Cavieres
- Departamento de Psiquiatría, Escuela de Medicina, Universidad de Valparaíso, Valparaíso, Chile
- Pedro Alfaro-Faccio
- Instituto de Literatura y Ciencias del Lenguaje, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
16
Kumar S, Sumers TR, Yamakoshi T, Goldstein A, Hasson U, Norman KA, Griffiths TL, Hawkins RD, Nastase SA. Shared functional specialization in transformer-based language models and the human brain. Nat Commun 2024; 15:5523. [PMID: 38951520 PMCID: PMC11217339 DOI: 10.1038/s41467-024-49173-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 05/24/2024] [Indexed: 07/03/2024] Open
Abstract
When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations ("embeddings") generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally-specialized "transformations" that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally-specialized "attention heads" differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space.
Affiliation(s)
- Sreejan Kumar
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA.
- Theodore R Sumers
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA.
- Takateru Yamakoshi
- Faculty of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Ariel Goldstein
- Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem, 9190401, Israel
- Uri Hasson
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Kenneth A Norman
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Thomas L Griffiths
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Robert D Hawkins
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA
- Department of Psychology, Princeton University, Princeton, NJ, 08540, USA
- Samuel A Nastase
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ, 08540, USA.
17
Subramaniam V, Conwell C, Wang C, Kreiman G, Katz B, Cases I, Barbu A. Revealing Vision-Language Integration in the Brain with Multimodal Networks. ARXIV 2024:arXiv:2406.14481v1. [PMID: 38947929 PMCID: PMC11213144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoelectroencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often have different architectures, number of parameters, and training sets (possibly obscuring those differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR), which keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 out of 1090 total sites or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, we find that among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of the neural activity in these sites.
Affiliation(s)
- Colin Conwell
- Department of Cognitive Science, Johns Hopkins University
18
Waldrop MM. Can ChatGPT help researchers understand how the human brain handles language? Proc Natl Acad Sci U S A 2024; 121:e2410196121. [PMID: 38875152 PMCID: PMC11194597 DOI: 10.1073/pnas.2410196121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024] Open
19
Graves WW, Levinson HJ, Staples R, Boukrina O, Rothlein D, Purcell J. An inclusive multivariate approach to neural localization of language components. Brain Struct Funct 2024; 229:1243-1263. [PMID: 38693340 PMCID: PMC11147878 DOI: 10.1007/s00429-024-02800-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 04/22/2024] [Indexed: 05/03/2024]
Abstract
To determine how language is implemented in the brain, it is important to know which brain areas are primarily engaged in language processing and which are not. Existing protocols for localizing language are typically univariate, treating each small unit of brain volume as independent. One prominent example that focuses on the overall language network in functional magnetic resonance imaging (fMRI) uses a contrast between neural responses to sentences and sets of pseudowords (pronounceable nonwords). This contrast reliably activates peri-sylvian language areas but is less sensitive to extra-sylvian areas that are also known to support aspects of language such as word meanings (semantics). In this study, we assess areas where a multivariate, pattern-based approach shows high reproducibility across multiple measurements and participants, identifying these areas as multivariate regions of interest (mROI). We then perform a representational similarity analysis (RSA) of an fMRI dataset where participants made familiarity judgments on written words. We also compare those results to univariate regions of interest (uROI) taken from previous sentences > pseudowords contrasts. RSA with word stimuli defined in terms of their semantic distance showed greater correspondence with neural patterns in mROI than uROI. This was confirmed in two independent datasets, one involving single-word recognition, and the other focused on the meaning of noun-noun phrases by contrasting meaningful phrases > pseudowords. In all cases, areas of spatial overlap between mROI and uROI showed the greatest neural association. This suggests that ROIs defined in terms of multivariate reproducibility can help localize components of language such as semantics. The multivariate approach can also be extended to focus on other aspects of language such as phonology, and can be used along with the univariate approach for inclusively mapping language cortex.
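The representational similarity analysis (RSA) step named in the Graves et al. abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline; it assumes one neural pattern per word (one row of a matrix per word), a hypothetical semantic embedding per word, correlation distance for the neural dissimilarities, cosine distance as the "semantic distance", and a Spearman correlation between the two dissimilarity matrices as the RSA score. The helper name `rsa_score` and the synthetic data are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rsa_score(neural_patterns: np.ndarray, semantic_vectors: np.ndarray) -> float:
    """Spearman correlation between a neural and a semantic dissimilarity matrix.

    Both inputs have one row per word, in the same order; columns are voxels
    (neural) or embedding dimensions (semantic). Hypothetical helper, not the
    published method.
    """
    if neural_patterns.shape[0] != semantic_vectors.shape[0]:
        raise ValueError("both inputs need one row per word, in the same order")
    # Condensed (upper-triangle) dissimilarity matrices: one value per word pair.
    neural_rdm = pdist(neural_patterns, metric="correlation")  # 1 - Pearson r
    semantic_rdm = pdist(semantic_vectors, metric="cosine")    # assumed semantic distance
    rho, _ = spearmanr(neural_rdm, semantic_rdm)
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    semantic = rng.normal(size=(20, 50))  # synthetic embeddings for 20 words
    # Synthetic region whose patterns track the semantic space, plus noise.
    roi_related = semantic + 0.1 * rng.normal(size=semantic.shape)
    # Synthetic region unrelated to the semantic space.
    roi_unrelated = rng.normal(size=(20, 50))
    print(f"related region:   rho = {rsa_score(roi_related, semantic):.2f}")
    print(f"unrelated region: rho = {rsa_score(roi_unrelated, semantic):.2f}")
```

Comparing regions of interest then amounts to computing this score for each region's voxel patterns against the same semantic dissimilarity matrix; a region whose patterns follow semantic distance yields a clearly higher correlation than one that does not.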
Affiliation(s)
- William W Graves
- Department of Psychology, Rutgers University, Smith Hall, Room 301, 101 Warren Street, Newark, NJ, 07102, USA.
- Hillary J Levinson
- Department of Psychology, Rutgers University, Smith Hall, Room 301, 101 Warren Street, Newark, NJ, 07102, USA
- Ryan Staples
- Georgetown University Medical Center, Washington, DC, USA
20
Mahowald K, Ivanova AA, Blank IA, Kanwisher N, Tenenbaum JB, Fedorenko E. Dissociating language and thought in large language models. Trends Cogn Sci 2024; 28:517-540. [PMID: 38508911 DOI: 10.1016/j.tics.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 03/22/2024]
Abstract
Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence (knowledge of linguistic rules and patterns) and functional linguistic competence (understanding and using language in the world). We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of separate mechanisms specialized for formal versus functional linguistic competence.
21
Fedorenko E, Piantadosi ST, Gibson EAF. Language is primarily a tool for communication rather than thought. Nature 2024; 630:575-586. [PMID: 38898296 DOI: 10.1038/s41586-024-07522-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Accepted: 05/03/2024] [Indexed: 06/21/2024]
Abstract
Language is a defining characteristic of our species, but the function, or functions, that it serves has been debated for centuries. Here we bring recent evidence from neuroscience and allied disciplines to argue that in modern humans, language is a tool for communication, contrary to a prominent view that we use language for thinking. We begin by introducing the brain network that supports linguistic ability in humans. We then review evidence for a double dissociation between language and thought, and discuss several properties of language that suggest that it is optimized for communication. We conclude that although the emergence of language has unquestionably transformed human culture, language does not appear to be a prerequisite for complex thought, including symbolic thought. Instead, language is a powerful tool for the transmission of cultural knowledge; it plausibly co-evolved with our thinking and reasoning capacities, and only reflects, rather than gives rise to, the signature sophistication of human cognition.
Affiliation(s)
- Evelina Fedorenko
- Massachusetts Institute of Technology, Cambridge, MA, USA.
- Speech and Hearing in Bioscience and Technology Program at Harvard University, Boston, MA, USA.
22
Yu S, Gu C, Huang K, Li P. Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension. Sci Adv 2024; 10:eadn7744. [PMID: 38781343 PMCID: PMC11114233 DOI: 10.1126/sciadv.adn7744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
Current large language models (LLMs) rely on word prediction as their backbone pretraining task. Although word prediction is an important mechanism underlying language processing, human language comprehension occurs at multiple levels, involving the integration of words and sentences to achieve a full understanding of discourse. This study models language comprehension by using the next sentence prediction (NSP) task to investigate mechanisms of discourse-level comprehension. We show that NSP pretraining enhanced a model's alignment with brain data, especially in the right hemisphere and in the multiple demand network, highlighting the contributions of nonclassical language regions to high-level language understanding. Our results also suggest that NSP can enable the model to better capture human comprehension performance and to better encode contextual information. Our study demonstrates that the inclusion of diverse learning objectives in a model leads to more human-like representations, and investigating the neurocognitive plausibility of pretraining tasks in LLMs can shed light on outstanding questions in language neuroscience.
Affiliation(s)
- Shaoyun Yu
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Chanyuan Gu
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Kexin Huang
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Ping Li
- Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Centre for Immersive Learning and Metaverse in Education, The Hong Kong Polytechnic University, Hong Kong SAR, China
23
Gastaldon S, Bonfiglio N, Vespignani F, Peressotti F. Predictive language processing: integrating comprehension and production, and what atypical populations can tell us. Front Psychol 2024; 15:1369177. [PMID: 38836235 PMCID: PMC11148270 DOI: 10.3389/fpsyg.2024.1369177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 05/06/2024] [Indexed: 06/06/2024] Open
Abstract
Predictive processing, a crucial aspect of human cognition, is also relevant for language comprehension. In everyday situations, we exploit various sources of information to anticipate, and therefore facilitate the processing of, upcoming linguistic input. In the literature, a variety of models aim to account for this ability. One group of models proposes a strict relationship between prediction and language production mechanisms. In this review, we first briefly introduce the concept of predictive processing during language comprehension. Second, we focus on models that attribute a prominent role to language production and sensorimotor processing in language prediction ("prediction-by-production" models). Alongside this, we summarize studies that investigated the role of speech production and auditory perception in language comprehension/prediction tasks in healthy, typical participants. Then, we provide an overview of the limited existing literature on specific atypical/clinical populations that may represent a suitable testing ground for such models, i.e., populations with impaired speech production and auditory perception mechanisms. Ultimately, we suggest wider and more in-depth testing of prediction-by-production accounts, and the involvement of atypical populations both for model testing and as targets for possible novel speech/language treatment approaches.
Affiliation(s)
- Simone Gastaldon
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- Padova Neuroscience Center, University of Padua, Padua, Italy
- Noemi Bonfiglio
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- BCBL-Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Francesco Vespignani
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- Centro Interdipartimentale di Ricerca "I-APPROVE-International Auditory Processing Project in Venice", University of Padua, Padua, Italy
- Francesca Peressotti
- Dipartimento di Psicologia dello Sviluppo e della Socializzazione, University of Padua, Padua, Italy
- Padova Neuroscience Center, University of Padua, Padua, Italy
- Centro Interdipartimentale di Ricerca "I-APPROVE-International Auditory Processing Project in Venice", University of Padua, Padua, Italy
24
O'Brien AM, May TA, Koskey KLK, Bungert L, Cardinaux A, Cannon J, Treves IN, D'Mello AM, Joseph RM, Li C, Diamond S, Gabrieli JDE, Sinha P. Development of a Self-Report Measure of Prediction in Daily Life: The Prediction-Related Experiences Questionnaire. J Autism Dev Disord 2024:10.1007/s10803-024-06379-2. [PMID: 38713266 DOI: 10.1007/s10803-024-06379-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/22/2024] [Indexed: 05/08/2024]
Abstract
PURPOSE: Predictions are complex, multisensory, and dynamic processes involving real-time adjustments based on environmental inputs. Disruptions to prediction abilities have been proposed to underlie characteristics associated with autism. While there is substantial empirical literature related to prediction, the field lacks a self-assessment measure of prediction skills related to daily tasks. Such a measure would be useful to better understand the nature of day-to-day prediction-related activities and characterize these abilities in individuals who struggle with prediction. METHODS: An interdisciplinary mixed-methods approach was utilized to develop and validate a self-report questionnaire of prediction skills for adults, the Prediction-Related Experiences Questionnaire (PRE-Q). Two rounds of online field testing were completed in samples of autistic and neurotypical (NT) adults. Qualitative feedback from a subset of these participants regarding question content and quality was integrated and Rasch modeling of the item responses was applied. RESULTS: The final PRE-Q includes 19 items across 3 domains (Sensory, Motor, Social), with evidence supporting the validity of the measure's 4-point response categories, internal structure, and relationship to other outcome measures associated with prediction. Consistent with models of prediction challenges in autism, autistic participants indicated more prediction-related difficulties than the NT group. CONCLUSIONS: This study provides evidence for the validity of a novel self-report questionnaire designed to measure the day-to-day prediction skills of autistic and non-autistic adults. Future research should focus on characterizing the relationship between the PRE-Q and lab-based measures of prediction, and understanding how the PRE-Q may be used to identify potential areas for clinical supports for individuals with prediction-related challenges.
Affiliation(s)
- Amanda M O'Brien
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Hock E. Tan and K. Lisa Yang Center for Autism Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Toni A May
- School of Education, Drexel University, Philadelphia, PA, USA
- Lindsay Bungert
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Donald and Barbara Zucker School of Medicine, Hofstra University, Long Island, NY, USA
- Annie Cardinaux
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan Cannon
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology, Neuroscience, and Behaviour, McMaster University, Hamilton, Ontario, Canada
- Isaac N Treves
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Anila M D'Mello
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychiatry and O'Donnell Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA
- Robert M Joseph
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA, USA
- Cindy Li
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Hock E. Tan and K. Lisa Yang Center for Autism Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sidney Diamond
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- John D E Gabrieli
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Hock E. Tan and K. Lisa Yang Center for Autism Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Pawan Sinha
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
25
Riveland R, Pouget A. Natural language instructions induce compositional generalization in networks of neurons. Nat Neurosci 2024; 27:988-999. [PMID: 38499855 PMCID: PMC11537972 DOI: 10.1038/s41593-024-01607-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 02/15/2024] [Indexed: 03/20/2024]
Abstract
A fundamental human cognitive feat is to interpret linguistic instructions in order to perform novel tasks without explicit task experience. Yet, the neural computations that might be used to accomplish this remain poorly understood. We use advances in natural language processing to create a neural model of generalization based on linguistic instructions. Models are trained on a set of common psychophysical tasks, and receive instructions embedded by a pretrained language model. Our best models can perform a previously unseen task with an average performance of 83% correct based solely on linguistic instructions (that is, zero-shot learning). We found that language scaffolds sensorimotor representations such that activity for interrelated tasks shares a common geometry with the semantic representations of instructions, allowing language to cue the proper composition of practiced skills in unseen settings. We show how this model generates a linguistic description of a novel task it has identified using only motor feedback, which can subsequently guide a partner model to perform the task. Our models offer several experimentally testable predictions outlining how linguistic information must be represented to facilitate flexible and general cognition in the human brain.
Affiliation(s)
- Reidar Riveland
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
- Alexandre Pouget
- Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
26
Fedorenko E, Ivanova AA, Regev TI. The language network as a natural kind within the broader landscape of the human brain. Nat Rev Neurosci 2024; 25:289-312. [PMID: 38609551 DOI: 10.1038/s41583-024-00802-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/23/2024] [Indexed: 04/14/2024]
Abstract
Language behaviour is complex, but neuroscientific evidence disentangles it into distinct components supported by dedicated brain areas or networks. In this Review, we describe the 'core' language network, which includes left-hemisphere frontal and temporal areas, and show that it is strongly interconnected, independent of input and output modalities, causally important for language and language-selective. We discuss evidence that this language network plausibly stores language knowledge and supports core linguistic computations related to accessing words and constructions from memory and combining them to interpret (decode) or generate (encode) linguistic messages. We emphasize that the language network works closely with, but is distinct from, both lower-level (perceptual and motor) mechanisms and higher-level systems of knowledge and reasoning. The perceptual and motor mechanisms process linguistic signals, but, in contrast to the language network, are sensitive only to these signals' surface properties, not their meanings; the systems of knowledge and reasoning (such as the system that supports social reasoning) are sometimes engaged during language use but are not language-selective. This Review lays a foundation both for in-depth investigations of these different components of the language processing pipeline and for probing inter-component interactions.
Affiliation(s)
- Evelina Fedorenko
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Anna A Ivanova
- School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Tamar I Regev
- Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
27
Gwilliams L, Marantz A, Poeppel D, King JR. Hierarchical dynamic coding coordinates speech comprehension in the brain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.19.590280. [PMID: 38659750 PMCID: PMC11042271 DOI: 10.1101/2024.04.19.590280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Speech comprehension requires the human brain to transform an acoustic waveform into meaning. To do so, the brain generates a hierarchy of features that converts the sensory input into increasingly abstract language properties. However, little is known about how these hierarchical features are generated and continuously coordinated. Here, we propose that each linguistic feature is dynamically represented in the brain to simultaneously represent successive events. To test this 'Hierarchical Dynamic Coding' (HDC) hypothesis, we use time-resolved decoding of brain activity to track the construction, maintenance, and integration of a comprehensive hierarchy of language features spanning acoustic, phonetic, sub-lexical, lexical, syntactic and semantic representations. For this, we recorded 21 participants with magnetoencephalography (MEG) while they listened to two hours of short stories. Our analyses reveal three main findings. First, the brain incrementally represents and simultaneously maintains successive features. Second, the duration of these representations depends on their level in the language hierarchy. Third, each representation is maintained by a dynamic neural code, which evolves at a speed commensurate with its corresponding linguistic level. This HDC preserves the maintenance of information over time while limiting the interference between successive features. Overall, HDC reveals how the human brain continuously builds and maintains a language hierarchy during natural speech comprehension, thereby anchoring linguistic theories to their biological implementations.
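Time-resolved decoding, as named in the abstract, fits and tests a classifier separately at each time sample to ask when a feature can be read out of the sensor signals. The sketch below assumes a leave-one-out nearest-centroid classifier and simulated data in which a binary feature is present only at samples 3 to 6; the study's MEG preprocessing, classifiers, and cross-validation scheme are not reproduced, and all names are hypothetical.

```python
import random


def centroid(vectors):
    """Mean vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode_at_time(data, labels, t):
    """Leave-one-out nearest-centroid accuracy using only sensor values at time t."""
    classes = sorted(set(labels))
    correct = 0
    for i in range(len(data)):
        # Class centroids estimated from every trial except the held-out one.
        cents = {}
        for c in classes:
            train = [data[j][t] for j in range(len(data)) if j != i and labels[j] == c]
            cents[c] = centroid(train)
        pred = min(classes, key=lambda c: sq_dist(data[i][t], cents[c]))
        correct += pred == labels[i]
    return correct / len(data)


def time_resolved_decoding(data, labels):
    """Decoding accuracy at every time sample; data is indexed [trial][time][sensor]."""
    return [decode_at_time(data, labels, t) for t in range(len(data[0]))]


# Simulated recording: a binary feature is encoded on sensor 0 only at samples 3-6.
rng = random.Random(0)
n_trials, n_times, n_sensors = 40, 10, 3
labels = [i % 2 for i in range(n_trials)]
data = []
for y in labels:
    trial = []
    for t in range(n_times):
        sample = [rng.gauss(0.0, 0.5) for _ in range(n_sensors)]
        if 3 <= t <= 6:
            sample[0] += 2.0 if y == 1 else -2.0
        trial.append(sample)
    data.append(trial)

accuracy = time_resolved_decoding(data, labels)
print([round(a, 2) for a in accuracy])
```

Training at one time sample and testing at others (temporal generalization) is a common extension for asking whether the underlying code changes over time, which is the property the HDC hypothesis concerns.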
Affiliation(s)
- Laura Gwilliams
- Department of Psychology, Stanford University
- Department of Psychology, New York University
- Alec Marantz
- Department of Psychology, New York University
- Department of Linguistics, New York University
- David Poeppel
- Department of Psychology, New York University
- Ernst Strüngmann Institute
28
Cai J, Hadjinicolaou AE, Paulk AC, Soper DJ, Xia T, Williams ZM, Cash SS. Natural language processing models reveal neural dynamics of human conversation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.03.10.531095. [PMID: 36945468 PMCID: PMC10028965 DOI: 10.1101/2023.03.10.531095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2023]
Abstract
Through conversation, humans relay complex information through the alternation of speech production and comprehension. The neural mechanisms that underlie these complementary processes or through which information is precisely conveyed by language, however, remain poorly understood. Here, we used pretrained deep learning natural language processing models in combination with intracranial neuronal recordings to discover neural signals that reliably reflect speech production, comprehension, and their transitions during natural conversation between individuals. Our findings indicate that neural activities that encoded linguistic information were broadly distributed throughout frontotemporal areas across multiple frequency bands. We also find that these activities were specific to the words and sentences being conveyed and that they were dependent on the word's specific context and order. Finally, we demonstrate that these neural patterns partially overlapped during language production and comprehension and that listener-speaker transitions were associated with specific, time-aligned changes in neural activity. Collectively, our findings reveal a dynamical organization of neural activities that subserve language production and comprehension during natural conversation and harness the use of deep learning models in understanding the neural mechanisms underlying human language.
Affiliation(s)
- Jing Cai
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Alex E. Hadjinicolaou
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Angelique C. Paulk
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Daniel J. Soper
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Tian Xia
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Ziv M. Williams
- Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA
- Harvard Medical School, Program in Neuroscience, Boston, MA
- These authors contributed equally
- Sydney S. Cash
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Harvard-MIT Division of Health Sciences and Technology, Boston, MA
- These authors contributed equally
29
Lyu B, Marslen-Wilson WD, Fang Y, Tyler LK. Finding structure during incremental speech comprehension. eLife 2024; 12:RP89311. [PMID: 38577982 PMCID: PMC10997333 DOI: 10.7554/elife.89311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extract word-by-word measures of sentential structure from BERT, a deep language model, which effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography when participants were listening to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, which engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
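Representational similarity analysis, as used in this study, compares the pairwise dissimilarity structure of a model measure with that of brain activity. The generic sketch below assumes absolute difference as the dissimilarity between scalar model values (e.g., one parse depth per word), Euclidean distance between brain patterns, and Spearman correlation between the upper triangles of the two representational dissimilarity matrices (RDMs). The data are toy values, not the study's spatiotemporal EMEG searchlight analysis, and all names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rdm(items, dissimilarity):
    """Upper triangle (i < j) of a representational dissimilarity matrix."""
    return [dissimilarity(items[i], items[j]) for i, j in combinations(range(len(items)), 2)]


# Hypothetical model measure: one parse-depth value per word.
parse_depths = [1, 2, 3, 4, 5]
model_rdm = rdm(parse_depths, lambda a, b: abs(a - b))

# Hypothetical brain patterns (one vector per word), constructed here to scale with depth.
brain_patterns = [[d, 0.5 * d] for d in parse_depths]
brain_rdm = rdm(brain_patterns, math.dist)

print(spearman(model_rdm, brain_rdm))
```

Rank correlation is a common choice for comparing RDMs because it assumes only a monotonic, not linear, relation between model and brain dissimilarities.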
Affiliation(s)
- William D Marslen-Wilson
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Yuxing Fang
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Lorraine K Tyler
- Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
30
Hosseini EA, Schrimpf M, Zhang Y, Bowman S, Zaslavsky N, Fedorenko E. Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:43-63. [PMID: 38645622 PMCID: PMC11025646 DOI: 10.1162/nol_a_00137] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 01/09/2024] [Indexed: 04/23/2024]
Abstract
Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models' ability to capture human fMRI responses to sentences is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion words against an fMRI benchmark. We consider the 100-million-word model to be developmentally plausible in terms of the amount of training data, given that this amount is similar to what children are estimated to be exposed to during the first 10 years of life. Second, we evaluate a GPT-2 model, trained on a 9-billion-token dataset to state-of-the-art next-word prediction performance, against the human benchmark at different stages during training. Across both approaches, we find that (i) the models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing fMRI responses to sentences. Further, (ii) lower perplexity (a measure of next-word prediction performance) is associated with stronger alignment with human data, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire representations of sentences that are predictive of human fMRI responses. In tandem, these findings establish that although some training is necessary for the models' predictive ability, a developmentally realistic amount of training (∼100 million words) may suffice.
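Perplexity, which the abstract relates to brain alignment, is the exponentiated mean negative log-probability that a model assigns to each observed token, PPL = exp(-(1/N) * sum_i log p(w_i | w_<i)); lower values mean better next-word prediction. A minimal sketch, assuming the per-token probabilities have already been obtained from some language model (the values here are made up, and the function name is hypothetical):

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-probability of the observed next tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# Probabilities two hypothetical models assigned to the same sequence of actual next tokens.
well_trained = [0.5, 0.4, 0.6, 0.5]
under_trained = [0.1, 0.05, 0.2, 0.1]
print(perplexity(well_trained), perplexity(under_trained))
```

A model that assigned probability 1/k to every observed token would have perplexity exactly k, which is why perplexity is often read as an effective branching factor.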
Affiliation(s)
- Eghbal A. Hosseini
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf
- The MIT Quest for Intelligence Initiative, Cambridge, MA, USA
- Swiss Federal Institute of Technology, Lausanne, Switzerland
- Yian Zhang
- Computer Science Department, Stanford University, Stanford, CA, USA
- Samuel Bowman
- Center for Data Science, New York University, New York, NY, USA
- Department of Linguistics, New York University, New York, NY, USA
- Department of Computer Science, New York University, New York, NY, USA
- Noga Zaslavsky
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Language Science, University of California, Irvine, CA, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- The MIT Quest for Intelligence Initiative, Cambridge, MA, USA
- Speech and Hearing Bioscience and Technology Program, Harvard University, Boston, MA, USA
31
Lopopolo A, Fedorenko E, Levy R, Rabovsky M. Cognitive Computational Neuroscience of Language: Using Computational Models to Investigate Language Processing in the Brain. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:1-6. [PMID: 38645621 PMCID: PMC11025655 DOI: 10.1162/nol_e_00131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Affiliation(s)
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Roger Levy
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Milena Rabovsky
- Department of Psychology, University of Potsdam, Potsdam, Germany
32
Kauf C, Tuckute G, Levy R, Andreas J, Fedorenko E. Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:7-42. [PMID: 38645614 PMCID: PMC11025651 DOI: 10.1162/nol_a_00116] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Accepted: 07/11/2023] [Indexed: 04/23/2024]
Abstract
Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences' word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence's syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, results are robust to whether the mapping model is trained on intact or perturbed stimuli and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
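Two of the stimulus manipulations listed in the abstract, perturbing word order and removing subsets of words, can be sketched as simple string operations applied before sentences are passed to a language model. The function-word list below is a hypothetical, deliberately tiny stand-in, and the functions are illustrations rather than the authors' code.

```python
import random

# Hypothetical, deliberately tiny function-word list; a real study would use a fuller one.
FUNCTION_WORDS = {"the", "a", "an", "on", "in", "of", "to", "and", "is", "was"}


def shuffle_word_order(sentence, seed=0):
    """Return the sentence with its words randomly permuted (same words, new order)."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def keep_content_words(sentence, function_words=FUNCTION_WORDS):
    """Drop function words, keeping (approximately) the lexical-semantic content words."""
    return " ".join(w for w in sentence.split() if w.lower() not in function_words)


def keep_function_words(sentence, function_words=FUNCTION_WORDS):
    """Keep only function words, removing most lexical-semantic content."""
    return " ".join(w for w in sentence.split() if w.lower() in function_words)


s = "The cat sat on the mat"
print(shuffle_word_order(s))
print(keep_content_words(s))   # cat sat mat
print(keep_function_words(s))  # The on the
```

Seeding the shuffle makes each perturbed stimulus reproducible, so the same perturbed sentence can be fed to the model across analyses.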
Affiliation(s)
- Carina Kauf
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Roger Levy
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jacob Andreas
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
33
Sugimoto Y, Yoshida R, Jeong H, Koizumi M, Brennan JR, Oseki Y. Localizing Syntactic Composition with Left-Corner Recurrent Neural Network Grammars. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:201-224. [PMID: 38645619 PMCID: PMC11025653 DOI: 10.1162/nol_a_00118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 07/24/2023] [Indexed: 04/23/2024]
Abstract
In computational neurolinguistics, it has been demonstrated that hierarchical models such as recurrent neural network grammars (RNNGs), which jointly generate word sequences and their syntactic structures via syntactic composition, explain human brain activity better than sequential models such as long short-term memory networks (LSTMs). However, the vanilla RNNG employs a top-down parsing strategy, which the psycholinguistics literature has identified as suboptimal, especially for head-final/left-branching languages; the left-corner parsing strategy has instead been proposed as the psychologically plausible alternative. In this article, building on this line of inquiry, we investigate not only whether hierarchical models like RNNGs explain human brain activity better than sequential models like LSTMs, but also which parsing strategy is more neurobiologically plausible, by developing a novel fMRI corpus in which participants read newspaper articles in a head-final/left-branching language, namely Japanese, in a naturalistic fMRI experiment. The results revealed that left-corner RNNGs outperformed both LSTMs and top-down RNNGs in the left inferior frontal and temporal-parietal regions, suggesting that there are certain brain regions that localize the syntactic composition with the left-corner parsing strategy.
Affiliation(s)
- Yushi Sugimoto
- Graduate School of Arts and Sciences, University of Tokyo, Tokyo, Japan
- Ryo Yoshida
- Graduate School of Arts and Sciences, University of Tokyo, Tokyo, Japan
- Hyeonjeong Jeong
- Graduate School of International Cultural Studies, Tohoku University, Sendai, Japan
- Masatoshi Koizumi
- Department of Linguistics, Graduate School of Arts and Letters, Tohoku University, Sendai, Japan
- Yohei Oseki
- Graduate School of Arts and Sciences, University of Tokyo, Tokyo, Japan
34
Fitz H, Hagoort P, Petersson KM. Neurobiological Causal Models of Language Processing. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:225-247. [PMID: 38645618 PMCID: PMC11025648 DOI: 10.1162/nol_a_00133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/18/2023] [Indexed: 04/23/2024]
Abstract
The language faculty is physically realized in the neurobiological infrastructure of the human brain. Despite significant efforts, an integrated understanding of this system remains a formidable challenge. What is missing from most theoretical accounts is a specification of the neural mechanisms that implement language function. Computational models that have been put forward generally lack an explicit neurobiological foundation. We propose a neurobiologically informed causal modeling approach which offers a framework for how to bridge this gap. A neurobiological causal model is a mechanistic description of language processing that is grounded in, and constrained by, the characteristics of the neurobiological substrate. It intends to model the generators of language behavior at the level of implementational causality. We describe key features and neurobiological component parts from which causal models can be built and provide guidelines on how to implement them in model simulations. Then we outline how this approach can shed new light on the core computational machinery for language, the long-term storage of words in the mental lexicon and combinatorial processing in sentence comprehension. In contrast to cognitive theories of behavior, causal models are formulated in the "machine language" of neurobiology which is universal to human cognition. We argue that neurobiological causal modeling should be pursued in addition to existing approaches. Eventually, this approach will allow us to develop an explicit computational neurobiology of language.
Affiliation(s)
- Hartmut Fitz
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Peter Hagoort
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Karl Magnus Petersson
- Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Faculty of Medicine and Biomedical Sciences, University of Algarve, Faro, Portugal
35
Antonello R, Huth A. Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:64-79. [PMID: 38645616 PMCID: PMC11025645 DOI: 10.1162/nol_a_00087] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/28/2022] [Accepted: 10/26/2022] [Indexed: 04/23/2024]
Abstract
Many recent studies have shown that representations drawn from neural network language models are extremely effective at predicting brain responses to natural language. But why do these models work so well? One proposed explanation is that language models and brains are similar because they have the same objective: to predict upcoming words before they are perceived. This explanation is attractive because it lends support to the popular theory of predictive coding. We provide several analyses that cast doubt on this claim. First, we show that the ability to predict future words does not uniquely (or even best) explain why some representations are a better match to the brain than others. Second, we show that within a language model, representations that are best at predicting future words are strictly worse brain models than other representations. Finally, we argue in favor of an alternative explanation for the success of language models in neuroscience: These models are effective at predicting brain responses because they generally capture a wide variety of linguistic phenomena.
Affiliation(s)
- Richard Antonello
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Alexander Huth
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
36
Jain S, Vo VA, Wehbe L, Huth AG. Computational Language Modeling and the Promise of In Silico Experimentation. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:80-106. [PMID: 38645624 PMCID: PMC11025654 DOI: 10.1162/nol_a_00101] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/28/2022] [Accepted: 01/18/2023] [Indexed: 04/23/2024]
Abstract
Language neuroscience currently relies on two major experimental paradigms: controlled experiments using carefully hand-designed stimuli, and natural stimulus experiments. These approaches have complementary advantages which allow them to address distinct aspects of the neurobiology of language, but each approach also comes with drawbacks. Here we discuss a third paradigm (in silico experimentation using deep learning-based encoding models) that has been enabled by recent advances in cognitive computational neuroscience. This paradigm promises to combine the interpretability of controlled experiments with the generalizability and broad scope of natural stimulus experiments. We show four examples of simulating language neuroscience experiments in silico and then discuss both the advantages and caveats of this approach.
Affiliation(s)
- Shailee Jain
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Vy A. Vo
- Brain-Inspired Computing Lab, Intel Labs, Hillsboro, OR, USA
- Leila Wehbe
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alexander G. Huth
- Department of Computer Science, University of Texas at Austin, Austin, TX, USA
- Department of Neuroscience, University of Texas at Austin, Austin, TX, USA
37
Uchida T, Lair N, Ishiguro H, Dominey PF. Dissociable Neural Mechanisms for Human Inference Processing Predicted by Static and Contextual Language Models. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2024; 5:248-263. [PMID: 38645620 PMCID: PMC11025649 DOI: 10.1162/nol_a_00090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 11/07/2022] [Indexed: 04/23/2024]
Abstract
Language models (LMs) continue to reveal non-trivial relations to human language performance and the underlying neurophysiology. Recent research has characterized how word embeddings from an LM can be used to generate integrated discourse representations in order to perform inference on events. The current research investigates how such event knowledge may be coded in distinct manners in different classes of LMs and how this maps onto different forms of human inference processing. To do so, we investigate inference on events using two well-documented human experimental protocols from Metusalem et al. (2012) and McKoon and Ratcliff (1986), compared with two protocols for simpler semantic processing. Interestingly, this reveals a dissociation in the relation between local semantics versus event-inference depending on the LM. In a series of experiments, we observed that for the static LMs (word2vec/GloVe), there was a clear dissociation in the relation between semantics and inference for the two inference tasks. In contrast, for the contextual LMs (BERT/RoBERTa), we observed a correlation between semantic and inference processing for both inference tasks. The experimental results suggest that inference as measured by Metusalem and McKoon relies on dissociable processes. While the static models are able to perform Metusalem inference, only the contextual models succeed in McKoon inference. Interestingly, these dissociable processes may be linked to well-characterized automatic versus strategic inference processes in the psychological literature. This allows us to make predictions about dissociable neurophysiological markers that should be found during human inference processing with these tasks.
Affiliation(s)
- Takahisa Uchida
- Ishiguro Lab, Graduate School of Engineering Science, Osaka University, Osaka, Japan
- Nicolas Lair
- INSERM UMR1093-CAPS, Université Bourgogne Franche-Comté, UFR des Sciences du Sport, Dijon, France
- Robot Cognition Laboratory, Marey Institute, Dijon, France
- Hiroshi Ishiguro
- Ishiguro Lab, Graduate School of Engineering Science, Osaka University, Osaka, Japan
- Peter Ford Dominey
- INSERM UMR1093-CAPS, Université Bourgogne Franche-Comté, UFR des Sciences du Sport, Dijon, France
- Robot Cognition Laboratory, Marey Institute, Dijon, France

38
Goldstein A, Grinstein-Dabush A, Schain M, Wang H, Hong Z, Aubrey B, Nastase SA, Zada Z, Ham E, Feder A, Gazula H, Buchnik E, Doyle W, Devore S, Dugan P, Reichart R, Friedman D, Brenner M, Hassidim A, Devinsky O, Flinker A, Hasson U. Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns. Nat Commun 2024; 15:2768. [PMID: 38553456 PMCID: PMC10980748 DOI: 10.1038/s41467-024-46631-y] [Received: 07/24/2022] [Accepted: 03/04/2024] [Indexed: 04/02/2024]
Abstract
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we densely record the neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. Using stringent zero-shot mapping we demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns. The common geometric patterns allow us to predict the brain embedding in IFG of a given left-out word based solely on its geometrical relationship to other non-overlapping words in the podcast. Furthermore, we show that contextual embeddings capture the geometry of IFG embeddings better than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain.
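The zero-shot test described in this abstract can be illustrated with a minimal sketch. For illustration only, it assumes a linear mapping from contextual embeddings to brain embeddings. The mapping is fitted with closed-form ridge regression on every word except one held-out word. The held-out word is then scored by how highly its true brain embedding ranks, by Pearson correlation with the prediction, among all words. The data are synthetic stand-ins, and `ridge_fit` and `zero_shot_rank` are hypothetical helpers, not the authors' pipeline.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping rows of X (n x d) to rows of Y (n x k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def zero_shot_rank(model_emb, brain_emb, alpha=1.0):
    """Leave one word out at a time, predict its brain embedding from the
    remaining words, and return the rank of the true embedding (0 = best match)."""
    n = model_emb.shape[0]
    ranks = np.empty(n, dtype=int)
    for i in range(n):
        train = np.arange(n) != i
        W = ridge_fit(model_emb[train], brain_emb[train], alpha)
        pred = model_emb[i] @ W
        # Pearson correlation between the prediction and every word's brain embedding
        sims = np.array([np.corrcoef(pred, b)[0, 1] for b in brain_emb])
        ranks[i] = int(np.sum(sims > sims[i]))  # how many words beat the true one
    return ranks

rng = np.random.default_rng(0)
n_words, d_model, d_brain = 60, 16, 12
X = rng.standard_normal((n_words, d_model))      # stand-in contextual embeddings
A = rng.standard_normal((d_model, d_brain))      # hypothetical linear relation
Y = X @ A + 0.1 * rng.standard_normal((n_words, d_brain))  # stand-in brain embeddings
ranks = zero_shot_rank(X, Y)
print("mean rank:", ranks.mean(), "top-1 accuracy:", np.mean(ranks == 0))
```

Under these assumptions, chance performance would give a mean rank near (n_words - 1) / 2. Ranks near zero indicate that the held-out word is placed correctly purely from its geometric relation to the other words.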
Affiliation(s)
- Ariel Goldstein
- Business School, Data Science department and Cognitive Department, Hebrew University, Jerusalem, Israel.
- Google Research, Tel Aviv, Israel.
- Haocheng Wang
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Zhuoqiao Hong
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Bobbi Aubrey
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- New York University Grossman School of Medicine, New York, NY, USA
- Samuel A Nastase
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Zaid Zada
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Eric Ham
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Harshvardhan Gazula
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Werner Doyle
- New York University Grossman School of Medicine, New York, NY, USA
- Sasha Devore
- New York University Grossman School of Medicine, New York, NY, USA
- Patricia Dugan
- New York University Grossman School of Medicine, New York, NY, USA
- Roi Reichart
- Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa, Israel
- Daniel Friedman
- New York University Grossman School of Medicine, New York, NY, USA
- Michael Brenner
- Google Research, Tel Aviv, Israel
- School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA
- Orrin Devinsky
- New York University Grossman School of Medicine, New York, NY, USA
- Adeen Flinker
- New York University Grossman School of Medicine, New York, NY, USA
- New York University Tandon School of Engineering, Brooklyn, NY, USA
- Uri Hasson
- Google Research, Tel Aviv, Israel
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA

39
Chen C, Dupré la Tour T, Gallant JL, Klein D, Deniz F. The cortical representation of language timescales is shared between reading and listening. Commun Biol 2024; 7:284. [PMID: 38454134 PMCID: PMC11245628 DOI: 10.1038/s42003-024-05909-z] [Received: 01/25/2023] [Accepted: 02/12/2024] [Indexed: 03/09/2024]
Abstract
Language comprehension involves integrating low-level sensory inputs into a hierarchy of increasingly high-level features. Prior work studied brain representations of different levels of the language hierarchy, but has not determined whether these brain representations are shared between written and spoken language. To address this issue, we analyze fMRI BOLD data that were recorded while participants read and listened to the same narratives in each modality. Levels of the language hierarchy are operationalized as timescales, where each timescale refers to a set of spectral components of a language stimulus. Voxelwise encoding models are used to determine where different timescales are represented across the cerebral cortex, for each modality separately. These models reveal that between the two modalities timescale representations are organized similarly across the cortical surface. Our results suggest that, after low-level sensory processing, language integration proceeds similarly regardless of stimulus modality.
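One way to picture "timescales as sets of spectral components" is to split a per-word feature sequence into frequency bands. The sketch below assumes hard FFT masks and hypothetical band edges given in cycles per word. The paper's actual filtering and feature set may differ, and `split_into_timescales` is a hypothetical helper.

```python
import numpy as np

def split_into_timescales(feature, edges):
    """Split a 1-D per-word feature sequence into frequency bands.

    `edges` are band boundaries in cycles per word. Each band keeps only the
    FFT components whose frequency lies in [low, high); because the bands
    partition the spectrum, they sum back to the original sequence."""
    n = len(feature)
    spectrum = np.fft.rfft(feature)
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per word, from 0 to 0.5
    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (freqs >= low) & (freqs < high)
        bands.append(np.fft.irfft(spectrum * mask, n=n))
    return bands

# Hypothetical band edges: timescales of >64, 16-64, 4-16 and <4 words
edges = [0.0, 1 / 64, 1 / 16, 1 / 4, np.inf]
rng = np.random.default_rng(1)
x = rng.standard_normal(256)  # stand-in for one feature dimension, one value per word
bands = split_into_timescales(x, edges)
print([round(float(b.std()), 3) for b in bands])
```

Because the masks partition the spectrum, each band can be fed to an encoding model as a separate timescale-specific feature, and the bands together lose no information.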
Affiliation(s)
- Catherine Chen
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
- Tom Dupré la Tour
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Jack L Gallant
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Daniel Klein
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Fatma Deniz
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA.
- Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin, Germany.
- Bernstein Center for Computational Neuroscience, Berlin, Germany.

40
Bzdok D, Thieme A, Levkovskyy O, Wren P, Ray T, Reddy S. Data science opportunities of large language models for neuroscience and biomedicine. Neuron 2024; 112:698-717. [PMID: 38340718 DOI: 10.1016/j.neuron.2024.01.016] [Received: 11/16/2023] [Revised: 01/03/2024] [Accepted: 01/17/2024] [Indexed: 02/12/2024]
Abstract
Large language models (LLMs) are a new asset class in the machine-learning landscape. Here we offer a primer on defining properties of these modeling techniques. We then reflect on new modes of investigation in which LLMs can be used to reframe classic neuroscience questions to deliver fresh answers. We reason that LLMs have the potential to (1) enrich neuroscience datasets by adding valuable meta-information, such as advanced text sentiment, (2) summarize vast information sources to overcome divides between siloed neuroscience communities, (3) enable previously unthinkable fusion of disparate information sources relevant to the brain, (4) help deconvolve which cognitive concepts most usefully grasp phenomena in the brain, and much more.
Affiliation(s)
- Danilo Bzdok
- Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; TheNeuro - Montreal Neurological Institute (MNI), Department of Biomedical Engineering, McGill University, Montreal, QC, Canada.
- Paul Wren
- Mindstate Design Labs, San Francisco, CA, USA
- Thomas Ray
- Mindstate Design Labs, San Francisco, CA, USA
- Siva Reddy
- Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; Facebook CIFAR AI Chair; ServiceNow Research

41
Shain C. Word Frequency and Predictability Dissociate in Naturalistic Reading. Open Mind (Camb) 2024; 8:177-201. [PMID: 38476662 PMCID: PMC10932590 DOI: 10.1162/opmi_a_00119] [Received: 07/06/2023] [Accepted: 01/10/2024] [Indexed: 03/14/2024]
Abstract
Many studies of human language processing have shown that readers slow down at less frequent or less predictable words, but there is debate about whether frequency and predictability effects reflect separable cognitive phenomena: are cognitive operations that retrieve words from the mental lexicon based on sensory cues distinct from those that predict upcoming words based on context? Previous evidence for a frequency-predictability dissociation is mostly based on small samples (both for estimating predictability and frequency and for testing their effects on human behavior), artificial materials (e.g., isolated constructed sentences), and implausible modeling assumptions (discrete-time dynamics, linearity, additivity, constant variance, and invariance over time), which raises the question: do frequency and predictability dissociate in ordinary language comprehension, such as story reading? This study leverages recent progress in open data and computational modeling to address this question at scale. A large collection of naturalistic reading data (six datasets, >2.2 M datapoints) is analyzed using nonlinear continuous-time regression, and frequency and predictability are estimated using statistical language models trained on more data than is currently typical in psycholinguistics. Despite the use of naturalistic data, strong predictability estimates, and flexible regression models, results converge with earlier experimental studies in supporting dissociable and additive frequency and predictability effects.
Affiliation(s)
- Cory Shain
- Department of Brain & Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA

42
Shain C, Meister C, Pimentel T, Cotterell R, Levy R. Large-scale evidence for logarithmic effects of word predictability on reading time. Proc Natl Acad Sci U S A 2024; 121:e2307876121. [PMID: 38422017 PMCID: PMC10927576 DOI: 10.1073/pnas.2307876121] [Received: 05/11/2023] [Accepted: 11/11/2023] [Indexed: 03/02/2024]
Abstract
During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words' contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.
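The contrast between a linear and a logarithmic effect of predictability can be made concrete with a small simulation. This is a sketch under stated assumptions, not the paper's analysis. The reading times are simulated, and by construction they carry a logarithmic (surprisal) effect. Ordinary least squares stands in for the nonlinear continuous-time regression used in the study. So the example shows only how the two hypotheses map onto different predictors, and it is not evidence for either.

```python
import numpy as np

def surprisal_bits(p):
    """Surprisal in bits: -log2 p(word | context)."""
    return -np.log2(p)

def fit_r2(x, y):
    """R^2 of an ordinary least-squares fit y ~ a + b * x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
p = rng.uniform(0.001, 1.0, size=5000)  # hypothetical next-word probabilities
# Simulated reading times (ms) generated, by assumption, with a logarithmic effect
rt = 200 + 15 * surprisal_bits(p) + rng.normal(0, 10, size=p.size)
r2_linear = fit_r2(p, rt)                # linear view: cost proportional to p
r2_log = fit_r2(surprisal_bits(p), rt)   # logarithmic view: cost proportional to -log p
print(f"R^2 linear in p: {r2_linear:.3f}, R^2 surprisal: {r2_log:.3f}")
```

Halving a word's probability adds exactly one bit of surprisal. The logarithmic view therefore predicts a constant slowdown per halving, whereas the linear view predicts a cost proportional to the change in probability itself.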
Affiliation(s)
- Cory Shain
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Clara Meister
- Department of Computer Science, Institute for Machine Learning, ETH Zürich, Zürich 8092, Switzerland
- Tiago Pimentel
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, United Kingdom
- Ryan Cotterell
- Department of Computer Science, Institute for Machine Learning, ETH Zürich, Zürich 8092, Switzerland
- Roger Levy
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139

43
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. Nat Hum Behav 2024; 8:544-561. [PMID: 38172630 DOI: 10.1038/s41562-023-01783-7] [Received: 05/06/2023] [Accepted: 11/10/2023] [Indexed: 01/05/2024]
Abstract
Transformer models such as GPT generate human-like language and are predictive of human brain responses to language. Here, using functional-MRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. We then use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also non-invasively control neural activity in higher-level cortical areas, such as the language network.
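The model-guided selection of "drive" and "suppress" sentences can be sketched as follows. This is a minimal illustration under stated assumptions. A linear ridge encoding model maps sentence embeddings to response magnitude. Random vectors stand in for the GPT sentence embeddings and the measured responses. The helpers `fit_ridge`, `predict` and `select_drive_suppress` are hypothetical and are not the authors' code.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w

def select_drive_suppress(w, candidates, k):
    """Indices of the k candidates with the highest and the lowest predicted response."""
    order = np.argsort(predict(w, candidates))
    return order[::-1][:k], order[:k]

rng = np.random.default_rng(0)
d = 32
true_w = rng.standard_normal(d)                 # hypothetical ground-truth mapping
train_emb = rng.standard_normal((1000, d))      # stand-in for 1,000 sentence embeddings
train_resp = train_emb @ true_w + rng.normal(0, 1.0, 1000)  # stand-in response magnitudes
w = fit_ridge(train_emb, train_resp)
pool = rng.standard_normal((5000, d))           # stand-in for a pool of candidate sentences
drive, suppress = select_drive_suppress(w, pool, k=10)
print("mean true response, drive:", (pool[drive] @ true_w).mean(),
      "suppress:", (pool[suppress] @ true_w).mean())
```

The key step is that selection happens in model space on sentences never used for fitting. The chosen sentences are only confirmed as drivers or suppressors when they are presented to new participants.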
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- MIT-IBM Watson AI Lab, Cambridge, MA, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA, USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.

44
Fairhall SL. Sentence-level embeddings reveal dissociable word- and sentence-level cortical representation across coarse- and fine-grained levels of meaning. BRAIN AND LANGUAGE 2024; 250:105389. [PMID: 38306958 DOI: 10.1016/j.bandl.2024.105389] [Received: 05/31/2023] [Revised: 01/09/2024] [Accepted: 01/26/2024] [Indexed: 02/04/2024]
Abstract
In this large-sample (N = 64) fMRI study, sentence embeddings (text-embedding-ada-002, OpenAI) and representational similarity analysis were used to contrast sentence-level and word-level semantic representation. Overall, sentence-level information resulted in a 20-25% increase in the model's ability to capture neural representation when compared to word-level only information (word-order scrambled embeddings). This increase was relatively undifferentiated across the cortex. However, when coarse-grained (across thematic category) and fine-grained (within thematic category) combinatorial meaning were separately assessed, word- and sentence-level representations were seen to strongly dissociate across the cortex and to do so differently as a function of grain. Coarse-grained sentence-level representations were evident in occipitotemporal, ventral temporal and medial prefrontal cortex, while fine-grained differences were seen in lateral prefrontal and parietal cortex, middle temporal gyrus, the precuneus, and medial prefrontal cortex. This result indicates that dissociable cortical substrates underlie single-concept versus combinatorial meaning and that different cortical regions specialise for fine- and coarse-grained meaning.
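The representational similarity analysis (RSA) used in this study can be sketched in a few lines. A minimal version builds a representational dissimilarity matrix (RDM, here 1 - Pearson r between condition patterns) for the model and for the brain. It then correlates the two upper triangles with a Spearman rank correlation. The sentence embeddings and voxel patterns below are random stand-ins, and the helper names are hypothetical. The study's own dissimilarity measure, cross-validation and grain-specific contrasts are not reproduced.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows (conditions)."""
    return 1.0 - np.corrcoef(patterns)

def upper(m):
    """Upper triangle of a square matrix, excluding the diagonal."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def spearman(a, b):
    """Spearman rank correlation (assumes no ties, as with continuous data)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def rsa_score(model_patterns, brain_patterns):
    return spearman(upper(rdm(model_patterns)), upper(rdm(brain_patterns)))

rng = np.random.default_rng(0)
n_sentences = 30
sentence_emb = rng.standard_normal((n_sentences, 50))      # stand-in sentence embeddings
voxels = sentence_emb @ rng.standard_normal((50, 200))     # hypothetical region driven by them
voxels += 2.0 * rng.standard_normal(voxels.shape)          # measurement noise
unrelated = rng.standard_normal((n_sentences, 50))         # control embeddings
print(rsa_score(sentence_emb, voxels), rsa_score(unrelated, voxels))
```

A word-level baseline in the spirit of the abstract would simply swap `sentence_emb` for embeddings of word-order-scrambled sentences and compare the two scores.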
Affiliation(s)
- Scott L Fairhall
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Italy.

45
Li J, Armstrong BC. Probing the Representational Structure of Regular Polysemy via Sense Analogy Questions: Insights from Contextual Word Vectors. Cogn Sci 2024; 48:e13416. [PMID: 38482721 DOI: 10.1111/cogs.13416] [Received: 09/14/2023] [Revised: 01/02/2024] [Accepted: 02/05/2024] [Indexed: 05/16/2024]
Abstract
Regular polysemes are sets of ambiguous words that all share the same relationship between their meanings, such as CHICKEN and LOBSTER both referring to an animal or its meat. To probe how a distributional semantic model, here exemplified by bidirectional encoder representations from transformers (BERT), represents regular polysemy, we analyzed whether its embeddings support answering sense analogy questions similar to "is the mapping between CHICKEN (as an animal) and CHICKEN (as a meat) similar to that which maps between LOBSTER (as an animal) to LOBSTER (as a meat)?" We did so using the LRcos model, which combines a logistic regression classifier of different categories (e.g., animal vs. meat) with a measure of cosine similarity. We found that (a) the model was sensitive to the shared structure within a given regular relationship; (b) the shared structure varies across different regular relationships (e.g., animal/meat vs. location/organization), potentially reflective of a "regularity continuum;" (c) some high-order latent structure is shared across different regular relationships, suggestive of a similar latent structure across different types of relationships; and (d) there is a lack of evidence for the aforementioned effects being explained by meaning overlap. Lastly, we found that both components of the LRcos model made important contributions to accurate responding and that a variation of this method could yield an accuracy boost of 10% in answering sense analogy questions. These findings enrich previous theoretical work on regular polysemy with a computationally explicit theory and methods, and provide evidence for an important organizational principle for the mental lexicon and the broader conceptual knowledge system.
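The LRcos scoring rule can be sketched as follows. The sketch assumes the common formulation in which each candidate answer gets a product score. The first factor is the classifier's probability that the candidate belongs to the target category (here "meat"). The second factor is the candidate's cosine similarity to the source word (here LOBSTER as an animal). Synthetic vectors replace the contextual BERT vectors. Each is built from a hypothetical word-identity component plus a sense direction. The helper names are hypothetical, and this is neither the authors' implementation nor their improved variant.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, steps=2000):
    """Batch gradient-descent logistic regression; returns weights and bias."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

def lrcos_scores(source, candidates, w, b):
    """LRcos-style score: P(candidate in target category) * cos(candidate, source)."""
    cos = candidates @ source / (np.linalg.norm(candidates, axis=1) * np.linalg.norm(source))
    return sigmoid(candidates @ w + b) * cos

rng = np.random.default_rng(0)
dim, n_words = 300, 12
word = rng.standard_normal((n_words, dim))   # hypothetical word-identity components
animal_dir = rng.standard_normal(dim)        # hypothetical "animal" sense direction
meat_dir = rng.standard_normal(dim)          # hypothetical "meat" sense direction
animal = word + animal_dir + 0.1 * rng.standard_normal((n_words, dim))  # e.g. CHICKEN (as an animal)
meat = word + meat_dir + 0.1 * rng.standard_normal((n_words, dim))      # e.g. CHICKEN (as a meat)

# Word 0 plays LOBSTER and is held out: the classifier never sees either of its senses
X = np.vstack([animal[1:], meat[1:]])
y = np.concatenate([np.zeros(n_words - 1), np.ones(n_words - 1)])
w, b = train_logreg(X, y)

candidates = np.vstack([animal[1:], meat])   # every vector except the source animal[0]
best = int(np.argmax(lrcos_scores(animal[0], candidates, w, b)))
print("picked LOBSTER (as a meat):", best == n_words - 1)  # meat[0] sits at row n_words - 1
```

The two factors do different jobs. The classifier probability suppresses candidates of the wrong sense, even though they are close in cosine (the other animal-sense vectors). The cosine term then picks, among same-sense candidates, the one that shares the source word's identity.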
Affiliation(s)
- Blair C Armstrong
- Department of Psychology and Department of Computer Science, University of Toronto; BCBL, Basque Center on Cognition, Brain, and Language

46
Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.02.578603. [PMID: 38352332 PMCID: PMC10862830 DOI: 10.1101/2024.02.02.578603] [Indexed: 02/22/2024]
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of progressively higher order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
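A temporal response function (TRF) models the neural response as a stimulus feature convolved with an unknown kernel over a range of time lags. Ridge regression on a time-lagged copy of the stimulus is one common way to estimate it. The sketch below recovers a known kernel from synthetic data. The sampling rate, kernel shape and helper names are hypothetical, and the authors' estimation method and feature set may differ.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Design matrix whose column k is the stimulus delayed by k samples (zero-padded)."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:n - k]
    return X

def estimate_trf(stimulus, response, n_lags, alpha=1.0):
    """Ridge-regularized TRF: response[t] ~ sum_k trf[k] * stimulus[t - k]."""
    X = lagged_design(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags), X.T @ response)

rng = np.random.default_rng(0)
fs = 100                                          # hypothetical sampling rate (Hz)
lags = np.arange(30) / fs                         # lags from 0 to 290 ms
true_trf = np.exp(-((lags - 0.1) / 0.03) ** 2)    # hypothetical kernel peaking at 100 ms
stim = rng.standard_normal(6000)                  # stand-in speech feature (e.g. envelope)
resp = lagged_design(stim, 30) @ true_trf + 0.5 * rng.standard_normal(6000)
trf = estimate_trf(stim, resp, n_lags=30)
print("estimated peak lag (s):", lags[np.argmax(trf)])
```

In a multi-feature analysis like the one described, each feature (acoustic envelope, phoneme onsets, word surprisal and so on) contributes its own block of lagged columns. The latency of each estimated kernel's peaks is what separates early from late processing stages.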
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA

47
Ohmae K, Ohmae S. Emergence of syntax and word prediction in an artificial neural circuit of the cerebellum. Nat Commun 2024; 15:927. [PMID: 38296954 PMCID: PMC10831061 DOI: 10.1038/s41467-024-44801-6] [Received: 09/07/2022] [Accepted: 01/03/2024] [Indexed: 02/02/2024]
Abstract
The cerebellum, interconnected with the cerebral neocortex, plays a vital role in human-characteristic cognition such as language processing; however, knowledge about the underlying circuit computation of the cerebellum remains very limited. To gain a better understanding of the computation underlying cerebellar language processing, we developed a biologically constrained cerebellar artificial neural network (cANN) model, which implements the recently identified cerebello-cerebellar recurrent pathway. We found that while cANN acquires prediction of future words, another function of syntactic recognition emerges in the middle layer of the prediction circuit. The recurrent pathway of the cANN was essential for the two language functions, whereas cANN variants with further biological constraints preserved these functions. Considering the uniform structure of cerebellar circuitry across all functional domains, the single-circuit computation, which is the common basis of the two language functions, can be generalized to fundamental cerebellar functions of prediction and grammar-like rule extraction from sequences, that underpin a wide range of cerebellar motor and cognitive functions. This is a pioneering study to understand the circuit computation of human-characteristic cognition using biologically-constrained ANNs.
Affiliation(s)
- Keiko Ohmae
- Neuroscience Department, Baylor College of Medicine, Houston, TX, USA
- Chinese Institute for Brain Research (CIBR), Beijing, China
- Shogo Ohmae
- Neuroscience Department, Baylor College of Medicine, Houston, TX, USA.
- Chinese Institute for Brain Research (CIBR), Beijing, China.

48
Ellwood IT. Short-term Hebbian learning can implement transformer-like attention. PLoS Comput Biol 2024; 20:e1011843. [PMID: 38277432 PMCID: PMC10849393 DOI: 10.1371/journal.pcbi.1011843] [Received: 06/05/2023] [Revised: 02/07/2024] [Accepted: 01/19/2024] [Indexed: 01/28/2024]
Abstract
Transformers have revolutionized machine learning models of language and vision, but their connection with neuroscience remains tenuous. Built from attention layers, they require a mass comparison of queries and keys that is difficult to perform using traditional neural circuits. Here, we show that neurons can implement attention-like computations using short-term, Hebbian synaptic potentiation. We call our mechanism the match-and-control principle and it proposes that when activity in an axon is synchronous, or matched, with the somatic activity of a neuron that it synapses onto, the synapse can be briefly strongly potentiated, allowing the axon to take over, or control, the activity of the downstream neuron for a short time. In our scheme, the keys and queries are represented as spike trains and comparisons between the two are performed in individual spines allowing for hundreds of key comparisons per query and roughly as many keys and queries as there are neurons in the network.
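For reference, the "mass comparison of queries and keys" that this abstract contrasts with traditional neural circuits is the standard scaled dot-product attention of transformers. The sketch shows only that conventional computation. It does not implement the match-and-control mechanism proposed in the paper, and the toy keys and values are hypothetical.

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention: every query is compared with every key."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])      # (n_queries, n_keys) comparisons
    scores = scores - scores.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over keys
    return weights @ values, weights

keys = 5.0 * np.eye(4)                          # four well-separated keys
values = np.arange(8, dtype=float).reshape(4, 2)
queries = keys[[2]]                             # a query that matches key 2
out, weights = attention(queries, keys, values)
print(weights.round(4), out.round(3))
```

Every query is scored against every key, so the cost grows with the product of their counts. That all-to-all comparison is the step the paper proposes to distribute across individual dendritic spines.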
Affiliation(s)
- Ian T. Ellwood
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY, United States of America

49
Pasquiou A, Lakretz Y, Thirion B, Pallier C. Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax, and Context. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2023; 4:611-636. [PMID: 38144237 PMCID: PMC10745090 DOI: 10.1162/nol_a_00125] [Received: 05/23/2023] [Accepted: 09/28/2023] [Indexed: 12/26/2023]
Abstract
A fundamental question in neurolinguistics concerns the brain regions involved in syntactic and semantic processing during speech comprehension, both at the lexical (word processing) and supra-lexical levels (sentence and discourse processing). To what extent are these regions separated or intertwined? To address this question, we introduce a novel approach exploiting neural language models to generate high-dimensional feature sets that separately encode semantic and syntactic information. More precisely, we train a lexical language model, GloVe, and a supra-lexical language model, GPT-2, on a text corpus from which we selectively removed either syntactic or semantic information. We then assess to what extent the features derived from these information-restricted models are still able to predict the fMRI time courses of humans listening to naturalistic text. Furthermore, to determine the windows of integration of brain regions involved in supra-lexical processing, we manipulate the size of contextual information provided to GPT-2. The analyses show that, while most brain regions involved in language comprehension are sensitive to both syntactic and semantic features, the relative magnitudes of these effects vary across these regions. Moreover, regions that are best fitted by semantic or syntactic features are more spatially dissociated in the left hemisphere than in the right one, and the right hemisphere shows sensitivity to longer contexts than the left. The novelty of our approach lies in the ability to control for the information encoded in the models' embeddings by manipulating the training set. These "information-restricted" models complement previous studies that used language models to probe the neural bases of language, and shed new light on its spatial organization.
Affiliation(s)
- Alexandre Pasquiou
- Cognitive Neuroimaging Unit (UNICOG), NeuroSpin, National Institute of Health and Medical Research (Inserm) and French Alternative Energies and Atomic Energy Commission (CEA), Frédéric Joliot Life Sciences Institute, Paris-Saclay University, Gif-sur-Yvette, France
- Models and Inference for Neuroimaging Data (MIND), NeuroSpin, French Alternative Energies and Atomic Energy Commission (CEA), Inria Saclay, Frédéric Joliot Life Sciences Institute, Paris-Saclay University, Gif-sur-Yvette, France
- Yair Lakretz
- Cognitive Neuroimaging Unit (UNICOG), NeuroSpin, National Institute of Health and Medical Research (Inserm) and French Alternative Energies and Atomic Energy Commission (CEA), Frédéric Joliot Life Sciences Institute, Paris-Saclay University, Gif-sur-Yvette, France
- Bertrand Thirion
- Models and Inference for Neuroimaging Data (MIND), NeuroSpin, French Alternative Energies and Atomic Energy Commission (CEA), Inria Saclay, Frédéric Joliot Life Sciences Institute, Paris-Saclay University, Gif-sur-Yvette, France
- Christophe Pallier
- Cognitive Neuroimaging Unit (UNICOG), NeuroSpin, National Institute of Health and Medical Research (Inserm) and French Alternative Energies and Atomic Energy Commission (CEA), Frédéric Joliot Life Sciences Institute, Paris-Saclay University, Gif-sur-Yvette, France

50
Toosi T. Representational constraints underlying similarity between task-optimized neural systems. ARXIV 2023:arXiv:2312.08545v1. [PMID: 38168457 PMCID: PMC10760213] [Indexed: 01/05/2024]
Abstract
Neural systems, artificial and biological, show similar representations of inputs when optimized to perform similar tasks. In visual systems optimized for tasks similar to object recognition, we propose that representation similarities arise from the constraints imposed by the development of abstractions in the representation across the processing stages. To study the effect of abstraction hierarchy of representations across different visual systems, we constructed a two-dimensional space in which each neural representation is positioned based on its distance from the pixel space and the class space. Trajectories of representations in all the task-optimized visual neural networks start close to the pixel space and gradually move towards higher abstract representations, such as object categories. We also observe that proximity in this abstraction space predicts the similarity of neural representations between visual systems. The gradual similar change of the representations suggests that the similarity across different task-optimized systems could arise from constraints on representational trajectories.
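The two-dimensional "abstraction space" can be sketched as follows. For illustration only, it assumes that the distance between two representations is one minus linear centered kernel alignment (CKA). The abstract does not state which similarity measure the paper uses. Here "pixel space" is the raw input and "class space" is a one-hot encoding of the labels. The data and the helper `abstraction_coordinates` are hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations whose rows are the same n inputs."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def abstraction_coordinates(layer, pixels, labels, n_classes):
    """Return (distance from pixel space, distance from class space), using 1 - CKA."""
    one_hot = np.eye(n_classes)[labels]
    return 1.0 - linear_cka(layer, pixels), 1.0 - linear_cka(layer, one_hot)

rng = np.random.default_rng(0)
n, n_classes = 200, 5
pixels = rng.standard_normal((n, 64))                # stand-in flattened input images
labels = rng.integers(0, n_classes, size=n)          # stand-in class labels
early = pixels                                       # an "early layer" identical to the input
late = np.eye(n_classes)[labels] @ rng.standard_normal((n_classes, 16))  # a class-like "late layer"
late += 0.1 * rng.standard_normal(late.shape)
for name, layer in [("early", early), ("late", late)]:
    print(name, abstraction_coordinates(layer, pixels, labels, n_classes))
```

Plotting these coordinates for every layer of a network would trace the kind of trajectory the abstract describes. Such a trajectory starts near pixel space and moves toward class space.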
Affiliation(s)
- Tahereh Toosi
- Center for Theoretical Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY