51
Pulvermüller F. Neurobiological mechanisms for language, symbols and concepts: Clues from brain-constrained deep neural networks. Prog Neurobiol 2023; 230:102511. PMID: 37482195; PMCID: PMC10518464; DOI: 10.1016/j.pneurobio.2023.102511.
Abstract
Neural networks are successfully used to imitate and model cognitive processes. However, to provide clues about the neurobiological mechanisms enabling human cognition, these models need to mimic the structure and function of real brains. Brain-constrained networks differ from classic neural networks by implementing brain similarities at different scales, ranging from the micro- and mesoscopic levels of neuronal function, local neuronal links and circuit interaction to large-scale anatomical structure and between-area connectivity. This review shows how brain-constrained neural networks can be applied to study in silico the formation of mechanisms for symbol and concept processing and to work towards neurobiological explanations of specifically human cognitive abilities. These include verbal working memory and learning of large vocabularies of symbols, semantic binding carried by specific areas of cortex, attention focusing and modulation driven by symbol type, and the acquisition of concrete and abstract concepts partly influenced by symbols. Neuronal assembly activity in the networks is analyzed to deliver putative mechanistic correlates of higher cognitive processes and to develop candidate explanations founded in established neurobiological principles.
Affiliation(s)
- Friedemann Pulvermüller
- Brain Language Laboratory, Department of Philosophy and Humanities, WE4, Freie Universität Berlin, 14195 Berlin, Germany; Berlin School of Mind and Brain, Humboldt Universität zu Berlin, 10099 Berlin, Germany; Einstein Center for Neurosciences Berlin, 10117 Berlin, Germany; Cluster of Excellence 'Matters of Activity', Humboldt Universität zu Berlin, 10099 Berlin, Germany.
52
Ryskin R, Nieuwland MS. Prediction during language comprehension: what is next? Trends Cogn Sci 2023; 27:1032-1052. PMID: 37704456; DOI: 10.1016/j.tics.2023.08.003.
Abstract
Prediction is often regarded as an integral aspect of incremental language comprehension, but little is known about the cognitive architectures and mechanisms that support it. We review studies showing that listeners and readers use all manner of contextual information to generate multifaceted predictions about upcoming input. The nature of these predictions may vary between individuals owing to differences in language experience, among other factors. We then turn to unresolved questions which may guide the search for the underlying mechanisms. (i) Is prediction essential to language processing or an optional strategy? (ii) Are predictions generated from within the language system or by domain-general processes? (iii) What is the relationship between prediction and memory? (iv) Does prediction in comprehension require simulation via the production system? We discuss promising directions for making progress in answering these questions and for developing a mechanistic understanding of prediction in language.
Affiliation(s)
- Rachel Ryskin
- Department of Cognitive and Information Sciences, University of California Merced, 5200 Lake Road, Merced, CA 95343, USA.
- Mante S Nieuwland
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands
53
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. bioRxiv [preprint] 2023:2023.04.16.537080. PMID: 37090673; PMCID: PMC10120732; DOI: 10.1101/2023.04.16.537080.
Abstract
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also noninvasively control neural activity in higher-level cortical areas, like the language network.
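As a rough illustration of the encoding-model logic summarized in this abstract (not the authors' code; the embeddings and responses below are random stand-ins for real sentence features and fMRI data), an encoding model can be fit by ridge regression from sentence features to a brain response, then used to rank novel sentences by their predicted response:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real data: language-model embeddings of n sentences
# (here random) and one scalar brain response per sentence.
n_sentences, n_dims = 200, 64
X = rng.standard_normal((n_sentences, n_dims))           # sentence features
true_w = rng.standard_normal(n_dims)
y = X @ true_w + 0.1 * rng.standard_normal(n_sentences)  # simulated responses

# Ridge regression in closed form: w = (X^T X + alpha*I)^-1 X^T y
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(n_dims), X.T @ y)

# Score new candidate sentences by predicted response; the extremes are
# candidate "drive" and "suppress" stimuli for a follow-up experiment.
X_new = rng.standard_normal((10, n_dims))
pred = X_new @ w
drive_idx = int(np.argmax(pred))     # sentence predicted to drive response
suppress_idx = int(np.argmin(pred))  # sentence predicted to suppress it
```

The key design choice is that the model is fit once on measured responses and then queried on unmeasured sentences, which is what allows noninvasive "closed-loop" stimulus selection.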
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- MIT-IBM Watson AI Lab, Cambridge, MA 02142, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455 USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138 USA
54
Nour MM, McNamee DC, Liu Y, Dolan RJ. Trajectories through semantic spaces in schizophrenia and the relationship to ripple bursts. Proc Natl Acad Sci U S A 2023; 120:e2305290120. PMID: 37816054; PMCID: PMC10589662; DOI: 10.1073/pnas.2305290120.
Abstract
Human cognition is underpinned by structured internal representations that encode relationships between entities in the world (cognitive maps). Clinical features of schizophrenia-from thought disorder to delusions-are proposed to reflect disorganization in such conceptual representations. Schizophrenia is also linked to abnormalities in neural processes that support cognitive map representations, including hippocampal replay and high-frequency ripple oscillations. Here, we report a computational assay of semantically guided conceptual sampling and exploit this to test a hypothesis that people with schizophrenia (PScz) exhibit abnormalities in semantically guided cognition that relate to hippocampal replay and ripples. Fifty-two participants [26 PScz (13 unmedicated) and 26 age-, gender-, and intelligence quotient (IQ)-matched nonclinical controls] completed a category- and letter-verbal fluency task, followed by a magnetoencephalography (MEG) scan involving a separate sequence-learning task. We used a pretrained word embedding model of semantic similarity, coupled to a computational model of word selection, to quantify the degree to which each participant's verbal behavior was guided by semantic similarity. Using MEG, we indexed neural replay and ripple power in a post-task rest session. Across all participants, word selection was strongly influenced by semantic similarity. The strength of this influence showed sensitivity to task demands (category > letter fluency) and predicted performance. In line with our hypothesis, the influence of semantic similarity on behavior was reduced in schizophrenia relative to controls, predicted negative psychotic symptoms, and correlated with an MEG signature of hippocampal ripple power (but not replay). The findings bridge a gap between phenomenological and neurocomputational accounts of schizophrenia.
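The word-selection idea (the next word in a fluency sequence is chosen with probability that grows with its embedding similarity to the previous word) can be sketched as a softmax over cosine similarities. The toy vectors and the `beta` weighting parameter below are illustrative stand-ins, not the study's fitted model:

```python
import math

# Toy word vectors (stand-ins for a pretrained word embedding model).
emb = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def selection_probs(prev_word, candidates, beta=5.0):
    """P(next word) as a softmax of beta * cosine similarity to the previous
    word; beta indexes how strongly semantic similarity guides selection,
    the per-participant quantity the assay estimates."""
    sims = [cosine(emb[prev_word], emb[c]) for c in candidates]
    exps = [math.exp(beta * s) for s in sims]
    z = sum(exps)
    return {c: e / z for c, e in zip(candidates, exps)}

probs = selection_probs("dog", ["cat", "car"])
# "cat" sits nearer "dog" in embedding space than "car", so it gets the
# higher selection probability.
```

A reduced fitted `beta` would correspond to the weaker influence of semantic similarity reported in the patient group.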
Affiliation(s)
- Matthew M. Nour
- Department of Psychiatry, University of Oxford, Oxford OX3 7JX, United Kingdom
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, United Kingdom
- Daniel C. McNamee
- Champalimaud Research, Centre for the Unknown, 1400-038 Lisbon, Portugal
- Yunzhe Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Chinese Institute for Brain Research, Beijing 102206, China
- Raymond J. Dolan
- Max Planck University College London Centre for Computational Psychiatry and Ageing Research, London WC1B 5EH, United Kingdom
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, United Kingdom
55
Kumar M, Goldstein A, Michelmann S, Zacks JM, Hasson U, Norman KA. Bayesian Surprise Predicts Human Event Segmentation in Story Listening. Cogn Sci 2023; 47:e13343. PMID: 37867379; DOI: 10.1111/cogs.13343.
Abstract
Event segmentation theory posits that people segment continuous experience into discrete events and that event boundaries occur when there are large transient increases in prediction error. Here, we set out to test this theory in the context of story listening, by using a deep learning language model (GPT-2) to compute the predicted probability distribution of the next word, at each point in the story. For three stories, we used the probability distributions generated by GPT-2 to compute the time series of prediction error. We also asked participants to listen to these stories while marking event boundaries. We used regression models to relate the GPT-2 measures to the human segmentation data. We found that event boundaries are associated with transient increases in Bayesian surprise but not with a simpler measure of prediction error (surprisal) that tracks, for each word in the story, how strongly that word was predicted at the previous time point. These results support the hypothesis that prediction error serves as a control mechanism governing event segmentation and point to important differences between operational definitions of prediction error.
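The two prediction-error measures contrasted in this abstract can be made concrete with toy next-word distributions (stand-ins for GPT-2's outputs): surprisal scores only the observed word's probability, while Bayesian surprise measures how much the whole predictive distribution shifted:

```python
import math

def surprisal(prob_of_word):
    """Surprisal: -log2 p(observed word) under the previous prediction."""
    return -math.log2(prob_of_word)

def kl_divergence(p, q):
    """Bayesian surprise: KL(new predictive distribution || old one)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-word distributions over a 3-word vocabulary at two time points.
p_before = [0.7, 0.2, 0.1]   # prediction before the word arrives
p_after  = [0.1, 0.1, 0.8]   # prediction after it has been incorporated

s = surprisal(p_before[2])           # suppose word 2 was observed (p = 0.1)
b = kl_divergence(p_after, p_before)
```

The two measures can dissociate: a word can be individually improbable (high surprisal) without reshaping beliefs about what comes next (low Bayesian surprise), and it is the latter, belief-updating quantity that tracked event boundaries here.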
Affiliation(s)
- Manoj Kumar
- Princeton Neuroscience Institute, Princeton University
- Ariel Goldstein
- Department of Cognitive and Brain Sciences and Business School, Hebrew University
- Google Research, Tel-Aviv
- Jeffrey M Zacks
- Department of Psychological & Brain Sciences, Washington University in St. Louis
- Uri Hasson
- Princeton Neuroscience Institute, Princeton University
- Department of Psychology, Princeton University
- Kenneth A Norman
- Princeton Neuroscience Institute, Princeton University
- Department of Psychology, Princeton University
56
Moro A, Greco M, Cappa SF. Large languages, impossible languages and human brains. Cortex 2023; 167:82-85. PMID: 37540953; DOI: 10.1016/j.cortex.2023.07.003.
Abstract
We aim to offer a contribution highlighting the essential differences between Large Language Models (LLMs) and the human language faculty. More explicitly, we claim that the existence of languages that are impossible for humans has no equivalent for LLMs, making them unsuitable models of the human language faculty, especially from a neurobiological point of view. The core part is preceded by two premises bearing on the distinction between machines and humans and on the distinction between competence and performance.
Affiliation(s)
- Andrea Moro
- Scuola Universitaria Superiore IUSS, Pavia, Italy
- Matteo Greco
- Scuola Universitaria Superiore IUSS, Pavia, Italy
- Stefano F Cappa
- Scuola Universitaria Superiore IUSS, Pavia, Italy; IRCCS Mondino Foundation, Pavia, Italy
57
Palaniyappan L, Benrimoh D, Voppel A, Rocca R. Studying Psychosis Using Natural Language Generation: A Review of Emerging Opportunities. Biol Psychiatry Cogn Neurosci Neuroimaging 2023; 8:994-1004. PMID: 38441079; DOI: 10.1016/j.bpsc.2023.04.009.
Abstract
Disrupted language in psychotic disorders, such as schizophrenia, can manifest as false contents and formal deviations, often described as thought disorder. These features play a critical role in the social dysfunction associated with psychosis, but we continue to lack insights regarding how and why these symptoms develop. Natural language generation (NLG) is a field of computer science that focuses on generating human-like language for various applications. The theory that psychosis is related to the evolution of language in humans suggests that NLG systems that are sufficiently evolved to generate human-like language may also exhibit psychosis-like features. In this conceptual review, we propose using NLG systems that are at various stages of development as in silico tools to study linguistic features of psychosis. We argue that a program of in silico experimental research on the network architecture, function, learning rules, and training of NLG systems can help us understand better why thought disorder occurs in patients. This will allow us to gain a better understanding of the relationship between language and psychosis and potentially pave the way for new therapeutic approaches to address this vexing challenge.
Affiliation(s)
- Lena Palaniyappan
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, Quebec, Canada; Robarts Research Institute, Western University, London, Ontario, Canada; Department of Medical Biophysics, Western University, London, Ontario, Canada
- David Benrimoh
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, Quebec, Canada; Department of Psychiatry, Stanford University, Palo Alto, California
- Alban Voppel
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, Quebec, Canada; Department of Psychiatry, University of Groningen, Groningen, the Netherlands
- Roberta Rocca
- Interacting Minds Centre, Department of Culture, Cognition and Computation, Aarhus University, Aarhus, Denmark
58
Patel T, Morales M, Pickering MJ, Hoffman P. A common neural code for meaning in discourse production and comprehension. Neuroimage 2023; 279:120295. PMID: 37536526; DOI: 10.1016/j.neuroimage.2023.120295.
Abstract
How does the brain code the meanings conveyed by language? Neuroimaging studies have investigated this by linking neural activity patterns during discourse comprehension to semantic models of language content. Here, we applied this approach to the production of discourse for the first time. Participants underwent fMRI while producing and listening to discourse on a range of topics. We used a distributional semantic model to quantify the similarity between different speech passages and identified where similarity in neural activity was predicted by semantic similarity. When people produced discourse, speech on similar topics elicited similar activation patterns in a widely distributed and bilateral brain network. This network was overlapping with, but more extensive than, the regions that showed similarity effects during comprehension. Critically, cross-task neural similarities between comprehension and production were also predicted by similarities in semantic content. This result suggests that discourse semantics engages a common neural code that is shared between comprehension and production. Effects of semantic similarity were bilateral in all three representational similarity analyses (RSA), even while univariate activation contrasts in the same data indicated left-lateralised BOLD responses. This indicates that right-hemisphere regions encode semantic properties even when they are not activated above baseline. We suggest that right-hemisphere regions play a supporting role in processing the meaning of discourse during both comprehension and production.
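The core representational similarity analysis can be sketched as correlating two passage-by-passage similarity matrices, one built from a semantic model and one from activity patterns; the data below are simulated stand-ins, not the study's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: semantic-model vectors for n passages, and voxel patterns that
# by construction partly share the same similarity structure.
n, d_sem, d_vox = 20, 16, 50
sem = rng.standard_normal((n, d_sem))
proj = rng.standard_normal((d_sem, d_vox))
neural = sem @ proj + 0.5 * rng.standard_normal((n, d_vox))

def similarity_matrix(X):
    """Pairwise Pearson correlations between rows (one row per passage)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xn @ Xn.T

sem_sim = similarity_matrix(sem)
neu_sim = similarity_matrix(neural)

# RSA statistic: correlate the off-diagonal entries of the two matrices,
# i.e. ask whether passages with similar meaning evoke similar patterns.
iu = np.triu_indices(n, k=1)
rsa_r = float(np.corrcoef(sem_sim[iu], neu_sim[iu])[0, 1])
```

The cross-task version reported in the paper applies the same logic with one matrix from production patterns and one from comprehension patterns.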
Affiliation(s)
- Tanvi Patel
- School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK
- Matías Morales
- School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK
- Martin J Pickering
- School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK
- Paul Hoffman
- School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK
59
Nour MM, Huys QJM. Natural Language Processing in Psychiatry: A Field at an Inflection Point. Biol Psychiatry Cogn Neurosci Neuroimaging 2023; 8:979-981. PMID: 37805191; DOI: 10.1016/j.bpsc.2023.08.001.
Affiliation(s)
- Matthew M Nour
- Department of Psychiatry, University of Oxford, Oxford, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, Queen Square Institute of Neurology, University College London, London, United Kingdom
- Quentin J M Huys
- Applied Computational Psychiatry Lab, Mental Health Neuroscience Department, Division of Psychiatry and Max Planck Centre for Computational Psychiatry and Ageing Research, Queen Square Institute of Neurology, University College London, London, United Kingdom
60
Du C, Fu K, Li J, He H. Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features. IEEE Trans Pattern Anal Mach Intell 2023; 45:10760-10777. PMID: 37030711; DOI: 10.1109/tpami.2023.3263181.
Abstract
Decoding human visual neural representations is a challenging task with great scientific significance in revealing vision-processing mechanisms and developing brain-like intelligent machines. Most existing methods are difficult to generalize to novel categories that have no corresponding neural data for training. The two main reasons are 1) the under-exploitation of the multimodal semantic knowledge underlying the neural data and 2) the small number of paired (stimuli-responses) training data. To overcome these limitations, this paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features. We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models. Specifically, we leverage the mixture-of-product-of-experts formulation to infer a latent code that enables a coherent joint generation of all three modalities. To learn a more consistent joint representation and improve the data efficiency in the case of limited brain activity data, we exploit both intra- and inter-modality mutual information maximization regularization terms. In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories. Finally, we construct three trimodal matching datasets, and the extensive experiments lead to some interesting conclusions and cognitive insights: 1) decoding novel visual categories from human brain activity is practically possible with good accuracy; 2) decoding models using the combination of visual and linguistic features perform much better than those using either of them alone; 3) visual perception may be accompanied by linguistic influences to represent the semantics of visual stimuli.
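The product-of-experts step at the heart of the mixture-of-product-of-experts formulation can be illustrated for 1-D Gaussian posteriors (the numbers below are illustrative only, and BraVL additionally mixes over subsets of modalities):

```python
def product_of_gaussian_experts(means, variances):
    """Fuse per-modality Gaussian posteriors q_m(z) = N(mu_m, var_m) into a
    single Gaussian: precisions (inverse variances) add, and the fused mean
    is the precision-weighted average of the experts' means."""
    precisions = [1.0 / v for v in variances]
    total_prec = sum(precisions)
    fused_var = 1.0 / total_prec
    fused_mean = fused_var * sum(p * m for p, m in zip(precisions, means))
    return fused_mean, fused_var

# Toy 1-D latent posteriors from three modalities (brain, visual, textual).
mu, var = product_of_gaussian_experts([0.0, 1.0, 2.0], [1.0, 1.0, 0.5])
# The most certain expert (variance 0.5) pulls the fused mean toward its
# own, and the fused variance is smaller than any single expert's.
```

This fusion rule is what lets the model form a coherent joint latent code even when one modality (e.g., brain data for a novel category) is missing at test time.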
61
Kozachkov L, Kastanenka KV, Krotov D. Building transformers from neurons and astrocytes. Proc Natl Acad Sci U S A 2023; 120:e2219150120. PMID: 37579149; PMCID: PMC10450673; DOI: 10.1073/pnas.2219150120.
Abstract
Glial cells account for between 50% and 90% of all human brain cells, and serve a variety of important developmental, structural, and metabolic functions. Recent experimental efforts suggest that astrocytes, a type of glial cell, are also directly involved in core cognitive processes such as learning and memory. While it is well established that astrocytes and neurons are connected to one another in feedback loops across many timescales and spatial scales, there is a gap in understanding the computational role of neuron-astrocyte interactions. To help bridge this gap, we draw on recent advances in AI and astrocyte imaging technology. In particular, we show that neuron-astrocyte networks can naturally perform the core computation of a Transformer, a particularly successful type of AI architecture. In doing so, we provide a concrete, normative, and experimentally testable account of neuron-astrocyte communication. Because Transformers are so successful across a wide variety of task domains, such as language, vision, and audition, our analysis may help explain the ubiquity, flexibility, and power of the brain's neuron-astrocyte networks.
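The core Transformer computation that the paper maps onto neuron-astrocyte networks is scaled dot-product self-attention, sketched below with toy inputs (this is the generic operation, not the paper's neuron-astrocyte construction):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """softmax(Q K^T / sqrt(d)) V: each position mixes all positions'
    values, weighted by query-key similarity. In the paper's account, this
    mixing is proposed to be carried by slow neuron-astrocyte interactions."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = softmax(Q @ K.T / np.sqrt(d))
    return scores @ V, scores

rng = np.random.default_rng(2)
T, d = 5, 8                      # sequence length, model width
X = rng.standard_normal((T, d))  # toy token representations
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
# Each row of attn is a probability distribution over which tokens that
# position attends to; out recombines the value vectors accordingly.
```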
Affiliation(s)
- Leo Kozachkov
- MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA 02142
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Ksenia V. Kastanenka
- Department of Neurology, MassGeneral Institute for Neurodegenerative Diseases, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115
- Dmitry Krotov
- MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA 02142
62
Valle-Lisboa JC, Pomi A, Mizraji E. Multiplicative processing in the modeling of cognitive activities in large neural networks. Biophys Rev 2023; 15:767-785. PMID: 37681105; PMCID: PMC10480136; DOI: 10.1007/s12551-023-01074-5.
Abstract
Explaining the foundation of cognitive abilities in the processing of information by neural systems has been a goal of biophysics since McCulloch and Pitts' pioneering work within the biophysics school of Chicago in the 1940s and the interdisciplinary cybernetics meetings of the 1950s, inseparable from the birth of computing and artificial intelligence. Since then, neural network models have traveled a long path in both the biophysical and the computational disciplines. The biological, neurocomputational aspect reached its representational maturity with the distributed associative memory models developed in the early 1970s. In this framework, the inclusion of signal-signal multiplication within neural network models was presented as a necessity: it provides matrix associative memories with adaptive, context-sensitive associations while greatly enhancing their computational capabilities. In this review, we show that several of the most successful neural network models use a form of multiplication of signals. We present several classical models that included this kind of multiplication and the computational reasons for its inclusion. We then turn to the different proposals about the possible biophysical implementation that underlies these computational capacities. We pinpoint the important ideas put forth by different theoretical models using a tensor product representation and show that these models endow memories with the context-dependent adaptive capabilities necessary for evolutionary adaptation to changing and unpredictable environments. Finally, we show how the powerful abilities of contemporary deep-learning models, inspired by neural networks, also depend on multiplications, and we discuss some perspectives in view of the wide panorama unfolded. The computational relevance of multiplication calls for new avenues of research to uncover the mechanisms our nervous system uses to achieve it.
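The tensor-product idea reviewed here can be made concrete with a minimal context-dependent matrix memory in the style of Mizraji's models: associations are stored against the Kronecker product of a cue and a context vector, so the same cue retrieves different outputs in different contexts. The toy orthonormal vectors below keep the recall exact:

```python
import numpy as np

# Orthonormal cue and context vectors make this toy retrieval error-free.
cue = np.array([1.0, 0.0])
ctx_a = np.array([1.0, 0.0])
ctx_b = np.array([0.0, 1.0])
out_a = np.array([1.0, 0.0, 0.0])  # desired output for cue in context A
out_b = np.array([0.0, 1.0, 0.0])  # desired output for cue in context B

# Store both associations in one matrix memory M: sum of outer products of
# each output with its multiplicative (cue ⊗ context) input.
M = (np.outer(out_a, np.kron(cue, ctx_a))
     + np.outer(out_b, np.kron(cue, ctx_b)))

# The same cue now recalls different outputs depending on context -- the
# adaptive, context-sensitive association that multiplication buys.
recall_a = M @ np.kron(cue, ctx_a)
recall_b = M @ np.kron(cue, ctx_b)
```

Without the multiplicative (Kronecker) input, a purely additive memory would have to return the same output for the cue regardless of context.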
Affiliation(s)
- Juan C. Valle-Lisboa
- Group of Cognitive Systems Modeling, Biophysics and Systems Biology Section, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
- Centro Interdisciplinario en Cognición para la Enseñanza y el Aprendizaje (CICEA), Universidad de la República, Espacio Interdisciplinario, 11200 Montevideo, Uruguay
- Andrés Pomi
- Group of Cognitive Systems Modeling, Biophysics and Systems Biology Section, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
- Eduardo Mizraji
- Group of Cognitive Systems Modeling, Biophysics and Systems Biology Section, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
63
Pavlick E. Symbols and grounding in large language models. Philos Trans A Math Phys Eng Sci 2023; 381:20220041. PMID: 37271171; PMCID: PMC10239679; DOI: 10.1098/rsta.2022.0041.
Abstract
Large language models (LLMs) are one of the most impressive achievements of artificial intelligence in recent years. However, their relevance to the study of language more broadly remains unclear. This article considers the potential of LLMs to serve as models of language understanding in humans. While debate on this question typically centres around models' performance on challenging language understanding tasks, this article argues that the answer depends on models' underlying competence, and thus that the focus of the debate should be on empirical work which seeks to characterize the representations and processing algorithms that underlie model behaviour. From this perspective, the article offers counterarguments to two commonly cited reasons why LLMs cannot serve as plausible models of language in humans: their lack of symbolic structure and their lack of grounding. For each, a case is made that recent empirical trends undermine the common assumptions about LLMs, and thus that it is premature to draw conclusions about LLMs' ability (or lack thereof) to offer insights on human language representation and understanding. This article is part of a discussion meeting issue 'Cognitive artificial intelligence'.
Affiliation(s)
- Ellie Pavlick
- Department of Computer Science, Brown University, Providence, RI, USA
64
Desbordes T, Lakretz Y, Chanoine V, Oquab M, Badier JM, Trébuchon A, Carron R, Bénar CG, Dehaene S, King JR. Dimensionality and Ramping: Signatures of Sentence Integration in the Dynamics of Brains and Deep Language Models. J Neurosci 2023; 43:5350-5364. PMID: 37217308; PMCID: PMC10359032; DOI: 10.1523/jneurosci.1163-22.2023.
Abstract
A sentence is more than the sum of its words: its meaning depends on how they combine with one another. The brain mechanisms underlying such semantic composition remain poorly understood. To shed light on the neural vector code underlying semantic composition, we introduce two hypotheses: (1) the intrinsic dimensionality of the space of neural representations should increase as a sentence unfolds, paralleling the growing complexity of its semantic representation; and (2) this progressive integration should be reflected in ramping and sentence-final signals. To test these predictions, we designed a dataset of closely matched normal and jabberwocky sentences (composed of meaningless pseudo words) and displayed them to deep language models and to 11 human participants (5 men and 6 women) monitored with simultaneous MEG and intracranial EEG. In both deep language models and electrophysiological data, we found that representational dimensionality was higher for meaningful sentences than jabberwocky. Furthermore, multivariate decoding of normal versus jabberwocky confirmed three dynamic patterns: (1) a phasic pattern following each word, peaking in temporal and parietal areas; (2) a ramping pattern, characteristic of bilateral inferior and middle frontal gyri; and (3) a sentence-final pattern in left superior frontal gyrus and right orbitofrontal cortex. These results provide a first glimpse into the neural geometry of semantic integration and constrain the search for a neural code of linguistic composition.
Significance Statement: Starting from general linguistic concepts, we make two sets of predictions in neural signals evoked by reading multiword sentences. First, the intrinsic dimensionality of the representation should grow with additional meaningful words. Second, the neural dynamics should exhibit signatures of encoding, maintaining, and resolving semantic composition.
We successfully validated these hypotheses in deep neural language models, artificial neural networks trained on text and performing very well on many natural language processing tasks. Then, using a unique combination of MEG and intracranial electrodes, we recorded high-resolution brain data from human participants while they read a controlled set of sentences. Time-resolved dimensionality analysis showed increasing dimensionality with meaning, and multivariate decoding allowed us to isolate the three dynamical patterns we had hypothesized.
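The first hypothesis can be made concrete with a toy estimator. A minimal sketch (not the authors' exact dimensionality estimator) uses the participation ratio of the covariance eigenspectrum as an intrinsic-dimensionality proxy, and shows it separating a low-dimensional point cloud from a high-dimensional one:

```python
import numpy as np

def participation_ratio(X):
    """Participation ratio of the covariance eigenspectrum of X.

    X: (n_samples, n_features). Ranges from ~1 (variance concentrated
    in a single direction) to n_features (variance spread evenly)."""
    X = X - X.mean(axis=0, keepdims=True)
    eig = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

rng = np.random.default_rng(0)
# Toy "jabberwocky" cloud: representations hug a single direction.
low = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 10))
low = low + 0.01 * rng.normal(size=low.shape)
# Toy "meaningful sentence" cloud: variance spread over 10 directions.
high = rng.normal(size=(200, 10))
assert participation_ratio(low) < participation_ratio(high)
```

Applied to layer activations collected word by word, such an estimator lets one ask whether dimensionality grows as a sentence unfolds.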
Affiliation(s)
- Théo Desbordes
- Meta AI Research, Paris 75002, France; and Cognitive Neuroimaging Unit, NeuroSpin center, 91191, Gif-sur-Yvette, France
- Yair Lakretz
- Cognitive Neuroimaging Unit, NeuroSpin center, Gif-sur-Yvette, 91191, France
- Valérie Chanoine
- Institute of Language, Communication and the Brain, Aix-en-Provence, 13100, France; and Aix-Marseille Université, Centre National de la Recherche Scientifique, LPL, Aix-en-Provence, 13100, France
- Jean-Michel Badier
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, CNRS, LPL, Aix-en-Provence 13100; and Inst Neurosci Syst, Marseille, 13005, France
- Agnès Trébuchon
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, CNRS, LPL, Aix-en-Provence 13100, France; and Inst Neurosci Syst, Marseille, 13005, France; and Assistance Publique Hôpitaux de Marseille, Timone hospital, Epileptology and Cerebral Rhythmology, Marseille, 13385, France
- Romain Carron
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, CNRS, LPL, Aix-en-Provence 13100, France; and Inst Neurosci Syst, Marseille, 13005, France; and Assistance Publique Hôpitaux de Marseille, Timone hospital, Functional and Stereotactic Neurosurgery, Marseille, 13385, France
- Christian-G Bénar
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale, CNRS, LPL, Aix-en-Provence 13100, France; and Inst Neurosci Syst, Marseille, 13005, France
- Stanislas Dehaene
- Université Paris Saclay, Institut National de la Santé et de la Recherche Médicale, Commissariat à l'Energie Atomique, Cognitive Neuroimaging Unit, NeuroSpin center, Saclay, 91191, France; and Collège de France, PSL University, Paris, 75231, France
- Jean-Rémi King
- Meta AI Research, Paris 75002, France; and Cognitive Neuroimaging Unit, NeuroSpin center, 91191, Gif-sur-Yvette, France
- LSP, École normale supérieure, PSL (Paris Sciences & Lettres) University, CNRS, 75005 Paris, France
|
65
|
Xia Y, Geng M, Chen Y, Sun S, Liao C, Zhu Z, Li Z, Ochieng WY, Angeloudis P, Elhajj M, Zhang L, Zeng Z, Zhang B, Gao Z, Chen X(M). Understanding common human driving semantics for autonomous vehicles. Patterns (N Y) 2023; 4:100730. [PMID: 37521046 PMCID: PMC10382946 DOI: 10.1016/j.patter.2023.100730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Revised: 12/05/2022] [Accepted: 03/20/2023] [Indexed: 08/01/2023]
Abstract
Autonomous vehicles will share roads with human-driven vehicles until the transition to fully autonomous transport systems is complete. The critical challenge of improving mutual understanding between the two vehicle types cannot be addressed by feeding extensive driving data into data-driven models alone; it also requires enabling autonomous vehicles to understand and apply common driving behaviors as human drivers do. Therefore, we designed and conducted two electroencephalography experiments to compare the cerebral activity underlying human linguistic and driving understanding. The results showed that driving activates hierarchical neural functions in the auditory cortex, analogous to abstraction in linguistic understanding. Subsequently, we proposed a neural-informed, semantics-driven framework to understand common human driving behavior in a brain-inspired manner. This study highlights a pathway for fusing neuroscience into complex human-behavior understanding tasks and provides a computational neural model of human driving behaviors, which will enable autonomous vehicles to perceive and think like human drivers.
Affiliation(s)
- Yingji Xia
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Maosi Geng
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Polytechnic Institute & Institute of Intelligent Transportation Systems, Zhejiang University, Hangzhou 310015, China
- Yong Chen
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Sudan Sun
- School of Medicine, Zhejiang University, Hangzhou 310058, China
- Chenlei Liao
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Zheng Zhu
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou 310027, China
- Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
- Zhihui Li
- School of Transportation, Jilin University, Changchun 130022, China
- Washington Yotto Ochieng
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Panagiotis Angeloudis
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Mireille Elhajj
- Department of Civil and Environmental Engineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Lei Zhang
- Alibaba Group, Hangzhou 310052, China
- Ziyou Gao
- School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
- Xiqun (Michael) Chen
- Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
- Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou 310027, China
- Zhejiang University/University of Illinois Urbana-Champaign (ZJU-UIUC) Institute, Zhejiang University, Haining 314400, China
- Zhejiang Provincial Engineering Research Center for Intelligent Transportation, Hangzhou 310058, China
|
66
|
Yang E, Milisav F, Kopal J, Holmes AJ, Mitsis GD, Misic B, Finn ES, Bzdok D. The default network dominates neural responses to evolving movie stories. Nat Commun 2023; 14:4197. [PMID: 37452058 PMCID: PMC10349102 DOI: 10.1038/s41467-023-39862-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2022] [Accepted: 06/27/2023] [Indexed: 07/18/2023] Open
Abstract
Neuroscientific studies exploring real-world dynamic perception often overlook the influence of continuous changes in narrative content. In our research, we utilize machine learning tools for natural language processing to examine the relationship between movie narratives and neural responses. By analyzing over 50,000 brain images of participants watching Forrest Gump from the studyforrest dataset, we find distinct brain states that capture unique semantic aspects of the unfolding story. The default network, associated with semantic information integration, is the most engaged during movie watching. Furthermore, we identify two mechanisms that underlie how the default network liaises with the amygdala and hippocampus. Our findings demonstrate effective approaches to understanding neural processes in everyday situations and their relation to conscious awareness.
Affiliation(s)
- Enning Yang
- Department of Biomedical Engineering, The Neuro - Montreal Neurological Institute (MNI), McConnell Brain Imaging Centre (BIC), McGill University, Montreal, QC, Canada
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
- Filip Milisav
- Department of Biomedical Engineering, The Neuro - Montreal Neurological Institute (MNI), McConnell Brain Imaging Centre (BIC), McGill University, Montreal, QC, Canada
- Jakub Kopal
- Department of Biomedical Engineering, The Neuro - Montreal Neurological Institute (MNI), McConnell Brain Imaging Centre (BIC), McGill University, Montreal, QC, Canada
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
- Avram J Holmes
- Department of Psychology and Psychiatry, Yale University, New Haven, CT, USA
- Georgios D Mitsis
- Department of Bioengineering, McGill University, Montreal, QC, Canada
- Bratislav Misic
- Department of Biomedical Engineering, The Neuro - Montreal Neurological Institute (MNI), McConnell Brain Imaging Centre (BIC), McGill University, Montreal, QC, Canada
- Emily S Finn
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Danilo Bzdok
- Department of Biomedical Engineering, The Neuro - Montreal Neurological Institute (MNI), McConnell Brain Imaging Centre (BIC), McGill University, Montreal, QC, Canada
- Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada
|
67
|
Stanojević M, Brennan JR, Dunagan D, Steedman M, Hale JT. Modeling Structure-Building in the Brain With CCG Parsing and Large Language Models. Cogn Sci 2023; 47:e13312. [PMID: 37417470 DOI: 10.1111/cogs.13312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Revised: 06/07/2023] [Accepted: 06/17/2023] [Indexed: 07/08/2023]
Abstract
To model behavioral and neural correlates of language comprehension in naturalistic environments, researchers have turned to broad-coverage tools from natural-language processing and machine learning. Where syntactic structure is explicitly modeled, prior work has relied predominantly on context-free grammars (CFGs), yet such formalisms are not sufficiently expressive for human languages. Combinatory categorial grammars (CCGs) are sufficiently expressive, directly compositional models of grammar with flexible constituency that affords incremental interpretation. In this work, we evaluate whether a more expressive CCG provides a better model than a CFG for human neural signals collected with functional magnetic resonance imaging (fMRI) while participants listen to an audiobook story. We further test between variants of CCG that differ in how they handle optional adjuncts. These evaluations are carried out against a baseline that includes estimates of next-word predictability from a transformer neural network language model. Such a comparison reveals unique contributions of CCG structure-building predominantly in the left posterior temporal lobe: CCG-derived measures offer a superior fit to neural signals compared to those derived from a CFG. These effects are spatially distinct from bilateral superior temporal effects that are unique to predictability. Neural effects for structure-building are thus separable from predictability during naturalistic listening, and those effects are best characterized by a grammar whose expressive power is motivated on independent linguistic grounds.
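The model-comparison logic here, asking whether a structure-building regressor explains neural variance beyond a predictability baseline, can be sketched with cross-validated ridge regression on synthetic data. The single-voxel setup, variable names, and noise levels are illustrative assumptions, not the study's actual pipeline:

```python
import numpy as np

def cv_r2(X, y, alpha=1.0, n_folds=5):
    """Cross-validated R^2 of a ridge model: a generic way to ask
    whether one predictor set fits a neural time series better than
    another. X: (n_timepoints, n_features), y: (n_timepoints,)."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    ss_res, ss_tot = 0.0, 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        Xt, yt = X[train], y[train]
        w = np.linalg.solve(Xt.T @ Xt + alpha * np.eye(X.shape[1]), Xt.T @ yt)
        ss_res += ((y[test] - X[test] @ w) ** 2).sum()
        ss_tot += ((y[test] - y[test].mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
surprisal = rng.normal(size=(400, 1))   # baseline predictability regressor
structure = rng.normal(size=(400, 1))   # structure-building regressor
# Toy "voxel" that responds to both predictors plus noise.
bold = 0.5 * surprisal[:, 0] + 0.8 * structure[:, 0] + rng.normal(size=400)
baseline = cv_r2(surprisal, bold)
full = cv_r2(np.hstack([surprisal, structure]), bold)
# The structure regressor explains variance beyond predictability alone.
assert full > baseline
```

Run voxelwise, the difference between the two cross-validated scores localizes where structure-building adds explanatory power.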
Affiliation(s)
- John T Hale
- Google DeepMind
- Department of Linguistics, University of Georgia
|
68
|
Zada Z, Goldstein A, Michelmann S, Simony E, Price A, Hasenfratz L, Barham E, Zadbood A, Doyle W, Friedman D, Dugan P, Melloni L, Devore S, Flinker A, Devinsky O, Nastase SA, Hasson U. A shared linguistic space for transmitting our thoughts from brain to brain in natural conversations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.27.546708. [PMID: 37425747 PMCID: PMC10327051 DOI: 10.1101/2023.06.27.546708] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2023]
Abstract
Effective communication hinges on a mutual understanding of word meaning in different contexts. The embedding space learned by large language models can serve as an explicit model of the shared, context-rich meaning space humans use to communicate their thoughts. We recorded brain activity using electrocorticography during spontaneous, face-to-face conversations in five pairs of epilepsy patients. We demonstrate that the linguistic embedding space can capture the linguistic content of word-by-word neural alignment between speaker and listener. Linguistic content emerged in the speaker's brain before word articulation, and the same linguistic content rapidly reemerged in the listener's brain after word articulation. These findings establish a computational framework to study how human brains transmit their thoughts to one another in real-world contexts.
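The temporal asymmetry reported here (content in the speaker's brain before articulation, in the listener's after) can be illustrated with a lag-correlation sketch on synthetic signals. This is a stand-in for the paper's embedding-based encoding analysis; the signals, lags, and noise level are all illustrative assumptions:

```python
import numpy as np

def lagged_corr(pred, neural, lags):
    """Correlation between model-predicted and measured activity at each
    lag (a negative peak lag means the neural signal leads the model)."""
    out = []
    for lag in lags:
        if lag >= 0:
            a, b = pred[:len(pred) - lag], neural[lag:]
        else:
            a, b = pred[-lag:], neural[:len(neural) + lag]
        out.append(np.corrcoef(a, b)[0, 1])
    return np.array(out)

rng = np.random.default_rng(0)
content = rng.normal(size=520)                               # word-content signal
speaker = np.roll(content, -5) + 0.5 * rng.normal(size=520)  # leads by 5 steps
listener = np.roll(content, 8) + 0.5 * rng.normal(size=520)  # lags by 8 steps
lags = np.arange(-20, 21)
# Speaker peak lag is negative (pre-articulation), listener's positive.
assert lags[np.argmax(lagged_corr(content, speaker, lags))] < 0
assert lags[np.argmax(lagged_corr(content, listener, lags))] > 0
```

The same logic, applied with embedding-predicted activity in place of `content`, recovers the speaker-to-listener timing profile.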
Affiliation(s)
- Zaid Zada
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Ariel Goldstein
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Department of Cognitive and Brain Sciences and Business School, Hebrew University; Jerusalem, 9190501, Israel
- Sebastian Michelmann
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Erez Simony
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Faculty of Engineering, Holon Institute of Technology, Holon, 5810201, Israel
- Amy Price
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Liat Hasenfratz
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Emily Barham
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Asieh Zadbood
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Department of Psychology, Columbia University; New York, 10027, USA
- Werner Doyle
- Grossman School of Medicine, New York University; New York, 10016, USA
- Daniel Friedman
- Grossman School of Medicine, New York University; New York, 10016, USA
- Patricia Dugan
- Grossman School of Medicine, New York University; New York, 10016, USA
- Lucia Melloni
- Grossman School of Medicine, New York University; New York, 10016, USA
- Sasha Devore
- Grossman School of Medicine, New York University; New York, 10016, USA
- Adeen Flinker
- Grossman School of Medicine, New York University; New York, 10016, USA
- Tandon School of Engineering, New York University; New York, 10016, USA
- Orrin Devinsky
- Grossman School of Medicine, New York University; New York, 10016, USA
- Samuel A. Nastase
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
- Uri Hasson
- Princeton Neuroscience Institute and Department of Psychology, Princeton University; New Jersey, 08544, USA
|
69
|
Thoret E, Ystad S, Kronland-Martinet R. Hearing as adaptive cascaded envelope interpolation. Commun Biol 2023; 6:671. [PMID: 37355702 PMCID: PMC10290642 DOI: 10.1038/s42003-023-05040-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 06/12/2023] [Indexed: 06/26/2023] Open
Abstract
The human auditory system is designed to capture and encode sounds from our surroundings and conspecifics. However, the precise mechanisms by which it adaptively extracts the most important spectro-temporal information from sounds are still not fully understood. Previous auditory models have explained sound encoding at the cochlear level using static filter banks, but this view is incompatible with the nonlinear and adaptive properties of the auditory system. Here we propose an approach that considers the cochlear processes as envelope interpolations inspired by cochlear physiology. It unifies linear and nonlinear adaptive behaviors into a single comprehensive framework that provides a data-driven understanding of auditory coding. It allows simulating a broad range of psychophysical phenomena, from virtual pitches and combination tones to consonance and dissonance of harmonic sounds. It further predicts the properties of the cochlear filters, such as frequency selectivity. We also propose a possible link between the parameters of the model and the density of hair cells on the basilar membrane. Cascaded Envelope Interpolation may lead to improvements in sound processing for hearing aids by providing a non-linear, data-driven way of preprocessing acoustic signals that is consistent with peripheral processes.
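The basic ingredient of such a model, extracting a temporal amplitude envelope from an acoustic signal, can be sketched with an FFT-based analytic signal. This is a minimal stand-in for the envelope-extraction step; the paper's cascaded, adaptive interpolation stages are not reproduced here:

```python
import numpy as np

def envelope(x):
    """Amplitude envelope of a real signal via the analytic signal
    (an FFT-based Hilbert transform)."""
    n = len(x)
    Xf = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0       # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0           # keep the Nyquist bin
    return np.abs(np.fft.ifft(Xf * h))

t = np.linspace(0, 1, 2000, endpoint=False)
carrier = np.sin(2 * np.pi * 220 * t)               # 220 Hz tone
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)   # 4 Hz amplitude modulation
env = envelope(modulator * carrier)
# The recovered envelope tracks the slow modulator, not the fast carrier.
assert np.abs(env - modulator).mean() < 0.05
```

A cascade would re-apply envelope extraction and interpolation at successive stages; this sketch shows only the primitive operation.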
Affiliation(s)
- Etienne Thoret
- Aix Marseille Univ, CNRS, UMR7061 PRISM, UMR7020 LIS, Marseille, France
- Institute of Language, Communication, and the Brain (ILCB), Marseille, France
- Sølvi Ystad
- CNRS, Aix Marseille Univ, UMR 7061 PRISM, Marseille, France
|
70
|
Margalit E, Lee H, Finzi D, DiCarlo JJ, Grill-Spector K, Yamins DLK. A Unifying Principle for the Functional Organization of Visual Cortex. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.18.541361. [PMID: 37292946 PMCID: PMC10245753 DOI: 10.1101/2023.05.18.541361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
A key feature of many cortical systems is functional organization: the arrangement of neurons with specific functional properties in characteristic spatial patterns across the cortical surface. However, the principles underlying the emergence and utility of functional organization are poorly understood. Here we develop the Topographic Deep Artificial Neural Network (TDANN), the first unified model to accurately predict the functional organization of multiple cortical areas in the primate visual system. We analyze the key factors responsible for the TDANN's success and find that it strikes a balance between two specific objectives: achieving a task-general sensory representation that is self-supervised, and maximizing the smoothness of responses across the cortical sheet according to a metric that scales relative to cortical surface area. In turn, the representations learned by the TDANN are lower dimensional and more brain-like than those in models that lack a spatial smoothness constraint. Finally, we provide evidence that the TDANN's functional organization balances performance with inter-area connection length, and use the resulting models for a proof-of-principle optimization of cortical prosthetic design. Our results thus offer a unified principle for understanding functional organization and a novel view of the functional role of the visual system in particular.
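The smoothness objective can be caricatured with a simple score: how strongly unit response similarity tracks spatial proximity on the simulated cortical sheet. This is a simplified illustration, not the TDANN's actual loss function or smoothness metric:

```python
import numpy as np

def smoothness_score(responses, positions):
    """Spatial smoothness proxy: correlation between unit response
    similarity and spatial proximity (higher = smoother map).

    responses: (n_units, n_stimuli); positions: (n_units, 2)."""
    r = np.corrcoef(responses)  # unit-by-unit response similarity
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    iu = np.triu_indices(len(positions), k=1)
    # Smooth maps: nearby units (small d) respond similarly (large r),
    # so r and d are anticorrelated; negate to make larger = smoother.
    return -np.corrcoef(r[iu], d[iu])[0, 1]

rng = np.random.default_rng(0)
pos = rng.uniform(0, 10, size=(100, 2))
# Smooth map: each unit's tuning varies slowly with its position.
smooth = np.stack([np.sin(pos @ w) for w in rng.normal(0, 0.3, (30, 2))], axis=1)
scrambled = smooth[rng.permutation(100)]  # same responses, shuffled positions
assert smoothness_score(smooth, pos) > smoothness_score(scrambled, pos)
```

Turning such a score into a differentiable penalty and trading it off against a self-supervised task loss is the kind of balance the abstract describes.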
Affiliation(s)
- Eshed Margalit
- Neurosciences Graduate Program, Stanford University, Stanford, CA 94305
- Hyodong Lee
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Dawn Finzi
- Department of Psychology, Stanford University, Stanford, CA 94305
- Department of Computer Science, Stanford University, Stanford, CA 94305
- James J DiCarlo
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Center for Brains Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
- Kalanit Grill-Spector
- Department of Psychology, Stanford University, Stanford, CA 94305
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305
- Daniel L K Yamins
- Department of Psychology, Stanford University, Stanford, CA 94305
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA 94305
|
71
|
Tang J, LeBel A, Jain S, Huth AG. Semantic reconstruction of continuous language from non-invasive brain recordings. Nat Neurosci 2023; 26:858-866. [PMID: 37127759 DOI: 10.1038/s41593-023-01304-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 03/15/2023] [Indexed: 05/03/2023]
Abstract
A brain-computer interface that decodes continuous language from non-invasive recordings would have many scientific and practical applications. Currently, however, non-invasive language decoders can only identify stimuli from among a small set of words or phrases. Here we introduce a non-invasive decoder that reconstructs continuous language from cortical semantic representations recorded using functional magnetic resonance imaging (fMRI). Given novel brain recordings, this decoder generates intelligible word sequences that recover the meaning of perceived speech, imagined speech and even silent videos, demonstrating that a single decoder can be applied to a range of tasks. We tested the decoder across cortex and found that continuous language can be separately decoded from multiple regions. As brain-computer interfaces should respect mental privacy, we tested whether successful decoding requires subject cooperation and found that subject cooperation is required both to train and to apply the decoder. Our findings demonstrate the viability of non-invasive language brain-computer interfaces.
Affiliation(s)
- Jerry Tang
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Amanda LeBel
- Department of Neuroscience, The University of Texas at Austin, Austin, TX, USA
- Shailee Jain
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Alexander G Huth
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
- Department of Neuroscience, The University of Texas at Austin, Austin, TX, USA
|
72
|
Deniz F, Tseng C, Wehbe L, Dupré la Tour T, Gallant JL. Semantic Representations during Language Comprehension Are Affected by Context. J Neurosci 2023; 43:3144-3158. [PMID: 36973013 PMCID: PMC10146529 DOI: 10.1523/jneurosci.2459-21.2023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Revised: 02/17/2023] [Accepted: 02/26/2023] [Indexed: 03/29/2023] Open
Abstract
The meaning of words in natural language depends crucially on context. However, most neuroimaging studies of word meaning use isolated words and isolated sentences with little context. Because the brain may process natural language differently from how it processes simplified stimuli, there is a pressing need to determine whether prior results on word meaning generalize to natural language. fMRI was used to record human brain activity while four subjects (two female) read words in four conditions that vary in context: narratives, isolated sentences, blocks of semantically similar words, and isolated words. We then compared the signal-to-noise ratio (SNR) of evoked brain responses, and we used a voxelwise encoding modeling approach to compare the representation of semantic information across the four conditions. We find four consistent effects of varying context. First, stimuli with more context evoke brain responses with higher SNR across bilateral visual, temporal, parietal, and prefrontal cortices compared with stimuli with little context. Second, increasing context increases the representation of semantic information across bilateral temporal, parietal, and prefrontal cortices at the group level. In individual subjects, only natural language stimuli consistently evoke widespread representation of semantic information. Third, context affects voxel semantic tuning. Finally, models estimated using stimuli with little context do not generalize well to natural language. These results show that context has large effects on the quality of neuroimaging data and on the representation of meaning in the brain. Thus, neuroimaging studies that use stimuli with little context may not generalize well to the natural regime.

SIGNIFICANCE STATEMENT Context is an important part of understanding the meaning of natural language, but most neuroimaging studies of meaning use isolated words and isolated sentences with little context.
Here, we examined whether the results of neuroimaging studies that use out-of-context stimuli generalize to natural language. We find that increasing context improves the quality of neuroimaging data and changes where and how semantic information is represented in the brain. These results suggest that findings from studies using out-of-context stimuli may not generalize to natural language used in daily life.
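The voxelwise encoding approach referred to here can be sketched in a few lines: fit ridge weights from stimulus features to each voxel's time course on training data, then score prediction accuracy per voxel on held-out data. The shapes, regularization strength, and synthetic data are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

def fit_ridge(X, Y, alpha=10.0):
    """Closed-form ridge regression, one weight vector per voxel.

    X: (n_timepoints, n_features) stimulus features
    Y: (n_timepoints, n_voxels) responses; returns W: (n_features, n_voxels)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def voxelwise_corr(X, Y, W):
    """Per-voxel Pearson r between predicted and measured responses."""
    pred = X @ W
    pred_z = (pred - pred.mean(0)) / pred.std(0)
    Y_z = (Y - Y.mean(0)) / Y.std(0)
    return (pred_z * Y_z).mean(0)

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(300, 20)), rng.normal(size=(100, 20))
W_true = rng.normal(size=(20, 5))                      # 5 toy "voxels"
Y_train = X_train @ W_true + 0.5 * rng.normal(size=(300, 5))
Y_test = X_test @ W_true + 0.5 * rng.normal(size=(100, 5))
W = fit_ridge(X_train, Y_train)
scores = voxelwise_corr(X_test, Y_test, W)             # one score per voxel
```

Comparing such held-out scores across the four stimulus conditions is the kind of analysis that reveals the SNR and generalization effects the abstract reports.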
Affiliation(s)
- Fatma Deniz
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin 10623, Germany
- Christine Tseng
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Leila Wehbe
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Tom Dupré la Tour
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Jack L Gallant
- Helen Wills Neuroscience Institute, University of California, Berkeley, California 94720
- Department of Psychology, University of California, Berkeley, California 94720
|
73
|
Beguš G, Zhou A, Zhao TC. Encoding of speech in convolutional layers and the brain stem based on language experience. Sci Rep 2023; 13:6480. [PMID: 37081119 PMCID: PMC10119295 DOI: 10.1038/s41598-023-33384-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Accepted: 04/12/2023] [Indexed: 04/22/2023] Open
Abstract
Comparing artificial neural networks with outputs of neuroimaging techniques has recently seen substantial advances in (computer) vision and text-based language models. Here, we propose a framework to compare biological and artificial neural computations of spoken language representations and propose several new challenges to this paradigm. The proposed technique is based on a principle similar to that underlying electroencephalography (EEG): averaging of neural (artificial or biological) activity across neurons in the time domain. It allows the encoding of any acoustic property to be compared between the brain and the intermediate convolutional layers of an artificial neural network. Our approach allows a direct comparison of responses to a phonetic property in the brain and in deep neural networks that requires no linear transformations between the signals. We argue that the brain stem response (cABR) and the response in intermediate convolutional layers to the exact same stimulus are highly similar without applying any transformations, and we quantify this observation. The proposed technique not only reveals similarities but also allows for analysis of the encoding of actual acoustic properties in the two signals: we compare peak latency (i) in the cABR relative to the stimulus in the brain stem and (ii) in intermediate convolutional layers relative to the input/output in deep convolutional networks. We also examine and compare the effect of prior language exposure on peak latency in the cABR and in intermediate convolutional layers. Substantial similarities in peak latency encoding between the human brain and intermediate convolutional networks emerge based on results from eight trained networks (including a replication experiment). The proposed technique can be used to compare encoding between the human brain and intermediate convolutional layers for any acoustic property and for other neuroimaging techniques.
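The core operation described here, averaging artificial-unit activity across channels in the time domain to obtain an evoked-response-like signal, can be sketched on synthetic activations. The toy network output, stimulus, and noise level are illustrative assumptions:

```python
import numpy as np

def layer_response(activations):
    """Average unit activity across channels, EEG-style.

    activations: (n_channels, n_timepoints) from one convolutional layer
    for a single stimulus. Averaging over channels yields a single time
    series comparable to an averaged evoked response."""
    return activations.mean(axis=0)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
# Toy layer: every channel carries the same evoked waveform plus noise.
evoked = np.sin(2 * np.pi * 7 * t) * np.exp(-5 * t)
acts = evoked[None, :] + 0.5 * rng.normal(size=(64, 500))
resp = layer_response(acts)
# Averaging suppresses per-unit noise, so the evoked shape survives.
assert np.corrcoef(resp, evoked)[0, 1] > 0.9
```

Peak latencies can then be read off the averaged trace (e.g. via `argmax`) and compared between the cABR and each layer, as the abstract describes.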
Affiliation(s)
- Gašper Beguš
- Department of Linguistics, University of California, Berkeley, USA
- Alan Zhou
- Department of Cognitive Science, Johns Hopkins University, Baltimore, USA
- T Christina Zhao
- Institute for Learning and Brain Sciences, University of Washington, Seattle, USA
- Department of Speech and Hearing Sciences, University of Washington, Seattle, USA
|
74
|
Tomov MS, Tsividis PA, Pouncy T, Tenenbaum JB, Gershman SJ. The neural architecture of theory-based reinforcement learning. Neuron 2023; 111:1331-1344.e8. [PMID: 36898374 PMCID: PMC10200004 DOI: 10.1016/j.neuron.2023.01.023] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 11/06/2022] [Accepted: 01/27/2023] [Indexed: 03/11/2023]
Abstract
Humans learn internal models of the world that support planning and generalization in complex environments. Yet it remains unclear how such internal models are represented and learned in the brain. We approach this question using theory-based reinforcement learning, a strong form of model-based reinforcement learning in which the model is a kind of intuitive theory. We analyzed fMRI data from human participants learning to play Atari-style games. We found evidence of theory representations in prefrontal cortex and of theory updating in prefrontal cortex, occipital cortex, and fusiform gyrus. Theory updates coincided with transient strengthening of theory representations. Effective connectivity during theory updating suggests that information flows from prefrontal theory-coding regions to posterior theory-updating regions. Together, our results are consistent with a neural architecture in which top-down theory representations originating in prefrontal regions shape sensory predictions in visual areas, where factored theory prediction errors are computed and trigger bottom-up updates of the theory.
Affiliation(s)
- Momchil S Tomov
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Motional AD, Inc., Boston, MA 02210, USA
- Pedro A Tsividis
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Thomas Pouncy
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA
- Joshua B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, MA 02138, USA; Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
75
Nakai T, Nishimoto S. Artificial neural network modelling of the neural population code underlying mathematical operations. Neuroimage 2023; 270:119980. [PMID: 36848969 DOI: 10.1016/j.neuroimage.2023.119980] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2022] [Revised: 02/10/2023] [Accepted: 02/23/2023] [Indexed: 02/28/2023] Open
Abstract
Mathematical operations have long been regarded as a sparse, symbolic process in neuroimaging studies. In contrast, advances in artificial neural networks (ANN) have enabled extracting distributed representations of mathematical operations. Recent neuroimaging studies have compared distributed representations of the visual, auditory and language domains in ANNs and biological neural networks (BNNs). However, such a relationship has not yet been examined in mathematics. Here we hypothesise that ANN-based distributed representations can explain brain activity patterns of symbolic mathematical operations. We used the fMRI data of a series of mathematical problems with nine different combinations of operators to construct voxel-wise encoding/decoding models using both sparse operator and latent ANN features. Representational similarity analysis demonstrated shared representations between ANN and BNN, an effect particularly evident in the intraparietal sulcus. Feature-brain similarity (FBS) analysis served to reconstruct a sparse representation of mathematical operations based on distributed ANN features in each cortical voxel. Such reconstruction was more efficient when using features from deeper ANN layers. Moreover, latent ANN features allowed the decoding of novel operators not used during model training from brain activity. The current study provides novel insights into the neural code underlying mathematical thought.
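The representational similarity analysis (RSA) used above compares representational dissimilarity matrices (RDMs) computed from ANN features and from brain activity patterns. A minimal sketch with random stand-in data (not the study's fMRI data; the condition and feature counts are illustrative assumptions):

```python
# Hedged RSA sketch: correlate an ANN-feature RDM with a brain-pattern RDM.
# All arrays are random placeholders, not data from the study.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_conditions = 9                                      # e.g., nine operator combinations
ann_features = rng.normal(size=(n_conditions, 50))    # ANN layer activations per condition
brain_patterns = rng.normal(size=(n_conditions, 200)) # voxel patterns per condition

# RDMs in condensed form; "correlation" distance = 1 - Pearson r between rows
rdm_ann = pdist(ann_features, metric="correlation")
rdm_brain = pdist(brain_patterns, metric="correlation")

# RSA score: rank correlation between the two RDMs
rho, p = spearmanr(rdm_ann, rdm_brain)
print(f"RSA similarity (Spearman rho) = {rho:.3f}")
```

With 9 conditions each RDM has 9*8/2 = 36 unique pairwise dissimilarities; shared representational structure would show up as a reliably positive rho.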
Affiliation(s)
- Tomoya Nakai
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Lyon Neuroscience Research Center (CRNL), INSERM U1028 - CNRS UMR5292, University of Lyon, Bron, France
- Shinji Nishimoto
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; Graduate School of Medicine, Osaka University, Suita, Japan
76
Hu J, Small H, Kean H, Takahashi A, Zekelman L, Kleinman D, Ryan E, Nieto-Castañón A, Ferreira V, Fedorenko E. Precision fMRI reveals that the language-selective network supports both phrase-structure building and lexical access during language production. Cereb Cortex 2023; 33:4384-4404. [PMID: 36130104 PMCID: PMC10110436 DOI: 10.1093/cercor/bhac350] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2022] [Revised: 08/01/2022] [Accepted: 08/02/2022] [Indexed: 11/13/2022] Open
Abstract
A fronto-temporal brain network has long been implicated in language comprehension. However, this network's role in language production remains debated. In particular, it remains unclear whether all or only some language regions contribute to production, and which aspects of production these regions support. Across 3 functional magnetic resonance imaging experiments that rely on robust individual-subject analyses, we characterize the language network's response to high-level production demands. We report 3 novel results. First, sentence production, spoken or typed, elicits a strong response throughout the language network. Second, the language network responds to both phrase-structure building and lexical access demands, although the response to phrase-structure building is stronger and more spatially extensive, present in every language region. Finally, contra some proposals, we find no evidence of brain regions-within or outside the language network-that selectively support phrase-structure building in production relative to comprehension. Instead, all language regions respond more strongly during production than comprehension, suggesting that production incurs a greater cost for the language network. Together, these results align with the idea that language comprehension and production draw on the same knowledge representations, which are stored in a distributed manner within the language-selective network and are used to both interpret and generate linguistic utterances.
Affiliation(s)
- Jennifer Hu
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- Hannah Small
- Department of Cognitive Science, Johns Hopkins University, Baltimore, MD 21218, United States
- Hope Kean
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Atsushi Takahashi
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Leo Zekelman
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, United States
- Elizabeth Ryan
- St. George’s Medical School, St. George’s University, Grenada, West Indies
- Alfonso Nieto-Castañón
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215, United States
- Victor Ferreira
- Department of Psychology, UCSD, La Jolla, CA 92093, United States
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, United States
- McGovern Institute for Brain Research, MIT, Cambridge, MA 02139, United States
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, United States
77
Su Y, MacGregor LJ, Olasagasti I, Giraud AL. A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension. PLoS Biol 2023; 21:e3002046. [PMID: 36947552 PMCID: PMC10079236 DOI: 10.1371/journal.pbio.3002046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Revised: 04/06/2023] [Accepted: 02/22/2023] [Indexed: 03/23/2023] Open
Abstract
Understanding speech requires mapping fleeting and often ambiguous soundwaves to meaning. While humans are known to exploit their capacity to contextualize to facilitate this process, how internal knowledge is deployed online remains an open question. Here, we present a model that extracts multiple levels of information from continuous speech online. The model applies linguistic and nonlinguistic knowledge to speech processing, by periodically generating top-down predictions and incorporating bottom-up incoming evidence in a nested temporal hierarchy. We show that a nonlinguistic context level provides semantic predictions informed by sensory inputs, which are crucial for disambiguating among multiple meanings of the same word. The explicit knowledge hierarchy of the model enables a more holistic account of the neurophysiological responses to speech compared to using lexical predictions generated by a neural network language model (GPT-2). We also show that hierarchical predictions reduce peripheral processing via minimizing uncertainty and prediction error. With this proof-of-concept model, we demonstrate that the deployment of hierarchical predictions is a possible strategy for the brain to dynamically utilize structured knowledge and make sense of the speech input.
Affiliation(s)
- Yaqing Su
- Department of Fundamental Neuroscience, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Swiss National Centre of Competence in Research "Evolving Language" (NCCR EvolvingLanguage), Geneva, Switzerland
- Lucy J MacGregor
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Itsaso Olasagasti
- Department of Fundamental Neuroscience, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Swiss National Centre of Competence in Research "Evolving Language" (NCCR EvolvingLanguage), Geneva, Switzerland
- Anne-Lise Giraud
- Department of Fundamental Neuroscience, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Swiss National Centre of Competence in Research "Evolving Language" (NCCR EvolvingLanguage), Geneva, Switzerland
- Institut Pasteur, Université Paris Cité, Inserm, Institut de l'Audition, Paris, France
78
Semantic surprise predicts the N400 brain potential. NEUROIMAGE: REPORTS 2023. [DOI: 10.1016/j.ynirp.2023.100161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/07/2023]
79
Caucheteux C, Gramfort A, King JR. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nat Hum Behav 2023; 7:430-441. [PMID: 36864133 PMCID: PMC10038805 DOI: 10.1038/s41562-022-01516-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Accepted: 12/15/2022] [Indexed: 03/04/2023]
Abstract
Considerable progress has recently been made in natural language processing: deep learning algorithms are increasingly able to generate, summarize, translate and classify texts. Yet, these language models still fail to match the language abilities of humans. Predictive coding theory offers a tentative explanation for this discrepancy: while language models are optimized to predict nearby words, the human brain would continuously predict a hierarchy of representations that spans multiple timescales. To test this hypothesis, we analysed the functional magnetic resonance imaging brain signals of 304 participants listening to short stories. First, we confirmed that the activations of modern language models linearly map onto the brain responses to speech. Second, we showed that enhancing these algorithms with predictions that span multiple timescales improves this brain mapping. Finally, we showed that these predictions are organized hierarchically: frontoparietal cortices predict higher-level, longer-range and more contextual representations than temporal cortices. Overall, these results strengthen the role of hierarchical predictive coding in language processing and illustrate how the synergy between neuroscience and artificial intelligence can unravel the computational bases of human cognition.
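The "linear mapping" of language-model activations onto brain responses in studies like this one is typically a voxel-wise ridge-regression encoding model, scored by correlating predicted and held-out responses. A minimal sketch on synthetic data, assuming illustrative feature and voxel counts (not the study's actual pipeline):

```python
# Hedged sketch of a voxel-wise linear encoding model with ridge regression.
# Synthetic data stand in for language-model features (X) and fMRI responses (Y).
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_feat, n_vox = 200, 50, 30, 10
W_true = rng.normal(size=(n_feat, n_vox))            # ground-truth linear map
X_train = rng.normal(size=(n_train, n_feat))
X_test = rng.normal(size=(n_test, n_feat))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_vox))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, n_vox))

# Closed-form ridge solution: W = (X'X + lam*I)^-1 X'Y
lam = 1.0
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_feat),
                    X_train.T @ Y_train)

# "Brain score": per-voxel correlation of predicted vs. observed held-out responses
Y_pred = X_test @ W
scores = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(n_vox)]
print(f"mean brain score = {np.mean(scores):.3f}")
```

Because the synthetic responses really are linear in the features, the held-out correlations come out high; with real fMRI data they are far smaller but tested against noise ceilings.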
Affiliation(s)
- Charlotte Caucheteux
- Meta AI, Paris, France
- Université Paris-Saclay, Inria, Commissariat à l'Énergie Atomique et aux Énergies Alternatives, Paris, France
- Alexandre Gramfort
- Meta AI, Paris, France
- Université Paris-Saclay, Inria, Commissariat à l'Énergie Atomique et aux Énergies Alternatives, Paris, France
- Jean-Rémi King
- Meta AI, Paris, France
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
80
Setti F, Handjaras G, Bottari D, Leo A, Diano M, Bruno V, Tinti C, Cecchetti L, Garbarini F, Pietrini P, Ricciardi E. A modality-independent proto-organization of human multisensory areas. Nat Hum Behav 2023; 7:397-410. [PMID: 36646839 PMCID: PMC10038796 DOI: 10.1038/s41562-022-01507-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Accepted: 12/05/2022] [Indexed: 01/18/2023]
Abstract
The processing of multisensory information is based upon the capacity of brain regions, such as the superior temporal cortex, to combine information across modalities. However, it is still unclear whether the representation of coherent auditory and visual events requires any prior audiovisual experience to develop and function. Here we measured brain synchronization during the presentation of an audiovisual, audio-only or video-only version of the same narrative in distinct groups of sensory-deprived (congenitally blind and deaf) and typically developed individuals. Intersubject correlation analysis revealed that the superior temporal cortex was synchronized across auditory and visual conditions, even in sensory-deprived individuals who lack any audiovisual experience. This synchronization was primarily mediated by low-level perceptual features, and relied on a similar modality-independent topographical organization of slow temporal dynamics. The human superior temporal cortex is naturally endowed with a functional scaffolding to yield a common representation across multisensory events.
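The intersubject correlation analysis central to this study is commonly computed leave-one-out: each subject's regional timecourse is correlated with the average timecourse of the remaining subjects. A toy sketch with synthetic timecourses (the group size and signal-to-noise level are illustrative assumptions, not the study's data):

```python
# Hedged sketch of leave-one-out intersubject correlation (ISC).
# A shared "stimulus-driven" signal plus subject-specific noise stands in
# for regional fMRI timecourses recorded during a narrative.
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_timepoints = 8, 300
shared = rng.normal(size=n_timepoints)                       # stimulus-driven component
data = shared + rng.normal(size=(n_subjects, n_timepoints))  # per-subject timecourses

isc = []
for s in range(n_subjects):
    others = np.delete(data, s, axis=0).mean(axis=0)  # average of remaining subjects
    isc.append(np.corrcoef(data[s], others)[0, 1])
print(f"mean leave-one-out ISC = {np.mean(isc):.3f}")
```

Synchronization across groups (e.g., blind subjects hearing the story vs. deaf subjects watching it) can then be assessed by correlating timecourses between, rather than within, groups.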
Affiliation(s)
- Francesca Setti
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Davide Bottari
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Andrea Leo
- Department of Translational Research and Advanced Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Matteo Diano
- Department of Psychology, University of Turin, Turin, Italy
- Valentina Bruno
- Manibus Lab, Department of Psychology, University of Turin, Turin, Italy
- Carla Tinti
- Department of Psychology, University of Turin, Turin, Italy
- Luca Cecchetti
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Pietro Pietrini
- MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
81
Kanwisher N, Khosla M, Dobs K. Using artificial neural networks to ask 'why' questions of minds and brains. Trends Neurosci 2023; 46:240-254. [PMID: 36658072 DOI: 10.1016/j.tins.2022.12.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Revised: 11/29/2022] [Accepted: 12/22/2022] [Indexed: 01/19/2023]
Abstract
Neuroscientists have long characterized the properties and functions of the nervous system, and are increasingly succeeding in answering how brains perform the tasks they do. But the question 'why' brains work the way they do is asked less often. The new ability to optimize artificial neural networks (ANNs) for performance on human-like tasks now enables us to approach these 'why' questions by asking when the properties of networks optimized for a given task mirror the behavioral and neural characteristics of humans performing the same task. Here we highlight the recent success of this strategy in explaining why the visual and auditory systems work the way they do, at both behavioral and neural levels.
Affiliation(s)
- Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Meenakshi Khosla
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Katharina Dobs
- Department of Psychology, Justus Liebig University Giessen, Giessen, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University, Giessen, Germany
82
Pérez A, Davis MH. Speaking and listening to inter-brain relationships. Cortex 2023; 159:54-63. [PMID: 36608420 DOI: 10.1016/j.cortex.2022.12.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 10/11/2022] [Accepted: 12/06/2022] [Indexed: 12/23/2022]
Abstract
Studies of inter-brain relationships thrive, yet the scientific community has raised many reservations about the scope and interpretation of these phenomena. It is thus essential to establish common ground on methodological and conceptual definitions related to this topic and to open debate about any remaining points of uncertainty. Here we offer insights to improve the conceptual clarity and empirical standards of social neuroscience studies of interpersonal interaction using hyperscanning, with a particular focus on verbal communication.
Affiliation(s)
- Alejandro Pérez
- MRC Cognition and Brain Sciences Unit, University of Cambridge, UK
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, UK
83
Momennejad I. A rubric for human-like agents and NeuroAI. Philos Trans R Soc Lond B Biol Sci 2023; 378:20210446. [PMID: 36511409 PMCID: PMC9745874 DOI: 10.1098/rstb.2021.0446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2022] [Accepted: 10/27/2022] [Indexed: 12/15/2022] Open
Abstract
Researchers across cognitive, neuro- and computer sciences increasingly reference 'human-like' artificial intelligence and 'neuroAI'. However, the scope and use of the terms are often inconsistent. Contributed research ranges widely from mimicking behaviour, to testing machine learning methods as neurally plausible hypotheses at the cellular or functional levels, or solving engineering problems. However, it cannot be assumed nor expected that progress on one of these three goals will automatically translate to progress in others. Here, a simple rubric is proposed to clarify the scope of individual contributions, grounded in their commitments to human-like behaviour, neural plausibility or benchmark/engineering/computer science goals. This is clarified using examples of weak and strong neuroAI and human-like agents, and discussing the generative, corroborative and corrective ways in which the three dimensions interact with one another. The author maintains that future progress in artificial intelligence will need strong interactions across the disciplines, with iterative feedback loops and meticulous validity tests, leading to both known and yet-unknown advances that may span decades to come. This article is part of a discussion meeting issue 'New approaches to 3D vision'.
Affiliation(s)
- Ida Momennejad
- Microsoft Research NYC, Reinforcement Learning Station, 300 Lafayette, New York, NY 10012, USA
84
Mind the gap: challenges of deep learning approaches to Theory of Mind. Artif Intell Rev 2023. [DOI: 10.1007/s10462-023-10401-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
85
Suomala J, Kauttonen J. Computational meaningfulness as the source of beneficial cognitive biases. Front Psychol 2023; 14:1189704. [PMID: 37205079 PMCID: PMC10187636 DOI: 10.3389/fpsyg.2023.1189704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2023] [Accepted: 04/05/2023] [Indexed: 05/21/2023] Open
Abstract
The human brain has evolved to solve the problems it encounters in multiple environments. In solving these challenges, it forms mental simulations of multidimensional information about the world. These processes produce context-dependent behaviors. The brain, as an overparameterized modeling organ, is an evolutionary solution for producing behavior in a complex world. One of the most essential characteristics of living creatures is that they compute the value of the information they receive from external and internal contexts. As a result of this computation, a creature can behave optimally in each environment. Whereas most other living creatures compute almost exclusively biological values (e.g., how to get food), humans, as cultural creatures, also compute meaningfulness from the perspective of their own activity. Computational meaningfulness refers to the process by which the human brain makes a situation comprehensible to the individual so that she knows how to behave optimally. This paper challenges the bias-centric approach of behavioral economics by exploring the possibilities opened up by computational meaningfulness from a wider perspective. We concentrate on confirmation bias and the framing effect as examples of cognitive biases from behavioral economics. We conclude that, from the perspective of computational meaningfulness, the use of these biases is an indispensable property of an optimally designed computational system such as the human brain. From this perspective, cognitive biases can be rational under some conditions. Whereas the bias-centric approach relies on small-scale interpretable models with only a few explanatory variables, the computational meaningfulness perspective emphasizes behavioral models that allow multiple variables. People are used to working in multidimensional and varying environments. The human brain is at its best in such environments, and scientific study should increasingly take place in situations simulating the real environment. By using naturalistic stimuli (e.g., videos and VR) we can create more realistic, life-like contexts for research purposes and analyze the resulting data using machine learning algorithms. In this manner, we can better explain, understand and predict human behavior and choice in different contexts.
Affiliation(s)
- Jyrki Suomala
- Department of NeuroLab, Laurea University of Applied Sciences, Vantaa, Finland
- Janne Kauttonen
- Competences, RDI and Digitalization, Haaga-Helia University of Applied Sciences, Helsinki, Finland
86
Interpretability of artificial neural network models in artificial intelligence versus neuroscience. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00592-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
87
Chen Y, Wei Z, Gou H, Liu H, Gao L, He X, Zhang X. How far is brain-inspired artificial intelligence away from brain? Front Neurosci 2022; 16:1096737. [PMID: 36570836 PMCID: PMC9783913 DOI: 10.3389/fnins.2022.1096737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022] Open
Abstract
Fueled by the development of neuroscience and artificial intelligence (AI), recent advances in brain-inspired AI mark a tipping point in the collaboration of the two fields. AI began with the inspiration of neuroscience, but has evolved to achieve remarkable performance with little dependence upon neuroscience. However, in a recent collaboration, research into the neurobiological explainability of AI models found that highly accurate models may resemble the neurobiological representation of the same computational processes in the brain, even though these models were developed without such neuroscientific references. In this perspective, we review the cooperation and separation between neuroscience and AI, and emphasize the current advance, namely a new form of cooperation: the neurobiological explainability of AI. Under the intertwined development of the two fields, we propose a practical framework to evaluate the brain-likeness of AI models, paving the way for their further improvement.
Affiliation(s)
- Yucan Chen
- Hefei National Research Center for Physical Sciences at the Microscale, and Department of Radiology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science & Technology of China, Hefei, China
- Zhengde Wei
- Department of Psychology, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Huixing Gou
- Division of Life Sciences and Medicine, School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Haiyi Liu
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Li Gao
- SILC Business School, Shanghai University, Shanghai, China
- Xiaosong He
- Department of Psychology, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, Anhui, China
- Xiaochu Zhang
- Hefei National Research Center for Physical Sciences at the Microscale, and Department of Radiology, the First Affiliated Hospital of USTC, Division of Life Science and Medicine, University of Science & Technology of China, Hefei, China; Department of Psychology, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, Anhui, China; Application Technology Center of Physical Therapy to Brain Disorders, Institute of Advanced Technology, University of Science and Technology of China, Hefei, China; Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, China
88
Blasi DE, Henrich J, Adamou E, Kemmerer D, Majid A. Over-reliance on English hinders cognitive science. Trends Cogn Sci 2022; 26:1153-1170. [PMID: 36253221 DOI: 10.1016/j.tics.2022.09.015] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 11/05/2022]
Abstract
English is the dominant language in the study of human cognition and behavior: the individuals studied by cognitive scientists, as well as most of the scientists themselves, are frequently English speakers. However, English differs from other languages in ways that have consequences for the whole of the cognitive sciences, reaching far beyond the study of language itself. Here, we review an emerging body of evidence that highlights how the particular characteristics of English and the linguistic habits of English speakers bias the field by both warping research programs (e.g., overemphasizing features and mechanisms present in English over others) and overgeneralizing observations from English speakers' behaviors, brains, and cognition to our entire species. We propose mitigating strategies that could help avoid some of these pitfalls.
Affiliation(s)
- Damián E Blasi
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Street, 02138 Cambridge, MA, USA; Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Deutscher Pl. 6, 04103 Leipzig, Germany; Human Relations Area Files, 755 Prospect Street, New Haven, CT 06511-1225, USA
- Joseph Henrich
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Street, 02138 Cambridge, MA, USA
- Evangelia Adamou
- Languages and Cultures of Oral Tradition lab, National Center for Scientific Research (CNRS), 7 Rue Guy Môquet, 94801 Villejuif, France
- David Kemmerer
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907, USA; Department of Psychological Sciences, Purdue University, 703 3rd Street, West Lafayette, IN 47907, USA
- Asifa Majid
- Department of Experimental Psychology, University of Oxford, Woodstock Road, Oxford OX2 6GG, UK
89
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
- Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
- Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
|
90
|
Toneva M, Mitchell TM, Wehbe L. Combining computational controls with natural text reveals aspects of meaning composition. NATURE COMPUTATIONAL SCIENCE 2022; 2:745-757. [PMID: 36777107 PMCID: PMC9912822 DOI: 10.1038/s43588-022-00354-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
To study a core component of human intelligence-our ability to combine the meaning of words-neuroscientists have looked to linguistics. However, linguistic theories are insufficient to account for all brain responses reflecting linguistic composition. In contrast, we adopt a data-driven approach to study the composed meaning of words beyond their individual meaning, which we term 'supra-word meaning'. We construct a computational representation for supra-word meaning and study its brain basis through brain recordings from two complementary imaging modalities. Using functional magnetic resonance imaging, we reveal that hubs that are thought to process lexical meaning also maintain supra-word meaning, suggesting a common substrate for lexical and combinatorial semantics. Surprisingly, we cannot detect supra-word meaning in magnetoencephalography, which suggests that composed meaning might be maintained through a different neural mechanism than the synchronized firing of pyramidal cells. This sensitivity difference has implications for past neuroimaging results and future wearable neurotechnology.
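The "supra-word meaning" representation described here is, roughly, the part of a contextual phrase embedding left over after regressing out the individual word embeddings. A minimal NumPy sketch of that residualization on simulated embeddings (all names and dimensions are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n_phrases, dim = 200, 50

# Hypothetical embeddings: word_sum is the sum of individual word vectors in
# each phrase; the contextual embedding adds composition-specific structure.
word_sum = rng.normal(size=(n_phrases, dim))
composition = rng.normal(size=(n_phrases, dim))
context = word_sum + 0.5 * composition

# Residualize the contextual embeddings against the word-level embeddings;
# the residual is a candidate "supra-word meaning" representation.
beta, *_ = np.linalg.lstsq(word_sum, context, rcond=None)
supra_word = context - word_sum @ beta

# By construction the residual is orthogonal to the word-level predictors.
print(np.abs(word_sum.T @ supra_word).max())
```

The residual, not the raw contextual vector, is then what gets related to the brain recordings.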
Affiliation(s)
- Mariya Toneva
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Max Planck Institute for Software Systems, Saarbrücken, Germany
- Tom M. Mitchell
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Leila Wehbe
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence and requests for materials should be addressed to Leila Wehbe.

91
Rorot W. Counting with Cilia: The Role of Morphological Computation in Basal Cognition Research. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1581. [PMID: 36359671 PMCID: PMC9689127 DOI: 10.3390/e24111581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 10/15/2022] [Accepted: 10/26/2022] [Indexed: 06/16/2023]
Abstract
"Morphological computation" is an increasingly important concept in robotics, artificial intelligence, and philosophy of mind. It is used to understand how the body contributes to cognition and the control of behavior. Its understanding in terms of "offloading" computation from the brain to the body has been criticized as misleading, and it has been suggested that use of the concept conflates three classes of distinct processes. In fact, these criticisms implicitly hinge on accepting a semantic definition of what constitutes computation. Here, I argue that an alternative, mechanistic view on computation offers a significantly different understanding of what morphological computation is. These theoretical considerations are then used to analyze the existing research program in developmental biology, which understands morphogenesis, the process of development of shape in biological systems, as a computational process. This important line of research shows that cognition and intelligence can be found across all scales of life, as the proponents of the basal cognition research program propose. Hence, clarifying the connection between morphological computation and morphogenesis allows for strengthening the role of the former concept in this emerging research field.
Affiliation(s)
- Wiktor Rorot
- Human Interactivity and Language Lab, Faculty of Psychology, University of Warsaw, 00-927 Warszawa, Poland

92
Explaining neural activity in human listeners with deep learning via natural language processing of narrative text. Sci Rep 2022; 12:17838. [PMID: 36284195 PMCID: PMC9596412 DOI: 10.1038/s41598-022-21782-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Accepted: 10/04/2022] [Indexed: 01/20/2023] Open
Abstract
Deep learning (DL) approaches can also inform the analysis of human brain activity. Here, a state-of-the-art DL tool for natural language processing, the Generative Pre-trained Transformer version 2 (GPT-2), is shown to generate meaningful neural encodings in functional MRI during narrative listening. Linguistic features of word unpredictability (surprisal) and contextual importance (saliency) were derived by applying GPT-2 to the text of a 12-min narrative. Segments of variable duration (from 15 to 90 s) defined the context for the next word, resulting in different sets of neural predictors for functional MRI signals recorded in 27 healthy listeners of the narrative. GPT-2 surprisal, estimating word prediction errors from the artificial network, significantly explained the neural data in superior and middle temporal gyri (bilaterally), in anterior and posterior cingulate cortices, and in the left prefrontal cortex. GPT-2 saliency, weighing the importance of context words, significantly explained the neural data for longer segments in left superior and middle temporal gyri. These results add novel support to the use of DL tools in the search for neural encodings in functional MRI. A DL language model such as GPT-2 may thus provide useful information about the neural processes subserving language comprehension in humans, including context-dependent prediction of the next word.
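The surprisal feature used here is just the negative log probability of a word given its context. The paper derives it from GPT-2; the following self-contained sketch computes the same quantity from a toy bigram model instead (the corpus and counts are purely illustrative):

```python
import math
from collections import Counter

corpus = ("the little prince lived on a little planet "
          "the little prince loved a little rose").split()

# Bigram counts stand in for a language model; the paper derives these
# conditional probabilities from GPT-2 over much longer contexts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def surprisal(prev, word):
    """Word unpredictability in bits: -log2 p(word | prev)."""
    p = bigrams[(prev, word)] / unigrams[prev]
    return -math.log2(p)

# "little" is followed by "prince" in 2 of its 4 occurrences: 1 bit.
print(surprisal("little", "prince"))
```

With a neural model the conditional probability comes from the softmax over the vocabulary rather than from counts, but the surprisal definition is identical.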
93
Betz G, Richardson K. Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents. Front Artif Intell 2022; 5:900943. [PMID: 36329681 PMCID: PMC9623417 DOI: 10.3389/frai.2022.900943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2022] [Accepted: 09/15/2022] [Indexed: 12/03/2022] Open
Abstract
Neural language models (NLMs) are susceptible to producing inconsistent output. This paper proposes a new diagnosis as well as a novel remedy for NLMs' incoherence. We train NLMs on synthetic text corpora that are created by simulating text production in a society. For diagnostic purposes, we explicitly model the individual belief systems of artificial agents (authors) who produce corpus texts. NLMs, trained on those texts, can be shown to aggregate the judgments of individual authors during pre-training according to sentence-wise vote ratios (roughly, reporting frequencies), which inevitably leads to so-called discursive dilemmas: aggregate judgments are inconsistent even though all individual belief states are consistent. As a remedy for such inconsistencies, we develop a self-training procedure-inspired by the concept of reflective equilibrium-that effectively reduces the extent of logical incoherence in a model's belief system, corrects global mis-confidence, and eventually allows the model to settle on a new, epistemically superior belief state. Thus, social choice theory helps to understand why NLMs are prone to produce inconsistencies; epistemology suggests how to get rid of them.
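The discursive dilemma the authors invoke can be reproduced in a few lines: three authors, each with a logically consistent belief state about p, q, and (p and q), whose sentence-wise majority aggregate is nonetheless inconsistent. A minimal sketch (the belief profiles are the textbook example, not taken from the paper's corpora):

```python
# Three authors with individually consistent beliefs about p, q, and (p and q).
authors = [
    {"p": True,  "q": True,  "p_and_q": True},
    {"p": True,  "q": False, "p_and_q": False},
    {"p": False, "q": True,  "p_and_q": False},
]

def majority(sentence):
    """Sentence-wise majority vote, the aggregation pre-training approximates."""
    votes = sum(a[sentence] for a in authors)
    return votes > len(authors) / 2

aggregate = {s: majority(s) for s in ("p", "q", "p_and_q")}
print(aggregate)  # {'p': True, 'q': True, 'p_and_q': False}

# The aggregate endorses p and q but rejects their conjunction.
consistent = aggregate["p_and_q"] == (aggregate["p"] and aggregate["q"])
print("aggregate consistent:", consistent)  # False
```

This is the structural reason, on the paper's diagnosis, why an NLM trained on vote ratios can be incoherent even when every author in its corpus is coherent.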
Affiliation(s)
- Gregor Betz
- Karlsruhe Institute of Technology, Department of Philosophy, Karlsruhe, Germany. *Correspondence: Gregor Betz
- Kyle Richardson
- Allen Institute for Artificial Intelligence, Aristo, Seattle, WA, United States

94
Wang S, Zhang X, Zhang J, Zong C. A synchronized multimodal neuroimaging dataset for studying brain language processing. Sci Data 2022; 9:590. [PMID: 36180444 PMCID: PMC9525723 DOI: 10.1038/s41597-022-01708-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2022] [Accepted: 08/22/2022] [Indexed: 11/15/2022] Open
Abstract
We present a synchronized multimodal neuroimaging dataset for studying brain language processing (SMN4Lang) that contains functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) data recorded from the same 12 healthy volunteers while they listened to 6 hours of naturalistic stories, as well as high-resolution structural (T1, T2), diffusion MRI and resting-state fMRI data for each participant. We also provide rich linguistic annotations for the stimuli, including word frequencies, syntactic tree structures, time-aligned characters and words, and various types of word and character embeddings. Quality assessment indicators verify that this is a high-quality neuroimaging dataset. The synchronized data, collected from the same group of participants who listened to the story materials first in fMRI and then in MEG, are well suited to studying the dynamic processing of language comprehension, such as when and where different linguistic features are encoded in the brain. In addition, this dataset, comprising a large vocabulary from stories on various topics, can serve as a brain benchmark to evaluate and improve computational language models.

Measurement(s): functional brain measurement • Magnetoencephalography
Technology Type(s): Functional Magnetic Resonance Imaging • Magnetoencephalography
Factor Type(s): naturalistic stimuli listening
Sample Characteristic - Organism: human beings
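One standard quality indicator for such datasets is the temporal signal-to-noise ratio (tSNR) of each voxel's timecourse. A minimal sketch on simulated data (the signal level and noise scale are illustrative, not SMN4Lang's values):

```python
import numpy as np

def tsnr(timeseries):
    """Temporal SNR: mean of each voxel's timecourse over its standard deviation."""
    return timeseries.mean(axis=-1) / timeseries.std(axis=-1)

rng = np.random.default_rng(6)
n_voxels, n_trs = 100, 400

baseline = 1000.0                                   # typical EPI signal level
noise = rng.normal(scale=20.0, size=(n_voxels, n_trs))
data = baseline + noise

# Mean tSNR across voxels; should sit near baseline / noise sd = 50 here.
print(f"mean tSNR: {tsnr(data).mean():.1f}")
```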
Affiliation(s)
- Shaonan Wang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Xiaohan Zhang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Jiajun Zhang
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Chengqing Zong
- National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

95
Caucheteux C, Gramfort A, King JR. Deep language algorithms predict semantic comprehension from brain activity. Sci Rep 2022; 12:16327. [PMID: 36175483 PMCID: PMC9522791 DOI: 10.1038/s41598-022-20460-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Accepted: 09/13/2022] [Indexed: 11/09/2022] Open
Abstract
Deep language algorithms, like GPT-2, have demonstrated remarkable abilities to process text, and now constitute the backbone of automatic translation, summarization and dialogue. However, whether these models encode information that relates to human comprehension still remains controversial. Here, we show that the representations of GPT-2 not only map onto the brain responses to spoken stories, but they also predict the extent to which subjects understand the corresponding narratives. To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2's activations. Finally, we show that this mapping reliably correlates ([Formula: see text]) with subjects' comprehension scores as assessed for each story. This effect peaks in the angular, medial temporal and supra-marginal gyri, and is best accounted for by the long-distance dependencies generated in the deep layers of GPT-2. Overall, this study shows how deep language models help clarify the brain computations underlying language comprehension.
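The analysis pipeline described here (fit a regularized linear map from GPT-2 activations to fMRI, then relate held-out prediction accuracy to comprehension) can be sketched end-to-end on simulated data. Everything below is illustrative; the ridge penalty, dimensions, and noise model are assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(2)
n_trs, n_feats, n_subjects = 300, 40, 20

X = rng.normal(size=(n_trs, n_feats))      # simulated GPT-2 layer activations
signal = X @ rng.normal(size=n_feats)      # shared story-driven brain component

def mapping_score(noise_sd):
    """Fit ridge on the first half, return held-out prediction correlation."""
    y = signal + noise_sd * rng.normal(size=n_trs)
    half, lam = n_trs // 2, 10.0
    beta = np.linalg.solve(X[:half].T @ X[:half] + lam * np.eye(n_feats),
                           X[:half].T @ y[:half])
    return np.corrcoef(X[half:] @ beta, y[half:])[0, 1]

# Simulated "better comprehenders" carry a cleaner, more model-aligned signal,
# so their mapping scores come out higher (the pattern the paper reports).
comprehension = np.linspace(0.0, 1.0, n_subjects)
scores = np.array([mapping_score(20.0 * (1.0 - c) + 1.0) for c in comprehension])
corr = np.corrcoef(comprehension, scores)[0, 1]
print(f"score-comprehension correlation: {corr:.2f}")
```

The simulation builds in the score-comprehension link by construction; the paper's contribution is showing the same link in real recordings.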
Affiliation(s)
- Charlotte Caucheteux
- Meta AI Research, Paris, France
- Université Paris-Saclay, Inria, CEA, Palaiseau, France
- Jean-Rémi King
- Meta AI Research, Paris, France
- École normale supérieure, PSL University, CNRS, Paris, France

96
Shain C, Blank IA, Fedorenko E, Gibson E, Schuler W. Robust Effects of Working Memory Demand during Naturalistic Language Comprehension in Language-Selective Cortex. J Neurosci 2022; 42:7412-7430. [PMID: 36002263 PMCID: PMC9525168 DOI: 10.1523/jneurosci.1894-21.2022] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2021] [Revised: 07/06/2022] [Accepted: 07/11/2022] [Indexed: 11/21/2022] Open
Abstract
To understand language, we must infer structured meanings from real-time auditory or visual signals. Researchers have long focused on word-by-word structure building in working memory as a mechanism that might enable this feat. However, some have argued that language processing does not typically involve rich word-by-word structure building, and/or that apparent working memory effects are underlyingly driven by surprisal (how predictable a word is in context). Consistent with this alternative, some recent behavioral studies of naturalistic language processing that control for surprisal have not shown clear working memory effects. In this fMRI study, we investigate a range of theory-driven predictors of word-by-word working memory demand during naturalistic language comprehension in humans of both sexes under rigorous surprisal controls. In addition, we address a related debate about whether the working memory mechanisms involved in language comprehension are language specialized or domain general. To do so, in each participant, we functionally localize (1) the language-selective network and (2) the "multiple-demand" network, which supports working memory across domains. Results show robust surprisal-independent effects of memory demand in the language network and no effect of memory demand in the multiple-demand network. Our findings thus support the view that language comprehension involves computationally demanding word-by-word structure building operations in working memory, in addition to any prediction-related mechanisms. Further, these memory operations appear to be primarily conducted by the same neural resources that store linguistic knowledge, with no evidence of involvement of brain regions known to support working memory across domains.SIGNIFICANCE STATEMENT This study uses fMRI to investigate signatures of working memory (WM) demand during naturalistic story listening, using a broad range of theoretically motivated estimates of WM demand. 
Results support a strong effect of WM demand in the brain that is distinct from effects of word predictability. Further, these WM demands register primarily in language-selective regions, rather than in "multiple-demand" regions that have previously been associated with WM in nonlinguistic domains. Our findings support a core role for WM in incremental language processing, using WM resources that are specialized for language.
Affiliation(s)
- Cory Shain
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02478
- Idan A Blank
- University of California, Los Angeles, Los Angeles, California 90095
- Evelina Fedorenko
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02478
- Edward Gibson
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02478

97
Rafiq M, Jucla M, Guerrier L, Péran P, Pariente J, Pistono A. The functional connectivity of language network across the life span: Disentangling the effects of typical aging from Alzheimer's disease. Front Aging Neurosci 2022; 14:959405. [PMID: 36212038 PMCID: PMC9537133 DOI: 10.3389/fnagi.2022.959405] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Accepted: 08/29/2022] [Indexed: 11/20/2022] Open
Abstract
Language is usually characterized as the most preserved cognitive function during typical aging. Several neuroimaging studies have shown that healthy aging is characterized by inter-network compensation, which correlates with better language performance. By contrast, language deficits occur early in the course of Alzheimer's disease (AD). This study therefore compares young participants, healthy older participants, and prodromal AD participants to characterize functional connectivity changes in language due to healthy aging or prodromal AD. We first compared measures of integrated local correlation (ILC) and fractional amplitude of low-frequency fluctuations (fALFF) in language areas. Both groups of older adults had lower connectivity values within frontal language-related areas. In the healthy older group, higher ILC and fALFF values in frontal areas were positively correlated with performance on fluency and naming tasks. We then performed seed-based analyses for more precise discrimination between healthy aging and prodromal AD. Healthy older adults showed no functional alterations at the seed-based level when the seed area was not or only slightly impaired compared to the young adults [i.e., inferior frontal gyrus (IFG)], whereas prodromal AD participants also showed decreased connectivity at the seed-based level. Conversely, when the seed area was similarly impaired in healthy older adults and prodromal AD participants on ILC and fALFF measures, their connectivity maps were also similar in seed-to-voxel analyses [i.e., superior frontal gyrus (SFG)]. The current results show that functional connectivity measures at the voxel level (ILC and fALFF) are already affected in healthy aging. These findings imply that the functional compensation observed in healthy aging depends on the functional integrity of brain areas at the voxel level.
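fALFF, one of the two voxel-level measures used here, is the fraction of a timecourse's spectral amplitude that falls in the low-frequency band (commonly 0.01-0.08 Hz). A minimal sketch on simulated timecourses (the band limits and TR are conventional choices, not taken from the study):

```python
import numpy as np

def falff(timeseries, tr, low=0.01, high=0.08):
    """Fractional ALFF: spectral amplitude in [low, high] Hz over total amplitude."""
    ts = timeseries - timeseries.mean()
    amp = np.abs(np.fft.rfft(ts))
    freqs = np.fft.rfftfreq(ts.size, d=tr)
    band = (freqs >= low) & (freqs <= high)
    return amp[band].sum() / amp.sum()

rng = np.random.default_rng(3)
tr, n = 2.0, 300                       # 2 s TR, 10 min of data
t = np.arange(n) * tr
slow = np.sin(2 * np.pi * 0.04 * t)    # oscillation inside the band

print(falff(slow, tr))                 # near 1: nearly all amplitude is in-band
print(falff(rng.normal(size=n), tr))   # much lower: noise spreads across the spectrum
```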
Affiliation(s)
- Marie Rafiq
- Department of Neurology, Neuroscience Centre, Toulouse University Hospital, Toulouse, France
- Toulouse NeuroImaging Center, Toulouse University, Inserm, University Paul Sabatier (UPS), Toulouse, France
- Mélanie Jucla
- Octogone-Lordat Interdisciplinary Research Unit (EA 4156), University of Toulouse II-Jean Jaurès, Toulouse, France
- Laura Guerrier
- Toulouse NeuroImaging Center, Toulouse University, Inserm, University Paul Sabatier (UPS), Toulouse, France
- Patrice Péran
- Toulouse NeuroImaging Center, Toulouse University, Inserm, University Paul Sabatier (UPS), Toulouse, France
- Jérémie Pariente
- Department of Neurology, Neuroscience Centre, Toulouse University Hospital, Toulouse, France
- Toulouse NeuroImaging Center, Toulouse University, Inserm, University Paul Sabatier (UPS), Toulouse, France
- Aurélie Pistono
- Department of Experimental Psychology, Ghent University, Ghent, Belgium

98
Abstract
Neuroimaging using more ecologically valid stimuli such as audiobooks has advanced our understanding of natural language comprehension in the brain. However, prior naturalistic stimuli have typically been restricted to a single language, which limited generalizability beyond small typological domains. Here we present the Le Petit Prince fMRI Corpus (LPPC–fMRI), a multilingual resource for research in the cognitive neuroscience of speech and language during naturalistic listening (OpenNeuro: ds003643). A total of 49 English speakers, 35 Chinese speakers and 28 French speakers listened to the same audiobook, The Little Prince, in their native language while multi-echo functional magnetic resonance imaging was acquired. We also provide time-aligned speech annotation and word-by-word predictors obtained using natural language processing tools. The resulting timeseries data are shown to be of high quality, with good temporal signal-to-noise ratio and high inter-subject correlation. Data-driven functional analyses provide further evidence of data quality. This annotated, multilingual fMRI dataset facilitates future re-analyses addressing cross-linguistic commonalities and differences in the neural substrates of language processing at multiple perceptual and linguistic levels.

Measurement(s): Blood Oxygen Level-Dependent Functional MRI
Technology Type(s): Magnetization-Prepared Rapid Gradient Echo MRI
Sample Characteristic - Organism: Homo sapiens
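Inter-subject correlation (ISC), one of the quality indicators reported here, is typically computed leave-one-out: each subject's timecourse is correlated with the average of all the other subjects. A sketch on simulated data (dimensions and noise levels are illustrative):

```python
import numpy as np

def isc(data):
    """Leave-one-out inter-subject correlation: each subject's timecourse
    correlated with the mean timecourse of all remaining subjects."""
    scores = []
    for s in range(data.shape[0]):
        others = np.delete(data, s, axis=0).mean(axis=0)
        scores.append(np.corrcoef(data[s], others)[0, 1])
    return np.array(scores)

rng = np.random.default_rng(4)
n_subjects, n_trs = 10, 200
stimulus = rng.normal(size=n_trs)                        # shared story-driven signal
data = stimulus + rng.normal(size=(n_subjects, n_trs))   # plus subject-specific noise

print(f"mean ISC: {isc(data).mean():.2f}")               # well above zero
```

A shared naturalistic stimulus is what makes ISC meaningful: without the common story component, the expected correlation is zero.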
99
Lipkin B, Tuckute G, Affourtit J, Small H, Mineroff Z, Kean H, Jouravlev O, Rakocevic L, Pritchett B, Siegelman M, Hoeflin C, Pongos A, Blank IA, Struhl MK, Ivanova A, Shannon S, Sathe A, Hoffmann M, Nieto-Castañón A, Fedorenko E. Probabilistic atlas for the language network based on precision fMRI data from >800 individuals. Sci Data 2022; 9:529. [PMID: 36038572 PMCID: PMC9424256 DOI: 10.1038/s41597-022-01645-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2022] [Accepted: 08/09/2022] [Indexed: 11/13/2022] Open
Abstract
Two analytic traditions characterize fMRI language research. One relies on averaging activations across individuals. This approach has limitations: because of inter-individual variability in the locations of language areas, any given voxel/vertex in a common brain space is part of the language network in some individuals but may belong to a distinct network in others. An alternative approach relies on identifying language areas in each individual using a functional 'localizer'. Because of its greater sensitivity, functional resolution, and interpretability, functional localization is gaining popularity, but it is not always feasible, and it cannot be applied retroactively to past studies. To bridge these disjoint approaches, we created a probabilistic functional atlas using fMRI data for an extensively validated language localizer in 806 individuals. This atlas enables estimating the probability that any given location in a common space belongs to the language network, and thus can help interpret group-level activation peaks and lesion locations, or select voxels/electrodes for analysis. More meaningful comparisons of findings across studies should increase robustness and replicability in language research.

Measurement(s): Brain activity measurement
Technology Type(s): fMRI
Sample Characteristic - Organism: Homo sapiens
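The atlas construction reduces, at its core, to averaging binarized individual-subject language masks in a common space: the across-subject proportion at each location is the membership probability. A minimal sketch with simulated masks (the spatial model is a placeholder, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(5)
n_subjects, n_voxels = 806, 1000

# Hypothetical binary language-network masks in a common space, one row per
# subject (True = voxel belongs to that individual's language network).
p_member = rng.beta(2, 5, size=n_voxels)            # stand-in spatial profile
masks = rng.random((n_subjects, n_voxels)) < p_member

# The probabilistic atlas is the across-subject proportion at each location.
atlas = masks.mean(axis=0)

voxel = 123
print(f"p(language network | voxel {voxel}) = {atlas[voxel]:.2f}")
```

With 806 subjects the proportions are tight estimates of the underlying membership probabilities, which is what makes the atlas usable for interpreting group peaks or selecting electrodes.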
Affiliation(s)
- Benjamin Lipkin
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Josef Affourtit
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Hannah Small
- Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA
- Zachary Mineroff
- Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Hope Kean
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Olessia Jouravlev
- Department of Cognitive Science, Carleton University, Ottawa, ON, Canada
- Lara Rakocevic
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Brianna Pritchett
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Caitlyn Hoeflin
- Harris School of Public Policy, University of Chicago, Chicago, IL, USA
- Alvincé Pongos
- Department of Bioengineering, University of California, Berkeley, CA, USA
- Idan A Blank
- Department of Psychology, University of California, Los Angeles, CA, USA
- Melissa Kline Struhl
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Anna Ivanova
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Steven Shannon
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Malte Hoffmann
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Cambridge, MA, USA
- Alfonso Nieto-Castañón
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Speech, Hearing, Bioscience, and Technology, Harvard University, Cambridge, MA, USA

100
Abstract
Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analyzing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable neural signatures of predictions about syntactic category (parts of speech), phonemes, and semantics. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.
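The paper's hierarchical claim, that word-level predictions inform phoneme-level ones, has a simple probabilistic core: a next-word distribution induces a next-phoneme distribution by marginalizing word probabilities over their initial phoneme. A toy sketch (the words, probabilities, and first-letter stand-in for a phoneme are purely illustrative):

```python
from collections import defaultdict

# Hypothetical next-word distribution from a language model, given some context.
next_word = {"prince": 0.5, "planet": 0.3, "rose": 0.2}

def first_phoneme(word):
    """Crude onset for illustration: the first letter stands in for a phoneme."""
    return word[0]

# High-level (word) predictions induce low-level (phoneme) predictions by
# summing the probability of every word that begins with each phoneme.
next_phoneme = defaultdict(float)
for word, p in next_word.items():
    next_phoneme[first_phoneme(word)] += p

print(dict(next_phoneme))  # {'p': 0.8, 'r': 0.2}
```

This marginalization is the sense in which a word-level predictor "contains" a phoneme-level one, which is what the dissociable neural signatures are then tested against.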