1. Kumar S, Sumers TR, Yamakoshi T, Goldstein A, Hasson U, Norman KA, Griffiths TL, Hawkins RD, Nastase SA. Shared functional specialization in transformer-based language models and the human brain. Nat Commun 2024; 15:5523. [PMID: 38951520] [PMCID: PMC11217339] [DOI: 10.1038/s41467-024-49173-5] [Received: 07/21/2023] [Accepted: 05/24/2024]
Abstract
When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations ("embeddings") generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally specialized "transformations" that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally specialized "attention heads" differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space.
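The encoding analysis this abstract describes (predicting voxelwise brain activity from model-internal features and scoring held-out variance explained) can be illustrated with a minimal sketch. All data, dimensions, and the plain ridge estimator here are hypothetical stand-ins, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-word model features (standing in for the attention-head
# "transformations") and BOLD-like responses for a set of voxels.
n_words, n_features, n_voxels = 1000, 64, 50
X = rng.standard_normal((n_words, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_words, n_voxels))

# Split into train/test words and fit a ridge (L2-regularized) encoding model
# in closed form: W = (X'X + alpha*I)^-1 X'Y.
X_tr, X_te, Y_tr, Y_te = X[:800], X[800:], Y[:800], Y[800:]
alpha = 1.0
W_hat = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(n_features), X_tr.T @ Y_tr)

# Encoding performance: per-voxel correlation between predicted and held-out responses.
Y_hat = X_te @ W_hat
r = np.array([np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)])
print(f"mean held-out correlation: {r.mean():.2f}")
```

In practice such models also account for the hemodynamic lag and use nested cross-validation to pick the regularization strength; this sketch shows only the core regression-and-correlation logic.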
Affiliation(s)
- Sreejan Kumar: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Theodore R Sumers: Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
- Takateru Yamakoshi: Faculty of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Ariel Goldstein: Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190401, Israel
- Uri Hasson: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Kenneth A Norman: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Thomas L Griffiths: Department of Computer Science, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Robert D Hawkins: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Samuel A Nastase: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
2. Ferrante M, Boccato T, Passamonti L, Toschi N. Retrieving and reconstructing conceptually similar images from fMRI with latent diffusion models and a neuro-inspired brain decoding model. J Neural Eng 2024; 21:046001. [PMID: 38885689] [DOI: 10.1088/1741-2552/ad593c] [Received: 11/03/2023] [Accepted: 06/17/2024]
Abstract
Objective. Brain decoding is a field of computational neuroscience that aims to infer mental states or internal representations of perceptual inputs from measurable brain activity. This study proposes a novel approach to brain decoding that relies on semantic and contextual similarity. Approach. We use several functional magnetic resonance imaging (fMRI) datasets of natural images as stimuli and create a deep learning decoding pipeline inspired by the bottom-up and top-down processes in human vision. Our pipeline includes a linear brain-to-feature model that maps fMRI activity to semantic visual stimulus features. We assume that the brain projects visual information onto a space that is homeomorphic to the latent space of the last layer of a pretrained neural network, which summarizes and highlights similarities and differences between concepts. These features are categorized in the latent space using a nearest-neighbor strategy, and the results are used to retrieve images or condition a generative latent diffusion model to create novel images. Main results. We demonstrate semantic classification and image retrieval on three different fMRI datasets: Generic Object Decoding (vision perception and imagination), BOLD5000, and NSD. In all cases, a simple mapping between fMRI and a deep semantic representation of the visual stimulus resulted in meaningful classification and retrieved or generated images. We assessed quality using quantitative metrics and a human evaluation experiment that reproduces the multiplicity of conscious and unconscious criteria that humans use to evaluate image similarity. Our method achieved correct evaluation in over 80% of the test set. Significance. Our study proposes a novel approach to brain decoding that relies on semantic and contextual similarity. The results demonstrate that measurable neural correlates can be linearly mapped onto the latent space of a neural network to synthesize images that match the original content. These findings have implications for both cognitive neuroscience and artificial intelligence.
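The nearest-neighbor retrieval step described in this abstract reduces to matching fMRI-predicted feature vectors against candidate image features by similarity in the latent space. A minimal sketch, with entirely hypothetical feature arrays (in the paper the query would come from the linear brain-to-feature model, not from noise-perturbed ground truth):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bank of candidate image features in a semantic latent space.
n_candidates, dim = 200, 512
candidate_feats = rng.standard_normal((n_candidates, dim))

# Hypothetical fMRI-predicted features: a noisy version of one candidate.
true_idx = 42
predicted_feats = candidate_feats[true_idx] + 0.3 * rng.standard_normal(dim)

def nearest_neighbor(query, bank):
    # Cosine similarity between the query and every candidate feature vector;
    # return the index of the best-matching candidate.
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return int(np.argmax(b @ q))

print(nearest_neighbor(predicted_feats, candidate_feats))
```

The retrieved index can then be used either to return the matching image directly or, as in the paper's generative variant, to condition a latent diffusion model on the matched semantic features.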
Affiliation(s)
- Matteo Ferrante: Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy
- Tommaso Boccato: Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy
- Luca Passamonti: CNR, Istituto di Bioimmagini e Fisiologia Molecolare, Milan, Italy
- Nicola Toschi: Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy; Martinos Center for Biomedical Imaging, MGH and Harvard Medical School, Boston, MA, USA
3. Morgan AM, Devinsky O, Doyle WK, Dugan P, Friedman D, Flinker A. A low-activity cortical network selectively encodes syntax. bioRxiv 2024:2024.06.20.599931. [PMID: 38948730] [PMCID: PMC11212956] [DOI: 10.1101/2024.06.20.599931]
Abstract
Syntax, the abstract structure of language, is a hallmark of human cognition. Despite its importance, its neural underpinnings remain obscured by inherent limitations of non-invasive brain measures and a near-total focus on comprehension paradigms. Here, we address these limitations with high-resolution neurosurgical recordings (electrocorticography) and a controlled sentence production experiment. We uncover three syntactic networks that are broadly distributed across traditional language regions, but with focal concentrations in middle and inferior frontal gyri. In contrast to previous findings from comprehension studies, these networks process syntax mostly to the exclusion of words and meaning, supporting a cognitive architecture with a distinct syntactic system. Most strikingly, our data reveal an unexpected property of syntax: it is encoded independently of neural activity levels. We propose that this "low-activity coding" scheme represents a novel mechanism for encoding information, reserved for higher-order cognition more broadly.
Affiliation(s)
- Adam M. Morgan: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Orrin Devinsky: Neurosurgery Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Werner K. Doyle: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Patricia Dugan: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Daniel Friedman: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA
- Adeen Flinker: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, NY 10016, USA; Biomedical Engineering Department, NYU Tandon School of Engineering, 6 MetroTech Center, Brooklyn, NY 11201, USA
4. Alcalá-López D, Mei N, Margolles P, Soto D. Brain-wide representation of social knowledge. Soc Cogn Affect Neurosci 2024; 19:nsae032. [PMID: 38804694] [PMCID: PMC11173195] [DOI: 10.1093/scan/nsae032] [Received: 10/11/2023] [Revised: 02/28/2024] [Accepted: 05/30/2024]
Abstract
Understanding how the human brain maps different dimensions of social conceptualizations remains a key unresolved issue. We performed a functional magnetic resonance imaging (fMRI) study in which participants were exposed to audio definitions of personality traits and asked to simulate experiences associated with the concepts. Half of the concepts were affective (e.g. empathetic), and the other half were non-affective (e.g. intelligent). Orthogonally, half of the concepts were highly likable (e.g. sincere) and half were socially undesirable (e.g. liar). Behaviourally, we observed that the dimension of social desirability reflected the participants' subjective ratings better than affect. fMRI decoding results showed that both social desirability and affect could be decoded from local patterns of activity across distributed brain regions, including the superior temporal and inferior frontal cortices, the precuneus, and key nodes of the default mode network in posterior/anterior cingulate and ventromedial prefrontal cortex. Decoding accuracy was better for social desirability than affect. A representational similarity analysis further demonstrated that a deep language model significantly predicted brain activity associated with the concepts in bilateral regions of the superior and anterior temporal lobes. The results demonstrate a brain-wide representation of social knowledge, involving default mode network systems that support the multimodal simulation of social experience, with a further reliance on language-related processing.
Affiliation(s)
- Daniel Alcalá-López: Consciousness Group, Basque Center on Cognition, Brain and Language, San Sebastian 20009, Spain
- Ning Mei: Psychology Department, Shenzhen University, Nanshan District, Guangdong Province 3688, China
- Pedro Margolles: Consciousness Group, Basque Center on Cognition, Brain and Language, San Sebastian 20009, Spain
- David Soto: Consciousness Group, Basque Center on Cognition, Brain and Language, San Sebastian 20009, Spain
5. Yu S, Gu C, Huang K, Li P. Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension. Sci Adv 2024; 10:eadn7744. [PMID: 38781343] [PMCID: PMC11114233] [DOI: 10.1126/sciadv.adn7744] [Received: 01/01/2024] [Accepted: 04/18/2024]
Abstract
Current large language models (LLMs) rely on word prediction as their backbone pretraining task. Although word prediction is an important mechanism underlying language processing, human language comprehension occurs at multiple levels, involving the integration of words and sentences to achieve a full understanding of discourse. This study models language comprehension by using the next sentence prediction (NSP) task to investigate mechanisms of discourse-level comprehension. We show that NSP pretraining enhanced a model's alignment with brain data especially in the right hemisphere and in the multiple demand network, highlighting the contributions of nonclassical language regions to high-level language understanding. Our results also suggest that NSP can enable the model to better capture human comprehension performance and to better encode contextual information. Our study demonstrates that the inclusion of diverse learning objectives in a model leads to more human-like representations, and investigating the neurocognitive plausibility of pretraining tasks in LLMs can shed light on outstanding questions in language neuroscience.
Affiliation(s)
- Shaoyun Yu: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Chanyuan Gu: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Kexin Huang: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Ping Li: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China; Centre for Immersive Learning and Metaverse in Education, The Hong Kong Polytechnic University, Hong Kong SAR, China
6. Fedorenko E, Ivanova AA, Regev TI. The language network as a natural kind within the broader landscape of the human brain. Nat Rev Neurosci 2024; 25:289-312. [PMID: 38609551] [DOI: 10.1038/s41583-024-00802-4] [Accepted: 02/23/2024]
Abstract
Language behaviour is complex, but neuroscientific evidence disentangles it into distinct components supported by dedicated brain areas or networks. In this Review, we describe the 'core' language network, which includes left-hemisphere frontal and temporal areas, and show that it is strongly interconnected, independent of input and output modalities, causally important for language and language-selective. We discuss evidence that this language network plausibly stores language knowledge and supports core linguistic computations related to accessing words and constructions from memory and combining them to interpret (decode) or generate (encode) linguistic messages. We emphasize that the language network works closely with, but is distinct from, both lower-level - perceptual and motor - mechanisms and higher-level systems of knowledge and reasoning. The perceptual and motor mechanisms process linguistic signals, but, in contrast to the language network, are sensitive only to these signals' surface properties, not their meanings; the systems of knowledge and reasoning (such as the system that supports social reasoning) are sometimes engaged during language use but are not language-selective. This Review lays a foundation both for in-depth investigations of these different components of the language processing pipeline and for probing inter-component interactions.
Affiliation(s)
- Evelina Fedorenko: Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Program in Speech and Hearing in Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Anna A Ivanova: School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Tamar I Regev: Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
7. Kobelt M, Waldhauser GT, Rupietta A, Heinen R, Rau EMB, Kessler H, Axmacher N. The memory trace of an intrusive trauma-analog episode. Curr Biol 2024; 34:1657-1669.e5. [PMID: 38537637] [DOI: 10.1016/j.cub.2024.03.005] [Received: 08/25/2023] [Revised: 12/05/2023] [Accepted: 03/06/2024]
Abstract
Intrusive memories are a core symptom of posttraumatic stress disorder. Compared with memories of everyday events, they are characterized by several seemingly contradictory features: intrusive memories contain distinct sensory and emotional details of the traumatic event and can be triggered by various perceptually similar cues, but they are poorly integrated into conceptual memory. Here, we conduct exploratory whole-brain analyses to investigate the neural representations of trauma-analog experiences and how they are reactivated during memory intrusions. We show that trauma-analog movies induce excessive processing and generalized representations in sensory areas but decreased blood-oxygen-level-dependent (BOLD) responses and highly distinct representations in conceptual/semantic areas. Intrusive memories activate generalized representations in sensory areas and reactivate memory traces specific to trauma-analog events in the anterior cingulate cortex. These findings provide the first evidence of how traumatic events could distort memory representations in the human brain, which may form the basis for future confirmatory research on the neural representations of traumatic experiences.
Affiliation(s)
- M Kobelt: Department of Neuropsychology, Ruhr-Universität Bochum, Bochum 44801, North Rhine-Westphalia, Germany
- G T Waldhauser: Department of Neuropsychology, Ruhr-Universität Bochum, Bochum 44801, North Rhine-Westphalia, Germany
- A Rupietta: Department of Clinical Psychology and Psychotherapy, Ruhr-Universität Bochum, Bochum 44787, North Rhine-Westphalia, Germany
- R Heinen: Department of Neuropsychology, Ruhr-Universität Bochum, Bochum 44801, North Rhine-Westphalia, Germany
- E M B Rau: Department of Neuropsychology, Ruhr-Universität Bochum, Bochum 44801, North Rhine-Westphalia, Germany
- H Kessler: Department of Psychosomatic Medicine and Psychotherapy, Campus Fulda, Universität Marburg, Marburg 35032, Hessen, Germany; Department of Psychosomatic Medicine and Psychotherapy, LWL University Hospital, Ruhr-Universität Bochum, Bochum 44791, North Rhine-Westphalia, Germany
- N Axmacher: Department of Neuropsychology, Ruhr-Universität Bochum, Bochum 44801, North Rhine-Westphalia, Germany
8. Lyu B, Marslen-Wilson WD, Fang Y, Tyler LK. Finding structure during incremental speech comprehension. eLife 2024; 12:RP89311. [PMID: 38577982] [PMCID: PMC10997333] [DOI: 10.7554/elife.89311]
Abstract
A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extracted word-by-word measures of sentential structure from BERT, a deep language model, which effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography when participants were listening to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, which engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
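The representational similarity analysis (RSA) used in this study compares the pairwise dissimilarity structure of model measures with that of brain responses. A minimal numpy sketch with hypothetical data (real analyses would use measured MEG/EEG patterns and model-derived measures such as parse depths):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical RSA: model measures for each word, and brain response patterns
# that share (noisily) the same underlying geometry.
n_words = 30
model_feats = rng.standard_normal((n_words, 8))
brain_patterns = model_feats @ rng.standard_normal((8, 100)) + rng.standard_normal((n_words, 100))

def rdm(patterns):
    # Correlation-distance representational dissimilarity matrix (upper triangle).
    c = np.corrcoef(patterns)
    iu = np.triu_indices(len(patterns), k=1)
    return 1.0 - c[iu]

def spearman(a, b):
    # Spearman correlation = Pearson correlation of ranks (no ties expected here).
    rank = lambda x: np.argsort(np.argsort(x))
    return np.corrcoef(rank(a), rank(b))[0, 1]

rho = spearman(rdm(model_feats), rdm(brain_patterns))
print(f"RSA (Spearman) correlation: {rho:.2f}")
```

In spatiotemporally resolved RSA, this comparison is repeated at each sensor/source location and time point, yielding a map of when and where the model geometry matches the brain's.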
Affiliation(s)
- William D Marslen-Wilson: Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Yuxing Fang: Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Lorraine K Tyler: Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
9. Goldstein A, Grinstein-Dabush A, Schain M, Wang H, Hong Z, Aubrey B, Schain M, Nastase SA, Zada Z, Ham E, Feder A, Gazula H, Buchnik E, Doyle W, Devore S, Dugan P, Reichart R, Friedman D, Brenner M, Hassidim A, Devinsky O, Flinker A, Hasson U. Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns. Nat Commun 2024; 15:2768. [PMID: 38553456] [PMCID: PMC10980748] [DOI: 10.1038/s41467-024-46631-y] [Received: 07/24/2022] [Accepted: 03/04/2024]
Abstract
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we record neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. Using stringent zero-shot mapping, we demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns. The common geometric patterns allow us to predict the brain embedding in the IFG of a given left-out word based solely on its geometrical relationship to other non-overlapping words in the podcast. Furthermore, we show that contextual embeddings capture the geometry of IFG embeddings better than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain.
Collapse
Affiliation(s)
- Ariel Goldstein
- Business School, Data Science department and Cognitive Department, Hebrew University, Jerusalem, Israel.
- Google Research, Tel Aviv, Israel.
| | | | | | - Haocheng Wang
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | - Zhuoqiao Hong
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | - Bobbi Aubrey
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- New York University Grossman School of Medicine, New York, NY, USA
| | | | - Samuel A Nastase
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | - Zaid Zada
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | - Eric Ham
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | | | - Harshvardhan Gazula
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| | | | - Werner Doyle
- New York University Grossman School of Medicine, New York, NY, USA
| | - Sasha Devore
- New York University Grossman School of Medicine, New York, NY, USA
| | - Patricia Dugan
- New York University Grossman School of Medicine, New York, NY, USA
| | - Roi Reichart
- Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa, Israel
| | - Daniel Friedman
- New York University Grossman School of Medicine, New York, NY, USA
| | - Michael Brenner
- Google Research, Tel Aviv, Israel
- School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA
| | | | - Orrin Devinsky
- New York University Grossman School of Medicine, New York, NY, USA
| | - Adeen Flinker
- New York University Grossman School of Medicine, New York, NY, USA
- New York University Tandon School of Engineering, Brooklyn, NY, USA
| | - Uri Hasson
- Google Research, Tel Aviv, Israel
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
| |
Collapse
|
10
|
Di Liberto GM, Nidiffer A, Crosse MJ, Zuk N, Haro S, Cantisani G, Winchester MM, Igoe A, McCrann R, Chandra S, Lalor EC, Baruzzo G. A standardised open science framework for sharing and re-analysing neural data acquired to continuous stimuli. ARXIV 2024:arXiv:2309.07671v3. [PMID: 37744463 PMCID: PMC10516115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Neurophysiology research has demonstrated that it is possible and valuable to investigate sensory processing in scenarios involving continuous sensory streams, such as speech and music. Over the past 10 years or so, novel analytic frameworks combined with the growing participation in data sharing has led to a surge of publicly available datasets involving continuous sensory experiments. However, open science efforts in this domain of research remain scattered, lacking a cohesive set of guidelines. This paper presents an end-to-end open science framework for the storage, analysis, sharing, and re-analysis of neural data recorded during continuous sensory experiments. The framework has been designed to interface easily with existing toolboxes, such as EelBrain, NapLib, MNE, and the mTRF-Toolbox. We present guidelines by taking both the user view (how to rapidly re-analyse existing data) and the experimenter view (how to store, analyse, and share), making the process as straightforward and accessible as possible for all users. Additionally, we introduce a web-based data browser that enables the effortless replication of published results and data re-analysis.
Collapse
Affiliation(s)
- Giovanni M Di Liberto
- School of Computer Science and Statistics, University of Dublin, Trinity College, Ireland; ADAPT Centre, Trinity College Institute of Neuroscience
| | - Aaron Nidiffer
- Dept. Biomedical Engineering, Dept. of Neuroscience, Del Monte Institute for Neuroscience, Center for Visual Science, University of Rochester, Rochester, NY, USA
| | - Michael J Crosse
- Segotia, Galway, Ireland
- Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
| | - Nathaniel Zuk
- Department of Psychology, Nottingham Trent University, Nottingham, UK
| | - Stephanie Haro
- Human Health and Performance Systems, MIT Lincoln Laboratory, Lexington, Massachusetts, USA
- Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, Massachusetts, USA
| | - Giorgia Cantisani
- Laboratoire des systémes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- School of Computer Science and Statistics, University of Dublin, Trinity College, Ireland; ADAPT Centre, Trinity College Institute of Neuroscience
| | - Martin M Winchester
- School of Computer Science and Statistics, University of Dublin, Trinity College, Ireland; ADAPT Centre, Trinity College Institute of Neuroscience
| | - Aoife Igoe
- School of Computer Science and Statistics, University of Dublin, Trinity College, Ireland; ADAPT Centre, Trinity College Institute of Neuroscience
| | - Ross McCrann
- School of Computer Science and Statistics, University of Dublin, Trinity College, Ireland; ADAPT Centre, Trinity College Institute of Neuroscience
| | - Satwik Chandra
- School of Computer Science and Statistics, University of Dublin, Trinity College, Ireland; ADAPT Centre, Trinity College Institute of Neuroscience
| | - Edmund C Lalor
- Dept. Biomedical Engineering, Dept. of Neuroscience, Del Monte Institute for Neuroscience, Center for Visual Science, University of Rochester, Rochester, NY, USA
| | - Giacomo Baruzzo
- Department of Information Engineering, University of Padova, Padova, Italy
| |
Collapse
|
11
|
Orepic P, Truccolo W, Halgren E, Cash SS, Giraud AL, Proix T. Neural manifolds carry reactivation of phonetic representations during semantic processing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.10.30.564638. [PMID: 37961305 PMCID: PMC10634964 DOI: 10.1101/2023.10.30.564638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Traditional models of speech perception posit that neural activity encodes speech through a hierarchy of cognitive processes, from low-level representations of acoustic and phonetic features to high-level semantic encoding. Yet it remains unknown how neural representations are transformed across levels of the speech hierarchy. Here, we analyzed unique microelectrode array recordings of neuronal spiking activity from the human left anterior superior temporal gyrus, a brain region at the interface between phonetic and semantic speech processing, during a semantic categorization task and natural speech perception. We identified distinct neural manifolds for semantic and phonetic features, with a functional separation of the corresponding low-dimensional trajectories. Moreover, phonetic and semantic representations were encoded concurrently and reflected in power increases in the beta and low-gamma local field potentials, suggesting top-down predictive and bottom-up cumulative processes. Our results are the first to demonstrate mechanisms for hierarchical speech transformations that are specific to neuronal population dynamics.
Collapse
Affiliation(s)
- Pavo Orepic
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Wilson Truccolo
- Department of Neuroscience, Brown University, Providence, Rhode Island, United States of America
- Carney Institute for Brain Science, Brown University, Providence, Rhode Island, United States of America
| | - Eric Halgren
- Department of Neuroscience & Radiology, University of California San Diego, La Jolla, California, United States of America
| | - Sydney S Cash
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Anne-Lise Giraud
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France
| | - Timothée Proix
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| |
Collapse
|
12
|
Bowers JS, Malhotra G, Dujmović M, Montero ML, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Clarifying status of DNNs as models of human vision. Behav Brain Sci 2023; 46:e415. [PMID: 38054298 DOI: 10.1017/s0140525x23002777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/07/2023]
Abstract
On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.
Collapse
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Milton L Montero
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | | | - Federico Adolfi
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
| | - John E Hummel
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Rachel F Heaton
- Psychology Department, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
| |
Collapse
|
13
|
Palaniyappan L, Benrimoh D, Voppel A, Rocca R. Studying Psychosis Using Natural Language Generation: A Review of Emerging Opportunities. BIOLOGICAL PSYCHIATRY. COGNITIVE NEUROSCIENCE AND NEUROIMAGING 2023; 8:994-1004. [PMID: 38441079 DOI: 10.1016/j.bpsc.2023.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2023] [Revised: 04/16/2023] [Accepted: 04/19/2023] [Indexed: 03/07/2024]
Abstract
Disrupted language in psychotic disorders, such as schizophrenia, can manifest as false contents and formal deviations, often described as thought disorder. These features play a critical role in the social dysfunction associated with psychosis, but we continue to lack insights regarding how and why these symptoms develop. Natural language generation (NLG) is a field of computer science that focuses on generating human-like language for various applications. The theory that psychosis is related to the evolution of language in humans suggests that NLG systems that are sufficiently evolved to generate human-like language may also exhibit psychosis-like features. In this conceptual review, we propose using NLG systems that are at various stages of development as in silico tools to study linguistic features of psychosis. We argue that a program of in silico experimental research on the network architecture, function, learning rules, and training of NLG systems can help us understand better why thought disorder occurs in patients. This will allow us to gain a better understanding of the relationship between language and psychosis and potentially pave the way for new therapeutic approaches to address this vexing challenge.
Collapse
Affiliation(s)
- Lena Palaniyappan
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, Quebec, Canada; Robarts Research Institute, Western University, London, Ontario, Canada; Department of Medical Biophysics, Western University, London, Ontario, Canada.
| | - David Benrimoh
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, Quebec, Canada; Department of Psychiatry, Stanford University, Palo Alto, California
| | - Alban Voppel
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, Quebec, Canada; Department of Psychiatry, University of Groningen, Groningen, the Netherlands
| | - Roberta Rocca
- Interacting Minds Centre, Department of Culture, Cognition and Computation, Aarhus University, Aarhus, Denmark
| |
Collapse
|
14
|
Taylor J, Kriegeskorte N. Extracting and visualizing hidden activations and computational graphs of PyTorch models with TorchLens. Sci Rep 2023; 13:14375. [PMID: 37658079 PMCID: PMC10474256 DOI: 10.1038/s41598-023-40807-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2023] [Accepted: 08/16/2023] [Indexed: 09/03/2023] Open
Abstract
Deep neural network models (DNNs) are essential to modern AI and provide powerful models of information processing in biological neural networks. Researchers in both neuroscience and engineering are pursuing a better understanding of the internal representations and operations that undergird the successes and failures of DNNs. Neuroscientists additionally evaluate DNNs as models of brain computation by comparing their internal representations to those found in brains. It is therefore essential to have a method to easily and exhaustively extract and characterize the results of the internal operations of any DNN. Many models are implemented in PyTorch, the leading framework for building DNN models. Here we introduce TorchLens, a new open-source Python package for extracting and characterizing hidden-layer activations in PyTorch models. Uniquely among existing approaches to this problem, TorchLens has the following features: (1) it exhaustively extracts the results of all intermediate operations, not just those associated with PyTorch module objects, yielding a full record of every step in the model's computational graph, (2) it provides an intuitive visualization of the model's complete computational graph along with metadata about each computational step in a model's forward pass for further analysis, (3) it contains a built-in validation procedure to algorithmically verify the accuracy of all saved hidden-layer activations, and (4) the approach it uses can be automatically applied to any PyTorch model with no modifications, including models with conditional (if-then) logic in their forward pass, recurrent models, branching models where layer outputs are fed into multiple subsequent layers in parallel, and models with internally generated tensors (e.g., injections of noise). Furthermore, using TorchLens requires minimal additional code, making it easy to incorporate into existing pipelines for model development and analysis, and useful as a pedagogical aid when teaching deep learning concepts. We hope this contribution will help researchers in AI and neuroscience understand the internal representations of DNNs.
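Activation-extraction tools of this kind build on the forward-hook idea: each operation in the forward pass reports its output to a recorder. A framework-free sketch of that pattern (the `Layer` class and `capture_activations` helper are hypothetical illustrations of the concept, not the TorchLens or PyTorch API):

```python
# Minimal sketch of forward hooks: layers call registered hooks with
# (module, input, output), mirroring PyTorch's hook signature.

class Layer:
    def __init__(self, name, fn):
        self.name, self.fn, self._hooks = name, fn, []

    def register_forward_hook(self, hook):
        self._hooks.append(hook)

    def __call__(self, x):
        out = self.fn(x)
        for hook in self._hooks:
            hook(self, x, out)  # notify recorders of this step's output
        return out

def capture_activations(layers):
    """Attach a recording hook to every layer; return the shared log."""
    log = {}
    for layer in layers:
        layer.register_forward_hook(
            lambda m, inp, out: log.setdefault(m.name, out))
    return log

# Tiny two-step "model": scale the input, then shift it.
model = [Layer("scale", lambda x: 2 * x), Layer("shift", lambda x: x + 1)]
log = capture_activations(model)

x = 3
for layer in model:
    x = layer(x)
# log now holds every intermediate result: {"scale": 6, "shift": 7}
```

Hooks on module objects alone miss functional operations (e.g., `torch.relu` called directly), which is why exhaustive tracing of the full computational graph, as the abstract describes, requires going beyond this simple pattern.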
Collapse
Affiliation(s)
- JohnMark Taylor
- Zuckerman Mind Brain Behavior Institute, Columbia University, 3227 Broadway, New York, NY, 10027, USA.
| | - Nikolaus Kriegeskorte
- Zuckerman Mind Brain Behavior Institute, Columbia University, 3227 Broadway, New York, NY, 10027, USA
| |
Collapse
|
15
|
Alickovic E, Dorszewski T, Christiansen TU, Eskelund K, Gizzi L, Skoglund MA, Wendt D. Predicting EEG Responses to Attended Speech via Deep Neural Networks for Speech. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-4. [PMID: 38083171 DOI: 10.1109/embc40787.2023.10340027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2023]
Abstract
Attending to the speech stream of interest in multi-talker environments can be a challenging task, particularly for listeners with hearing impairment. Research suggests that neural responses assessed with electroencephalography (EEG) are modulated by the listener's auditory attention, revealing selective neural tracking (NT) of the attended speech. NT methods mostly rely on hand-engineered acoustic and linguistic speech features to predict the neural response. Only recently have deep neural network (DNN) models without specific linguistic information been used to extract speech features for NT, demonstrating that speech features in hierarchical DNN layers can predict neural responses throughout the auditory pathway. In this study, we go one step further and investigate whether similar DNN models for speech can predict neural responses to competing speech observed in EEG. We recorded EEG data using a 64-channel acquisition system from 17 listeners with normal hearing who were instructed to attend to one of two competing talkers. Our data revealed that EEG responses are significantly better predicted by DNN-extracted speech features than by hand-engineered acoustic features. Furthermore, analysis of hierarchical DNN layers showed that early layers yielded the highest predictions. Moreover, we found a significant increase in auditory attention classification accuracy when using DNN-extracted speech features rather than hand-engineered acoustic features. These findings open a new avenue for the development of NT measures to evaluate and further advance hearing technology.
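Encoding models of the kind described above typically map stimulus features to neural responses with regularized linear regression, scored by the correlation between predicted and observed signals. A minimal sketch on synthetic data (the closed-form ridge solver, the simulated "EEG channel", and all sizes are illustrative assumptions, not the study's pipeline):

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge regression: w = (X'X + alpha*I)^(-1) X'y
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ y)

rng = np.random.default_rng(1)
n_samples, n_feat = 500, 8
X = rng.normal(size=(n_samples, n_feat))      # stand-in speech features
true_w = rng.normal(size=n_feat)
y = X @ true_w + 0.5 * rng.normal(size=n_samples)  # stand-in EEG channel

# Train/test split, as in encoding-model evaluation.
X_tr, X_te, y_tr, y_te = X[:400], X[400:], y[:400], y[400:]
w = ridge_fit(X_tr, y_tr, alpha=1.0)
pred = X_te @ w

# Score = correlation between predicted and observed held-out response.
score = np.corrcoef(pred, y_te)[0, 1]
```

Comparing feature sets (e.g., DNN-extracted vs. hand-engineered) then amounts to fitting one such model per feature set and comparing held-out scores.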
Collapse
|
16
|
Caucheteux C, Gramfort A, King JR. Evidence of a predictive coding hierarchy in the human brain listening to speech. Nat Hum Behav 2023; 7:430-441. [PMID: 36864133 PMCID: PMC10038805 DOI: 10.1038/s41562-022-01516-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Accepted: 12/15/2022] [Indexed: 03/04/2023]
Abstract
Considerable progress has recently been made in natural language processing: deep learning algorithms are increasingly able to generate, summarize, translate and classify texts. Yet these language models still fail to match the language abilities of humans. Predictive coding theory offers a tentative explanation for this discrepancy: while language models are optimized to predict nearby words, the human brain is thought to continuously predict a hierarchy of representations spanning multiple timescales. To test this hypothesis, we analysed the functional magnetic resonance imaging brain signals of 304 participants listening to short stories. First, we confirmed that the activations of modern language models linearly map onto the brain responses to speech. Second, we showed that enhancing these algorithms with predictions that span multiple timescales improves this brain mapping. Finally, we showed that these predictions are organized hierarchically: frontoparietal cortices predict higher-level, longer-range and more contextual representations than temporal cortices. Overall, these results strengthen the role of hierarchical predictive coding in language processing and illustrate how the synergy between neuroscience and artificial intelligence can unravel the computational bases of human cognition.
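The "predictions spanning multiple timescales" test can be caricatured as augmenting each word's embedding with embeddings of upcoming words and checking whether the linear brain mapping improves. A toy sketch on simulated data (the feature dimension, the 3-word forecast offset, and the noise level are illustrative assumptions, not the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 600, 4
feat = rng.normal(size=(T, d))       # stand-in per-word model embeddings

# Simulated "brain response" at word t depends on the current word AND
# on upcoming words (the predictive-coding hypothesis).
future = np.roll(feat, -3, axis=0)   # embeddings 3 words ahead
w_now = rng.normal(size=d)
w_next = rng.normal(size=d)
y = feat @ w_now + future @ w_next + 0.3 * rng.normal(size=T)

def r2(X, y):
    # Least-squares fit and in-sample R^2.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    return 1 - resid.var() / y.var()

base = r2(feat, y)                           # current embedding only
forecast = r2(np.hstack([feat, future]), y)  # + forecast window
```

Because the simulated response genuinely depends on upcoming words, the forecast-augmented model fits markedly better; in the paper, the analogous comparison is made on held-out fMRI data with regularized regression.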
Collapse
Affiliation(s)
- Charlotte Caucheteux
- Meta AI, Paris, France.
- Université Paris-Saclay, Inria, Commissariat à l'Énergie Atomique et aux Énergies Alternatives, Paris, France.
| | - Alexandre Gramfort
- Meta AI, Paris, France
- Université Paris-Saclay, Inria, Commissariat à l'Énergie Atomique et aux Énergies Alternatives, Paris, France
| | - Jean-Rémi King
- Meta AI, Paris, France.
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, Paris, France.
| |
Collapse
|
17
|
Antonello RJ, Vaidya AR, Huth AG. Scaling laws for language encoding models in fMRI. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2023; 36:21895-21907. [PMID: 39035676 PMCID: PMC11258918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 07/23/2024]
Abstract
Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield increasingly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
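A logarithmic scaling trend of this kind can be summarized with a simple least-squares fit of performance against log(parameter count). A sketch with made-up numbers that mimic the reported trend (the performance values below are illustrative, not the paper's data):

```python
import numpy as np

# Hypothetical encoding performance (correlation) for a range of model
# sizes from 125M to 30B parameters; values are illustrative only.
params = np.array([125e6, 1.3e9, 6.7e9, 13e9, 30e9])
perf = np.array([0.21, 0.225, 0.235, 0.24, 0.245])

# Fit perf ~= a + b * log(params) by ordinary least squares.
X = np.stack([np.ones_like(params), np.log(params)], axis=1)
(a, b), *_ = np.linalg.lstsq(X, perf, rcond=None)

# Extrapolate (with the usual caveats) to a 70B-parameter model.
pred_70b = a + b * np.log(70e9)
```

A positive slope `b` indicates performance keeps rising with log model size; whether such extrapolation holds beyond the measured range is exactly the open question the abstract raises.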
Collapse
Affiliation(s)
| | - Aditya R Vaidya
- Department of Computer Science, The University of Texas at Austin
| | - Alexander G Huth
- Departments of Computer Science and Neuroscience, The University of Texas at Austin
| |
Collapse
|
18
|
Bowers JS, Malhotra G, Dujmović M, Llera Montero M, Tsvetkov C, Biscione V, Puebla G, Adolfi F, Hummel JE, Heaton RF, Evans BD, Mitchell J, Blything R. Deep problems with neural network models of human vision. Behav Brain Sci 2022; 46:e385. [PMID: 36453586 DOI: 10.1017/s0140525x22002813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
Abstract
Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job in predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job in predicting brain signals in response to images taken from various brain datasets (e.g., single cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features are contributing to good predictions and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
Collapse
Affiliation(s)
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Gaurav Malhotra
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Marin Dujmović
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Milton Llera Montero
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Christian Tsvetkov
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Guillermo Puebla
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
| | - Federico Adolfi
- School of Psychological Science, University of Bristol, Bristol, UK; https://jeffbowers.blogs.bristol.ac.uk/
- Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
| | - John E Hummel
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Rachel F Heaton
- Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL, USA
| | - Benjamin D Evans
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Jeffrey Mitchell
- Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK
| | - Ryan Blything
- School of Psychology, Aston University, Birmingham, UK
| |
Collapse
|