1. Tabas A, von Kriegstein K. Multiple Concurrent Predictions Inform Prediction Error in the Human Auditory Pathway. J Neurosci 2024; 44:e2219222023. PMID: 37949655; PMCID: PMC10851690; DOI: 10.1523/jneurosci.2219-22.2023. Open access.
Abstract
The key assumption of the predictive coding framework is that internal representations are used to generate predictions of what the sensory input will look like in the immediate future. These predictions are tested against the actual input by so-called prediction error units, which encode the residuals of the predictions. What happens to prediction errors, however, if predictions drawn at different stages of the sensory hierarchy contradict each other? To answer this question, we conducted two fMRI experiments while female and male human participants listened to sequences of sounds: pure tones in the first experiment and frequency-modulated sweeps in the second. In both experiments, we used repetition to induce predictions based on stimulus statistics (stats-informed predictions) and abstract rules disclosed in the task instructions to induce an orthogonal set of (task-informed) predictions. We tested three alternative scenarios: neural responses in the auditory sensory pathway encode prediction error with respect to (1) the stats-informed predictions, (2) the task-informed predictions, or (3) a combination of both. Results showed that neural populations in all recorded regions (bilateral inferior colliculus, medial geniculate body, and primary and secondary auditory cortices) encode prediction error with respect to a combination of the two orthogonal sets of predictions. The findings suggest that predictive coding exploits the non-linear architecture of the auditory pathway for the transmission of predictions. Such non-linear transmission of predictions might be crucial for the predictive coding of complex auditory signals like speech.

Significance Statement: Sensory systems exploit our subjective expectations to make sense of an overwhelming influx of sensory signals. It is still unclear how expectations at each stage of the processing pipeline are used to predict the representations at the other stages. The current view is that this transmission is hierarchical and linear. Here, we measured fMRI responses in auditory cortex, sensory thalamus, and midbrain while we induced two sets of mutually inconsistent expectations on the sensory input, each putatively encoded at a different stage. We show that responses at all stages are concurrently shaped by both sets of expectations. The results challenge the hypothesis that expectations are transmitted linearly and provide a normative explanation for the non-linear physiology of the corticofugal sensory system.
Affiliation(s)
- Alejandro Tabas
  - Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
  - Department of Psychology, Technische Universität Dresden, 01062 Dresden, Germany
  - Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany
- Katharina von Kriegstein
  - Department of Psychology, Technische Universität Dresden, 01062 Dresden, Germany
  - Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany
2. Kumar M, Goldstein A, Michelmann S, Zacks JM, Hasson U, Norman KA. Bayesian Surprise Predicts Human Event Segmentation in Story Listening. Cogn Sci 2023; 47:e13343. PMID: 37867379; DOI: 10.1111/cogs.13343.
Abstract
Event segmentation theory posits that people segment continuous experience into discrete events and that event boundaries occur when there are large transient increases in prediction error. Here, we set out to test this theory in the context of story listening, by using a deep learning language model (GPT-2) to compute the predicted probability distribution of the next word, at each point in the story. For three stories, we used the probability distributions generated by GPT-2 to compute the time series of prediction error. We also asked participants to listen to these stories while marking event boundaries. We used regression models to relate the GPT-2 measures to the human segmentation data. We found that event boundaries are associated with transient increases in Bayesian surprise but not with a simpler measure of prediction error (surprisal) that tracks, for each word in the story, how strongly that word was predicted at the previous time point. These results support the hypothesis that prediction error serves as a control mechanism governing event segmentation and point to important differences between operational definitions of prediction error.
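The two operational definitions of prediction error contrasted above can be made concrete. Below is a minimal sketch using hand-made toy distributions in place of GPT-2's actual next-word probabilities; the function names and numbers are illustrative, not the authors' code.

```python
import math

def surprisal(p_next, word_idx):
    """Surprisal of the observed word: -log P(word | context)."""
    return -math.log(p_next[word_idx])

def bayesian_surprise(p_before, p_after):
    """KL divergence D(p_after || p_before): the size of the belief
    update when the predictive distribution shifts after a word."""
    return sum(q * math.log(q / p) for p, q in zip(p_before, p_after))

# Toy 4-word vocabulary; predictive distributions before and after one word
p_t  = [0.70, 0.15, 0.10, 0.05]
p_t1 = [0.05, 0.10, 0.15, 0.70]

low_surprisal = surprisal(p_t, 0)       # strongly predicted word
big_update = bayesian_surprise(p_t, p_t1)  # large belief shift -> high surprise
```

Surprisal depends only on how strongly the observed word itself was predicted, while Bayesian surprise measures how much the whole distribution over upcoming words changes, which is the quantity the paper links to event boundaries.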
Affiliation(s)
- Manoj Kumar
  - Princeton Neuroscience Institute, Princeton University
- Ariel Goldstein
  - Department of Cognitive and Brain Sciences and Business School, Hebrew University
  - Google Research, Tel-Aviv
- Jeffrey M Zacks
  - Department of Psychological & Brain Sciences, Washington University in St. Louis
- Uri Hasson
  - Princeton Neuroscience Institute, Princeton University
  - Department of Psychology, Princeton University
- Kenneth A Norman
  - Princeton Neuroscience Institute, Princeton University
  - Department of Psychology, Princeton University
3. Zada Z, Goldstein A, Michelmann S, Simony E, Price A, Hasenfratz L, Barham E, Zadbood A, Doyle W, Friedman D, Dugan P, Melloni L, Devore S, Flinker A, Devinsky O, Nastase SA, Hasson U. A shared linguistic space for transmitting our thoughts from brain to brain in natural conversations. bioRxiv 2023:2023.06.27.546708. Preprint. PMID: 37425747; PMCID: PMC10327051; DOI: 10.1101/2023.06.27.546708.
Abstract
Effective communication hinges on a mutual understanding of word meaning in different contexts. The embedding space learned by large language models can serve as an explicit model of the shared, context-rich meaning space humans use to communicate their thoughts. We recorded brain activity using electrocorticography during spontaneous, face-to-face conversations in five pairs of epilepsy patients. We demonstrate that the linguistic embedding space can capture the linguistic content of word-by-word neural alignment between speaker and listener. Linguistic content emerged in the speaker's brain before word articulation, and the same linguistic content rapidly reemerged in the listener's brain after word articulation. These findings establish a computational framework to study how human brains transmit their thoughts to one another in real-world contexts.
Affiliation(s)
- Zaid Zada
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
- Ariel Goldstein
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
  - Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190501, Israel
- Sebastian Michelmann
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
- Erez Simony
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
  - Faculty of Engineering, Holon Institute of Technology, Holon 5810201, Israel
- Amy Price
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
- Liat Hasenfratz
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
- Emily Barham
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
- Asieh Zadbood
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
  - Department of Psychology, Columbia University, New York 10027, USA
- Werner Doyle
  - Grossman School of Medicine, New York University, New York 10016, USA
- Daniel Friedman
  - Grossman School of Medicine, New York University, New York 10016, USA
- Patricia Dugan
  - Grossman School of Medicine, New York University, New York 10016, USA
- Lucia Melloni
  - Grossman School of Medicine, New York University, New York 10016, USA
- Sasha Devore
  - Grossman School of Medicine, New York University, New York 10016, USA
- Adeen Flinker
  - Grossman School of Medicine, New York University, New York 10016, USA
  - Tandon School of Engineering, New York University, New York 10016, USA
- Orrin Devinsky
  - Grossman School of Medicine, New York University, New York 10016, USA
- Samuel A. Nastase
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
- Uri Hasson
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, New Jersey 08544, USA
4. Szewczyk JM, Federmeier KD. Context-based facilitation of semantic access follows both logarithmic and linear functions of stimulus probability. J Mem Lang 2022; 123:104311. PMID: 36337731; PMCID: PMC9631957; DOI: 10.1016/j.jml.2021.104311.
Abstract
Stimuli are easier to process when context makes them predictable, but does context-based facilitation arise from preactivation of a limited set of relatively probable upcoming stimuli (with facilitation then linearly related to probability) or, instead, because the system maintains and updates a probability distribution across all items (with facilitation logarithmically related to probability)? We measured the N400, an index of semantic access, to words of varying probability, including unpredictable words. Word predictability was measured using both cloze probabilities and a state-of-the-art machine learning language model (GPT-2). We reanalyzed five datasets (n = 138) to demonstrate and then replicate that context-based facilitation on the N400 is graded, even among unpredictable words. Furthermore, we established that the relationship between word predictability and context-based facilitation combines linear and logarithmic functions. We argue that this composite function reveals properties of the mapping between words and semantic features and how feature- and word-related information is activated on-line.
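The composite function reported above can be written as a sum of a linear and a logarithmic term in word probability. The sketch below is only an illustration of that functional form; the function name and coefficients are placeholders, not the fitted values from the paper.

```python
import math

def n400_facilitation(p, w_linear=1.0, w_log=1.0, baseline=0.0):
    """Composite predictor of context-based facilitation.

    The linear term in p is consistent with preactivation of a limited
    set of probable candidates; the logarithmic term is consistent with
    updating a probability distribution over all items. Coefficients
    here are illustrative, not fitted values.
    """
    return baseline + w_linear * p + w_log * math.log(p)

# Facilitation is graded even among low-probability (unpredictable) words,
# because log(p) keeps changing as p approaches zero:
for p in (0.01, 0.05, 0.25, 0.75):
    print(f"p={p:.2f}  facilitation={n400_facilitation(p):+.3f}")
```

Because the derivative of the composite (w_linear + w_log / p) is always positive, facilitation increases monotonically with probability, which matches the graded effect the authors observe even among unpredictable words.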
Affiliation(s)
- Jakub M. Szewczyk
  - Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
  - Corresponding author: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands
- Kara D. Federmeier
  - Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Program in Neuroscience, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  - Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
5. Goldstein A, Zada Z, Buchnik E, Schain M, Price A, Aubrey B, Nastase SA, Feder A, Emanuel D, Cohen A, Jansen A, Gazula H, Choe G, Rao A, Kim C, Casto C, Fanda L, Doyle W, Friedman D, Dugan P, Melloni L, Reichart R, Devore S, Flinker A, Hasenfratz L, Levy O, Hassidim A, Brenner M, Matias Y, Norman KA, Devinsky O, Hasson U. Shared computational principles for language processing in humans and deep language models. Nat Neurosci 2022; 25:369-380. PMID: 35260860; PMCID: PMC8904253; DOI: 10.1038/s41593-022-01026-4.
Abstract
Departing from traditional linguistic models, advances in deep learning have resulted in a new type of predictive (autoregressive) deep language models (DLMs). Using a self-supervised next-word prediction task, these models generate appropriate linguistic responses in a given context. In the current study, nine participants listened to a 30-min podcast while their brain responses were recorded using electrocorticography (ECoG). We provide empirical evidence that the human brain and autoregressive DLMs share three fundamental computational principles as they process the same natural narrative: (1) both are engaged in continuous next-word prediction before word onset; (2) both match their pre-onset predictions to the incoming word to calculate post-onset surprise; (3) both rely on contextual embeddings to represent words in natural contexts. Together, our findings suggest that autoregressive DLMs provide a new and biologically feasible computational framework for studying the neural basis of language. Deep language models have revolutionized natural language processing. The study identifies three computational principles shared between deep language models and the human brain, which can transform our understanding of the neural basis of language.
Affiliation(s)
- Ariel Goldstein
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
  - Google Research, Mountain View, CA, USA
- Zaid Zada
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Amy Price
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Bobbi Aubrey
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
  - New York University Grossman School of Medicine, New York, NY, USA
- Samuel A Nastase
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Harshvardhan Gazula
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Gina Choe
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
  - New York University Grossman School of Medicine, New York, NY, USA
- Aditi Rao
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
  - New York University Grossman School of Medicine, New York, NY, USA
- Catherine Kim
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
  - New York University Grossman School of Medicine, New York, NY, USA
- Colton Casto
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Lora Fanda
  - New York University Grossman School of Medicine, New York, NY, USA
- Werner Doyle
  - New York University Grossman School of Medicine, New York, NY, USA
- Daniel Friedman
  - New York University Grossman School of Medicine, New York, NY, USA
- Patricia Dugan
  - New York University Grossman School of Medicine, New York, NY, USA
- Lucia Melloni
  - Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- Roi Reichart
  - Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa, Israel
- Sasha Devore
  - New York University Grossman School of Medicine, New York, NY, USA
- Adeen Flinker
  - New York University Grossman School of Medicine, New York, NY, USA
- Liat Hasenfratz
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Omer Levy
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Michael Brenner
  - Google Research, Mountain View, CA, USA
  - School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA
- Kenneth A Norman
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Orrin Devinsky
  - New York University Grossman School of Medicine, New York, NY, USA
- Uri Hasson
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
  - Google Research, Mountain View, CA, USA
6. Caucheteux C, King JR. Brains and algorithms partially converge in natural language processing. Commun Biol 2022; 5:134. PMID: 35173264; PMCID: PMC8850612; DOI: 10.1038/s42003-022-03036-1. Open access.
Abstract
Deep learning algorithms trained to predict masked words from large amounts of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences. Specifically, we analyze the brain responses to 400 isolated sentences in a large cohort of 102 subjects, each recorded for two hours with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We then test where and when each of these algorithms maps onto the brain responses. Finally, we estimate how the architecture, training, and performance of these models independently account for the generation of brain-like representations. Our analyses reveal two main findings. First, the similarity between the algorithms and the brain primarily depends on their ability to predict words from context. Second, this similarity reveals the rise and maintenance of perceptual, lexical, and compositional representations within each cortical region. Overall, this study shows that modern language algorithms partially converge towards brain-like solutions, and thus delineates a promising path to unravel the foundations of natural language processing. Charlotte Caucheteux and Jean-Rémi King examine the ability of transformer neural networks trained on word prediction tasks to fit representations in the human brain measured with fMRI and MEG. Their results provide further insight into the workings of transformer language models and their relevance to brain responses.
Affiliation(s)
- Charlotte Caucheteux
  - Facebook AI Research, Paris, France
  - Université Paris-Saclay, Inria, CEA, Palaiseau, France
- Jean-Rémi King
  - Facebook AI Research, Paris, France
  - École normale supérieure, PSL University, CNRS, Paris, France
7. Kern P, Heilbron M, de Lange FP, Spaak E. Cortical activity during naturalistic music listening reflects short-range predictions based on long-term experience. eLife 2022; 11:e80935. PMID: 36562532; PMCID: PMC9836393; DOI: 10.7554/elife.80935. Open access.
Abstract
Expectations shape our experience of music. However, the internal model upon which listeners form melodic expectations is still debated. Do expectations stem from Gestalt-like principles or statistical learning? If the latter, does long-term experience play an important role, or are short-term regularities sufficient? And finally, what length of context informs contextual expectations? To answer these questions, we presented human listeners with diverse naturalistic compositions from Western classical music, while recording neural activity using MEG. We quantified note-level melodic surprise and uncertainty using various computational models of music, including a state-of-the-art transformer neural network. A time-resolved regression analysis revealed that neural activity over fronto-temporal sensors tracked melodic surprise particularly around 200ms and 300-500ms after note onset. This neural surprise response was dissociated from sensory-acoustic and adaptation effects. Neural surprise was best predicted by computational models that incorporated long-term statistical learning-rather than by simple, Gestalt-like principles. Yet, intriguingly, the surprise reflected primarily short-range musical contexts of less than ten notes. We present a full replication of our novel MEG results in an openly available EEG dataset. Together, these results elucidate the internal model that shapes melodic predictions during naturalistic music listening.
Affiliation(s)
- Pius Kern
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- Micha Heilbron
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- Floris P de Lange
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- Eelke Spaak
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
8. Meyer L, Lakatos P, He Y. Language Dysfunction in Schizophrenia: Assessing Neural Tracking to Characterize the Underlying Disorder(s)? Front Neurosci 2021; 15:640502. PMID: 33692672; PMCID: PMC7937925; DOI: 10.3389/fnins.2021.640502. Open access.
Abstract
Deficits in language production and comprehension are characteristic of schizophrenia. To date, it remains unclear whether these deficits arise from dysfunctional linguistic knowledge or from dysfunctional predictions derived from the linguistic context. Alternatively, the deficits could result from dysfunctional neural tracking of auditory information, leading to decreased fidelity and even distortion of the auditory signal. Here, we discuss possible ways for clinical neuroscientists to employ neural tracking methodology to independently characterize deficiencies at the auditory-sensory and abstract linguistic levels. This might lead to a mechanistic understanding of the deficits underlying language-related disorder(s) in schizophrenia. We propose to combine naturalistic stimulation, measures of speech-brain synchronization, and computational modeling of abstract linguistic knowledge and predictions. These independent but likely interacting assessments may be exploited for an objective and differential diagnosis of schizophrenia, as well as a better understanding of the disorder at the functional level, illustrating the potential of neural tracking methodology as a translational tool in a range of psychotic populations.
Affiliation(s)
- Lars Meyer
  - Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  - Clinic for Phoniatrics and Pedaudiology, University Hospital Münster, Münster, Germany
- Peter Lakatos
  - Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute, Orangeburg, NY, United States
- Yifei He
  - Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Marburg, Germany