1
Nour Eddine S, Brothers T, Wang L, Spratling M, Kuperberg GR. A predictive coding model of the N400. Cognition 2024; 246:105755. PMID: 38428168; PMCID: PMC10984641; DOI: 10.1016/j.cognition.2024.105755.
Abstract
The N400 event-related component has been widely used to investigate the neural mechanisms underlying real-time language comprehension. However, despite decades of research, there is still no unifying theory that can explain both its temporal dynamics and functional properties. In this work, we show that predictive coding, a biologically plausible algorithm for approximating Bayesian inference, offers a promising framework for characterizing the N400. Using an implemented predictive coding computational model, we demonstrate how the N400 can be formalized as the lexico-semantic prediction error produced as the brain infers meaning from the linguistic form of incoming words. We show that the magnitude of lexico-semantic prediction error mirrors the functional sensitivity of the N400 to lexical variables, priming, and contextual effects, as well as to their higher-order interactions. We further show that the dynamics of the predictive coding algorithm provide a natural explanation for the temporal dynamics of the N400, and a biologically plausible link to neural activity. Together, these findings directly situate the N400 within the broader context of predictive coding research. More generally, they raise the possibility that the brain may use the same computational mechanism for inference across linguistic and non-linguistic domains.
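The core quantity in this account, a prediction error that shrinks as inference settles, can be illustrated with a minimal numerical sketch. The weights, input, and update rule below are arbitrary illustrations of generic predictive coding, not the authors' implemented model:

```python
import numpy as np

def predictive_coding_step(x, W, state, lr=0.1):
    """One iteration of a minimal predictive-coding update.

    x     : bottom-up input vector
    W     : generative weights mapping latent state -> predicted input
    state : current estimate of the latent causes
    Returns the updated state and the prediction error.
    """
    prediction = W @ state               # top-down reconstruction of the input
    error = x - prediction               # prediction error (the N400 analogue here)
    state = state + lr * (W.T @ error)   # adjust latents to reduce the error
    return state, error

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(8, 4)))  # orthonormal columns keep updates stable
x = W @ np.array([1.0, 0.0, 0.5, 0.0])        # input generated by a known latent cause

state = np.zeros(4)
errors = []
for _ in range(50):
    state, err = predictive_coding_step(x, W, state)
    errors.append(np.sum(err ** 2))

# Total prediction error decays as inference settles on the latent causes.
print(errors[0] > errors[-1])  # prints True
```

Mapped onto the account above, the summed error is large when top-down predictions fail to explain the input and decays as the network settles, which is the temporal profile the model ties to the N400.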
Affiliation(s)
- Samer Nour Eddine
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America.
- Trevor Brothers
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America; Department of Psychology, North Carolina A&T, United States of America
- Lin Wang
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America; Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, United States of America
- Gina R Kuperberg
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America; Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, United States of America
2
Kauf C, Tuckute G, Levy R, Andreas J, Fedorenko E. Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network. Neurobiology of Language 2024; 5:7-42. PMID: 38645614; PMCID: PMC11025651; DOI: 10.1162/nol_a_00116.
Abstract
Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences' word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words), rather than the sentence's syntactic form (conveyed via word order or function words), is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, the results are robust to whether the mapping model is trained on intact or perturbed stimuli and to whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
Affiliation(s)
- Carina Kauf
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Roger Levy
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jacob Andreas
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
3
Michaelov JA, Bardolph MD, Van Petten CK, Bergen BK, Coulson S. Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects. Neurobiology of Language 2024; 5:107-135. PMID: 38645623; PMCID: PMC11025652; DOI: 10.1162/nol_a_00105.
Abstract
Theoretical accounts of the N400 are divided as to whether the amplitude of the N400 response to a stimulus reflects the extent to which the stimulus was predicted, the extent to which the stimulus is semantically similar to its preceding context, or both. We use state-of-the-art machine learning tools to investigate which of these three accounts is best supported by the evidence. GPT-3, a neural language model trained to compute the conditional probability of any word based on the words that precede it, was used to operationalize contextual predictability. In particular, we used an information-theoretic construct known as surprisal (the negative logarithm of the conditional probability). Contextual semantic similarity was operationalized using two high-quality co-occurrence-derived vector-based meaning representations for words: GloVe and fastText. The cosine between the vector representations of the sentence frame and the final word was used to derive contextual cosine similarity estimates. A series of regression models was constructed in which these variables, along with cloze probability and plausibility ratings, were used to predict single-trial N400 amplitudes recorded from healthy adults as they read sentences whose final word varied in its predictability, plausibility, and semantic relationship to the likeliest sentence completion. Statistical model comparison indicated that GPT-3 surprisal provided the best account of N400 amplitude and suggested that apparently disparate N400 effects of expectancy, plausibility, and contextual semantic similarity can be reduced to variation in the predictability of words. The results are argued to support predictive coding in the human language network.
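The two predictors compared in this study reduce to simple formulas. A sketch with made-up probabilities and toy vectors (not GPT-3 estimates or GloVe/fastText embeddings):

```python
import math

def surprisal(p):
    """Surprisal in bits: the negative log2 of the word's conditional probability."""
    return -math.log2(p)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# A highly predictable word carries little surprisal ...
print(surprisal(0.9))   # about 0.15 bits
# ... an unexpected one carries much more.
print(surprisal(0.01))  # about 6.64 bits

# Toy "embedding" vectors: a related completion lies closer to the context.
context = [0.8, 0.1, 0.3]
related, unrelated = [0.7, 0.2, 0.4], [-0.1, 0.9, -0.5]
print(cosine(context, related) > cosine(context, unrelated))  # prints True
```

The model comparison in the paper amounts to asking which of these two quantities, computed per target word, best predicts single-trial N400 amplitude.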
Affiliation(s)
- James A. Michaelov
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA
- Megan D. Bardolph
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA
- Cyma K. Van Petten
- Department of Psychology, Binghamton University, State University of New York, Binghamton, NY, USA
- Benjamin K. Bergen
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA
- Seana Coulson
- Department of Cognitive Science, University of California, San Diego, La Jolla, CA, USA
4
Huber E, Sauppe S, Isasi-Isasmendi A, Bornkessel-Schlesewsky I, Merlo P, Bickel B. Surprisal From Language Models Can Predict ERPs in Processing Predicate-Argument Structures Only if Enriched by an Agent Preference Principle. Neurobiology of Language 2024; 5:167-200. PMID: 38645615; PMCID: PMC11025647; DOI: 10.1162/nol_a_00121.
Abstract
Language models based on artificial neural networks increasingly capture key aspects of how humans process sentences. Most notably, model-based surprisals predict event-related potentials such as N400 amplitudes during parsing. Assuming that these models represent realistic estimates of human linguistic experience, their success in modeling language processing raises the possibility that the human processing system relies on no other principles than the general architecture of language models and on sufficient linguistic input. Here, we test this hypothesis on N400 effects observed during the processing of verb-final sentences in German, Basque, and Hindi. By stacking Bayesian generalised additive models, we show that, in each language, N400 amplitudes and topographies in the region of the verb are best predicted when model-based surprisals are complemented by an Agent Preference principle that transiently interprets initial role-ambiguous noun phrases as agents, leading to reanalysis when this interpretation fails. Our findings demonstrate the need for this principle independently of usage frequencies and structural differences between languages. The principle's force is unequal across languages, however: compared to surprisal, its effect is weakest in German, stronger in Hindi, and still stronger in Basque. This gradient is correlated with the extent to which grammars allow unmarked NPs to be patients, a structural feature that boosts reanalysis effects. We conclude that language models gain more neurobiological plausibility by incorporating an Agent Preference. Conversely, theories of human processing profit from incorporating surprisal estimates in addition to principles like the Agent Preference, which arguably have distinct evolutionary roots.
Affiliation(s)
- Eva Huber
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
- Sebastian Sauppe
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Arrate Isasi-Isasmendi
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
- Ina Bornkessel-Schlesewsky
- Cognitive Neuroscience Laboratory, Australian Research Centre for Interactive and Virtual Environments, University of South Australia, Adelaide, Australia
- Paola Merlo
- Department of Linguistics, University of Geneva, Geneva, Switzerland
- University Center for Computer Science, University of Geneva, Geneva, Switzerland
- Balthasar Bickel
- Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
- Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
5
Shain C, Schuler W. A Deep Learning Approach to Analyzing Continuous-Time Cognitive Processes. Open Mind (Camb) 2024; 8:235-264. PMID: 38528907; PMCID: PMC10962694; DOI: 10.1162/opmi_a_00126.
Abstract
The dynamics of the mind are complex. Mental processes unfold continuously in time and may be sensitive to a myriad of interacting variables, especially in naturalistic settings. But statistical models used to analyze data from cognitive experiments often assume simplistic dynamics. Recent advances in deep learning have yielded startling improvements to simulations of dynamical cognitive processes, including speech comprehension, visual perception, and goal-directed behavior. But due to poor interpretability, deep learning is generally not used for scientific analysis. Here, we bridge this gap by showing that deep learning can be used, not just to imitate, but to analyze complex processes, providing flexible function approximation while preserving interpretability. To do so, we define and implement a nonlinear regression model in which the probability distribution over the response variable is parameterized by convolving the history of predictors over time using an artificial neural network, thereby allowing the shape and continuous temporal extent of effects to be inferred directly from time series data. Our approach relaxes standard simplifying assumptions (e.g., linearity, stationarity, and homoscedasticity) that are implausible for many cognitive processes and may critically affect the interpretation of data. We demonstrate substantial improvements on behavioral and neuroimaging data from the language processing domain, and we show that our model enables discovery of novel patterns in exploratory analyses, controls for diverse confounds in confirmatory analyses, and opens up research questions in cognitive (neuro)science that are otherwise hard to study.
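The model's central operation, convolving the history of predictors with a response kernel, can be sketched in a few lines. The exponential kernel and the event times and values below are illustrative stand-ins; in the paper, the kernel's shape is learned by a neural network rather than fixed:

```python
import math

def convolve_history(event_times, event_values, t, kernel):
    """Predicted response at time t: each past event's value, weighted by
    the kernel evaluated at the delay elapsed since that event."""
    return sum(v * kernel(t - s)
               for s, v in zip(event_times, event_values) if s <= t)

def exp_kernel(delay, rate=1.5):
    """Illustrative impulse response: an exponential decay over time."""
    return math.exp(-rate * delay)

# Word onsets (seconds) and a per-word predictor value (e.g., surprisal).
times = [0.0, 0.4, 0.9, 1.5]
values = [2.0, 0.5, 3.0, 1.0]

# The response at t = 1.6 s still carries decayed traces of earlier words.
resp = convolve_history(times, values, 1.6, exp_kernel)
print(round(resp, 3))  # prints 2.175
```

Because the kernel is defined over continuous delays, effects need not be aligned to discrete trials, which is what lets the shape and temporal extent of an effect be inferred directly from time series data.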
Affiliation(s)
- Cory Shain
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- William Schuler
- Department of Linguistics, The Ohio State University, Columbus, OH, USA
6
Shain C. Word Frequency and Predictability Dissociate in Naturalistic Reading. Open Mind (Camb) 2024; 8:177-201. PMID: 38476662; PMCID: PMC10932590; DOI: 10.1162/opmi_a_00119.
Abstract
Many studies of human language processing have shown that readers slow down at less frequent or less predictable words, but there is debate about whether frequency and predictability effects reflect separable cognitive phenomena: are cognitive operations that retrieve words from the mental lexicon based on sensory cues distinct from those that predict upcoming words based on context? Previous evidence for a frequency-predictability dissociation is mostly based on small samples (both for estimating predictability and frequency and for testing their effects on human behavior), artificial materials (e.g., isolated constructed sentences), and implausible modeling assumptions (discrete-time dynamics, linearity, additivity, constant variance, and invariance over time), which raises the question: do frequency and predictability dissociate in ordinary language comprehension, such as story reading? This study leverages recent progress in open data and computational modeling to address this question at scale. A large collection of naturalistic reading data (six datasets, >2.2 M datapoints) is analyzed using nonlinear continuous-time regression, and frequency and predictability are estimated using statistical language models trained on more data than is currently typical in psycholinguistics. Despite the use of naturalistic data, strong predictability estimates, and flexible regression models, results converge with earlier experimental studies in supporting dissociable and additive frequency and predictability effects.
Affiliation(s)
- Cory Shain
- Department of Brain & Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
7
Shain C, Meister C, Pimentel T, Cotterell R, Levy R. Large-scale evidence for logarithmic effects of word predictability on reading time. Proc Natl Acad Sci U S A 2024; 121:e2307876121. PMID: 38422017; DOI: 10.1073/pnas.2307876121.
Abstract
During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words' contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.
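The linking functions under comparison are easy to state concretely. In this sketch the intercept and slopes are invented purely to illustrate the contrast: under the logarithmic account, halving a word's probability adds a constant cost, whereas under the linear account the cost depends on the baseline probability:

```python
import math

BASE_RT = 250.0   # illustrative intercept, ms
SLOPE = 15.0      # illustrative ms per bit of surprisal

def rt_logarithmic(p):
    """Reading time under the logarithmic (surprisal) linking function."""
    return BASE_RT + SLOPE * -math.log2(p)

def rt_linear(p, slope=100.0):
    """Reading time under the linear-in-probability linking function."""
    return BASE_RT + slope * (1.0 - p)

# Under the log account, halving probability costs a constant +15 ms ...
print(rt_logarithmic(0.2) - rt_logarithmic(0.4))    # +15 ms
print(rt_logarithmic(0.02) - rt_logarithmic(0.04))  # also +15 ms

# ... whereas under the linear account the cost shrinks with the baseline.
print(rt_linear(0.2) - rt_linear(0.4))    # +20 ms
print(rt_linear(0.02) - rt_linear(0.04))  # +2 ms
```

It is exactly this difference in predicted curvature that the regression analyses exploit to distinguish the two accounts.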
Affiliation(s)
- Cory Shain
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Clara Meister
- Department of Computer Science, Institute for Machine Learning, ETH Zürich, Zürich 8092, Switzerland
- Tiago Pimentel
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, United Kingdom
- Ryan Cotterell
- Department of Computer Science, Institute for Machine Learning, ETH Zürich, Zürich 8092, Switzerland
- Roger Levy
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
8
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. Nat Hum Behav 2024; 8:544-561. PMID: 38172630; DOI: 10.1038/s41562-023-01783-7.
Abstract
Transformer models such as GPT generate human-like language and are predictive of human brain responses to language. Here, using functional-MRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. We then use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also non-invasively control neural activity in higher-level cortical areas, such as the language network.
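The encoding-model logic here can be sketched with ordinary ridge regression. The random "embedding" features and synthetic responses below are toy stand-ins for the GPT-derived features and fMRI response magnitudes used in the study:

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)

# Toy stand-ins: 200 "sentences" x 16 embedding features, with responses
# generated from a hidden linear rule plus noise.
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
y = X @ w_true + 0.1 * rng.normal(size=200)

w = fit_ridge(X, y, alpha=1.0)

# Score new candidate "sentences"; in the study, the top- and
# bottom-scoring sentences were then shown to new participants.
X_new = rng.normal(size=(50, 16))
scores = X_new @ w
drive = X_new[np.argmax(scores)]     # predicted to drive the response
suppress = X_new[np.argmin(scores)]  # predicted to suppress it

corr = np.corrcoef(X_new @ w_true, scores)[0, 1]
print(corr > 0.9)  # prints True: fitted weights track the generating rule
```

The closed-loop step, selecting stimuli by model prediction and then testing them in new individuals, is what distinguishes this design from passive encoding-model evaluation.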
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- MIT-IBM Watson AI Lab, Cambridge, MA, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA, USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
9
Angulo-Chavira AQ, Castellón-Flores AM, Ciria A, Arias-Trejo N. Sentence-final completion norms for 2925 Mexican Spanish sentence contexts. Behav Res Methods 2024; 56:2486-2498. PMID: 37407787; PMCID: PMC10991019; DOI: 10.3758/s13428-023-02160-y.
Abstract
Sentence-final completion tasks serve as valuable tools in studying language processing and the associated predictive mechanisms. There are several established sentence-completion norms for languages like English, Portuguese, French, and Spanish, each tailored to the language it was designed for and evaluated in. Yet, cultural variations among native speakers of the same language complicate the claim of a universal application of these norms. In this study, we developed a corpus of 2925 sentence-completion norms specifically for Mexican Spanish. This corpus is distinctive for several reasons: Firstly, it is the most comprehensive set of sentence-completion norms for Mexican Spanish to date. Secondly, it offers a substantial range of experimental stimuli with considerable variability in the predictability of the sentence-final word (cloze probability/surprisal) and in the level of uncertainty inherent in the sentence context (entropy). Thirdly, the syntactic complexity of the sentences in the corpus is varied, as are the characteristics of the sentence-final nouns (including concreteness/abstractness, length, and frequency). This paper details the generation of the sentence contexts, explains the methodology employed for data collection from a total of 1470 participants, and outlines the approach to data analysis for the establishment of sentence-completion norms. These norms provide a significant contribution to fields such as linguistics, cognitive science, and machine learning by enhancing our understanding of language, predictive mechanisms, knowledge representation, and context representation. The collected data is accessible through the Open Science Framework (OSF) at https://osf.io/js359/?view_only=bb1b328d37d643df903ed69bb2405ac0.
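The per-context norms named here (cloze probability, surprisal, and entropy) all follow from raw completion counts. A sketch with invented responses, not values from the corpus:

```python
import math
from collections import Counter

def completion_norms(completions):
    """Cloze probability, surprisal (bits), and context entropy (bits)
    from a list of sentence-final completions produced by participants."""
    counts = Counter(completions)
    total = sum(counts.values())
    cloze = {w: c / total for w, c in counts.items()}        # per-word cloze probability
    surprisal = {w: -math.log2(p) for w, p in cloze.items()} # -log2 of cloze
    entropy = -sum(p * math.log2(p) for p in cloze.values()) # uncertainty of the context
    return cloze, surprisal, entropy

# Invented responses for one hypothetical sentence frame.
responses = ["coffee"] * 7 + ["milk"] * 2 + ["tea"]
cloze, surprisal, entropy = completion_norms(responses)

print(cloze["coffee"])            # prints 0.7, a constrained context
print(round(surprisal["tea"], 2)) # prints 3.32
print(round(entropy, 2))          # prints 1.16: low entropy, most answers agree
```

A context where responses scatter over many different words would yield low cloze values for every word and a much higher entropy, which is the axis of variability the corpus is designed to cover.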
Affiliation(s)
- Alejandra Ciria
- Facultad de Psicología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Natalia Arias-Trejo
- Facultad de Psicología, Universidad Nacional Autónoma de México, Mexico City, Mexico.
10
Karimi H, Weber P, Zinn J. Information entropy facilitates (not impedes) lexical processing during language comprehension. Psychon Bull Rev 2024. PMID: 38361106; DOI: 10.3758/s13423-024-02463-x.
Abstract
It is well known that contextual predictability facilitates word identification, but it is less clear whether the uncertainty associated with the current context (i.e., its lexical entropy) influences sentence processing. On the one hand, high-entropy contexts may lead to interference due to the greater number of lexical competitors. On the other hand, predicting multiple lexical competitors may facilitate processing through the preactivation of shared semantic features. In this study, we examined whether entropy measured at the trial level (i.e., for each participant, for each item) corresponds to facilitatory or inhibitory effects. Trial-level entropy captures each individual's knowledge about specific contexts and is therefore a more valid and sensitive measure of entropy (relative to the commonly employed item-level entropy). Participants (N = 112) completed two experimental sessions (with counterbalanced orders) that were separated by a 3- to 14-day interval. In one session, they produced up to 10 completions for sentence fragments (N = 647). In another session, they read the same sentences including a target word (whose entropy value was calculated based on the produced completions) while reading times were measured. We observed a facilitatory (not inhibitory) effect of trial-level entropy on lexical processing over and above item-level measures of lexical predictability (including cloze probability, surprisal, and semantic constraint). Additional analyses revealed that greater semantic overlap between the target and the produced responses facilitated target processing. Thus, the results lend support to theories of lexical prediction maintaining that prediction involves broad activation of semantic features rather than activation of full lexical forms.
Affiliation(s)
- Hossein Karimi
- Department of Psychology, Mississippi State University, 215 Magruder Hall, Mississippi State, MS, USA.
- Pete Weber
- Department of Psychology, Mississippi State University, 215 Magruder Hall, Mississippi State, MS, USA
- Jaden Zinn
- Department of Psychology, Mississippi State University, 215 Magruder Hall, Mississippi State, MS, USA
11
Troyer M, Kutas M, Batterink L, McRae K. Nuances of knowing: Brain potentials reveal implicit effects of domain knowledge on word processing in the absence of sentence-level knowledge. Psychophysiology 2024; 61:e14422. PMID: 37638492; DOI: 10.1111/psyp.14422.
Abstract
In previous work investigating the relationship between domain knowledge (of the fictional world of Harry Potter) and sentence comprehension, domain knowledge had a greater impact on electrical brain potentials to words which completed sentences about fictional "facts" participants reported they did not know compared to facts they did. This suggests that individuals use domain knowledge continuously to activate relevant/related concepts as they process sentences, even with only partial knowledge. As that study relied on subjective reports, it may have resulted in response bias related to an individual's overall domain knowledge. In the present study, we therefore asked participants with varying degrees of domain knowledge to complete sentences describing fictional "facts" as an objective measure of sentence-level knowledge. We then recorded EEG as the same individuals (re-)read the same sentences, including their appropriate final words, and sorted these according to their objective knowledge scores. Replicating and extending Troyer et al., we found that domain knowledge immediately facilitated access to meaning for unknown words; greater domain knowledge was associated with reduced N400 amplitudes for unknown words. These findings constitute novel evidence for graded preactivation of conceptual knowledge (e.g., at the level of semantic features and/or relations) in the absence of lexical prediction. Knowledge also influenced post-N400 memory/integration processes for these same unknown words; greater domain knowledge was associated with enhanced late positive components (LPCs), suggesting that deeper encoding during language processing may be engendered when knowledgeable individuals encounter an apparent gap in their knowledge.
Affiliation(s)
- Melissa Troyer
- Department of Psychology, University of Nevada, Las Vegas, Nevada, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
- Department of Psychology, Brain & Mind Institute, University of Western Ontario, London, Ontario, Canada
- Marta Kutas
- Department of Cognitive Science, University of California, San Diego, California, USA
- Department of Neuroscience, University of California, San Diego, California, USA
- Laura Batterink
- Department of Psychology, Brain & Mind Institute, University of Western Ontario, London, Ontario, Canada
- Ken McRae
- Department of Psychology, Brain & Mind Institute, University of Western Ontario, London, Ontario, Canada
12
Huizeling E, Alday PM, Peeters D, Hagoort P. Combining EEG and 3D-eye-tracking to study the prediction of upcoming speech in naturalistic virtual environments: A proof of principle. Neuropsychologia 2023; 191:108730. PMID: 37939871; DOI: 10.1016/j.neuropsychologia.2023.108730.
Abstract
EEG and eye-tracking provide complementary information when investigating language comprehension. Evidence that speech processing may be facilitated by speech prediction comes from the observation that a listener's eye gaze moves towards a referent before it is mentioned if the remainder of the spoken sentence is predictable. However, changes to the trajectory of anticipatory fixations could result from a change in prediction or an attention shift. Conversely, N400 amplitudes and concurrent spectral power provide information about the ease of word processing the moment the word is perceived. In a proof-of-principle investigation, we combined EEG and eye-tracking to study linguistic prediction in naturalistic, virtual environments. We observed increased processing, reflected in theta band power, either during verb processing (when the verb was predictive of the noun) or during noun processing (when the verb was not predictive of the noun). Alpha power was higher in response to the predictive verb and unpredictable nouns. We replicated typical effects of noun congruence but not predictability on the N400 in response to the noun. Thus, the rich visual context that accompanied speech in virtual reality influenced language processing compared to previous reports, where the visual context may have facilitated the processing of unpredictable nouns. Finally, anticipatory fixations were predictive of spectral power during noun processing, and the time spent fixating the target could be predicted by spectral power at verb onset, conditional on the object having been fixated. Overall, we show that combining EEG and eye-tracking provides a promising new method to answer novel research questions about the prediction of upcoming linguistic input, for example, regarding the role of extralinguistic cues in prediction during language comprehension.
Affiliation(s)
- Eleanor Huizeling
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands.
- David Peeters
- Department of Communication and Cognition, TiCC, Tilburg University, Tilburg, the Netherlands
- Peter Hagoort
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands; Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands

13
Brothers T, Morgan E, Yacovone A, Kuperberg G. Multiple predictions during language comprehension: Friends, foes, or indifferent companions? Cognition 2023; 241:105602. [PMID: 37716311 PMCID: PMC10783882 DOI: 10.1016/j.cognition.2023.105602]
Abstract
To comprehend language, we continually use prior context to pre-activate expected upcoming information, resulting in facilitated processing of incoming words that confirm these predictions. But what are the consequences of disconfirming prior predictions? To address this question, most previous studies have examined unpredictable words appearing in contexts that constrain strongly for a single continuation. However, during natural language processing, it is far more common to encounter contexts that constrain for multiple potential continuations, each with some probability. Here, we ask whether and how pre-activating both higher and lower probability alternatives influences the processing of the lower probability incoming word. One possibility is that, similar to language production, there is continuous pressure to select the higher-probability pre-activated alternative through competitive inhibition. During comprehension, this would result in relative costs in processing the lower probability target. A second possibility is that if the two pre-activated alternatives share semantic features, they mutually enhance each other's pre-activation. This would result in greater facilitation in processing the lower probability target. To distinguish between these accounts, we recorded ERPs as participants read three-sentence scenarios that constrained either for a single word or for two potential continuations - a higher probability expected candidate and a lower probability second-best candidate. We found no evidence that competitive pre-activation between the expected and second-best candidates resulted in costs in processing the second-best target, either during lexico-semantic processing (indexed by the N400) or at later stages of processing (indexed by a later frontal positivity). 
Instead, we found only benefits of pre-activating multiple alternatives, with evidence of enhanced graded facilitation on lower-probability targets that were semantically related to a higher-probability pre-activated alternative. These findings are consistent with a previous eye-tracking study by Luke and Christianson (2016, Cogn Psychol) using corpus-based materials. They have significant theoretical implications for models of predictive language processing, indicating that routine graded prediction in language comprehension does not operate through the same competitive mechanisms that are engaged in language production. Instead, our results align more closely with hierarchical probabilistic accounts of language comprehension, such as predictive coding.
Affiliation(s)
- Trevor Brothers
- Department of Psychology, North Carolina A&T, United States of America; Department of Psychology, Tufts University, United States of America
- Emily Morgan
- Department of Linguistics, University of California, Davis, United States of America
- Anthony Yacovone
- Department of Psychology, Tufts University, United States of America; Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, United States of America
- Gina Kuperberg
- Department of Psychology, Tufts University, United States of America; Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, United States of America.

14
Ryskin R, Nieuwland MS. Prediction during language comprehension: what is next? Trends Cogn Sci 2023; 27:1032-1052. [PMID: 37704456 DOI: 10.1016/j.tics.2023.08.003]
Abstract
Prediction is often regarded as an integral aspect of incremental language comprehension, but little is known about the cognitive architectures and mechanisms that support it. We review studies showing that listeners and readers use all manner of contextual information to generate multifaceted predictions about upcoming input. The nature of these predictions may vary between individuals owing to differences in language experience, among other factors. We then turn to unresolved questions which may guide the search for the underlying mechanisms. (i) Is prediction essential to language processing or an optional strategy? (ii) Are predictions generated from within the language system or by domain-general processes? (iii) What is the relationship between prediction and memory? (iv) Does prediction in comprehension require simulation via the production system? We discuss promising directions for making progress in answering these questions and for developing a mechanistic understanding of prediction in language.
Affiliation(s)
- Rachel Ryskin
- Department of Cognitive and Information Sciences, University of California Merced, 5200 Lake Road, Merced, CA 95343, USA.
- Mante S Nieuwland
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition, and Behaviour, Nijmegen, The Netherlands

15
Michaelov JA, Bergen BK. Ignoring the alternatives: The N400 is sensitive to stimulus preactivation alone. Cortex 2023; 168:82-101. [PMID: 37678069 DOI: 10.1016/j.cortex.2023.08.001]
Abstract
The N400 component of the event-related brain potential is a neural signal of processing difficulty. In the language domain, it is widely believed to be sensitive to the degree to which a given word or its semantic features have been preactivated in the brain based on the preceding context. However, it has also been shown that the brain often preactivates many words in parallel. It is currently unknown whether the N400 is also affected by the preactivations of alternative words other than the stimulus that is actually presented. This leaves a weak link in the derivation chain: how can we use the N400 to understand the mechanisms of preactivation if we do not know what it indexes? This study directly addresses this gap. We estimate the extent to which all words in a lexicon are preactivated in a given context using the predictions of contemporary large language models. We then directly compare two competing possibilities: that the amplitude of the N400 is sensitive only to the extent to which the stimulus is preactivated, and that it is also sensitive to the preactivation states of the alternatives. We find evidence of the former. This result allows for better grounded inferences about the mechanisms underlying the N400, lexical preactivation in the brain, and language processing more generally.
Affiliation(s)
- James A Michaelov
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA.
- Benjamin K Bergen
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA.

16
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. bioRxiv 2023:2023.04.16.537080. [PMID: 37090673 PMCID: PMC10120732 DOI: 10.1101/2023.04.16.537080]
Abstract
Transformer models such as GPT generate human-like language and are highly predictive of human brain responses to language. Here, using fMRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of brain response associated with each sentence. Then, we use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also noninvasively control neural activity in higher-level cortical areas, like the language network.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- MIT-IBM Watson AI Lab, Cambridge, MA 02142, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN 55455 USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138 USA

17
de Varda AG, Marelli M, Amenta S. Cloze probability, predictability ratings, and computational estimates for 205 English sentences, aligned with existing EEG and reading time data. Behav Res Methods 2023. [PMID: 37880511 DOI: 10.3758/s13428-023-02261-8]
Abstract
We release a database of cloze probability values, predictability ratings, and computational estimates for a sample of 205 English sentences (1726 words), aligned with previously released word-by-word reading time data (both self-paced reading and eye-movement records; Frank et al., Behavior Research Methods, 45(4), 1182-1190, 2013) and EEG responses (Frank et al., Brain and Language, 140, 1-11, 2015). Our analyses show that predictability ratings are the best predictors of the EEG signal (N400, P600, LAN), self-paced reading times, and eye-movement patterns, when spillover effects are taken into account. The computational estimates are particularly effective at explaining variance in the eye-tracking data without spillover. Cloze probability estimates have decent overall psychometric accuracy and are the best predictors of early fixation patterns (first fixation duration). Our results indicate that the choice of the best measurement of word predictability in context critically depends on the processing index being considered.
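Cloze probability, the measure at the center of this database, is simply the proportion of norming participants who produce a given continuation for the same sentence frame. A minimal sketch of that computation (the sentence frame and responses below are invented for illustration, and real norming studies typically also clean or lemmatize responses):

```python
from collections import Counter

def cloze_probabilities(completions):
    """Return each continuation's cloze probability: the share of
    participants who produced it for the same sentence frame."""
    counts = Counter(word.lower() for word in completions)
    total = len(completions)
    return {word: n / total for word, n in counts.items()}

# 10 hypothetical participants complete "She takes her coffee with milk and ..."
responses = ["sugar", "sugar", "sugar", "sugar", "sugar",
             "sugar", "sugar", "cream", "cream", "honey"]
probs = cloze_probabilities(responses)
print(probs)  # → {'sugar': 0.7, 'cream': 0.2, 'honey': 0.1}
```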
Affiliation(s)
- Andrea Gregor de Varda
- Department of Psychology, University of Milano - Bicocca, Piazza dell'Ateneo Nuovo 1, Milano, MI 20126, Italy.
- Marco Marelli
- Department of Psychology, University of Milano - Bicocca, Piazza dell'Ateneo Nuovo 1, Milano, MI 20126, Italy
- Simona Amenta
- Department of Psychology, University of Milano - Bicocca, Piazza dell'Ateneo Nuovo 1, Milano, MI 20126, Italy

18
Hoover JL, Sonderegger M, Piantadosi ST, O’Donnell TJ. The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing. Open Mind (Camb) 2023; 7:350-391. [PMID: 37637302 PMCID: PMC10449406 DOI: 10.1162/opmi_a_00086]
Abstract
Words that are more surprising given context take longer to process. However, no incremental parsing algorithm has been shown to directly predict this phenomenon. In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal: those that involve repeatedly sampling from the prior. Our first contribution is to show that simple examples of such algorithms predict runtime to increase superlinearly with surprisal, and also predict variance in runtime to increase. These two predictions stand in contrast with literature on surprisal theory (Hale, 2001; Levy, 2008a), which assumes that the expected processing cost increases linearly with surprisal, and makes no prediction about variance. In the second part of this paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal. We find that with better language models, reading time increases superlinearly in surprisal, and also that variance increases. These results are consistent with the predictions of sampling-based algorithms.
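The arithmetic behind the abstract's core claim can be illustrated with a minimal sketch (this is not the paper's parser, only the guess-and-check logic it analyzes): if an algorithm repeatedly samples candidates until it hits an outcome of probability p, the number of attempts is geometrically distributed with mean 1/p = exp(surprisal), so expected runtime grows superlinearly (indeed exponentially) in surprisal, and its variance, (1 - p) / p^2, grows even faster.

```python
import math
import random

def sample_until_match(p_target, rng):
    """Sample from a Bernoulli 'prior' until the target outcome is drawn.

    Returns the number of attempts, which is geometrically distributed
    with mean 1 / p_target = exp(surprisal)."""
    attempts = 1
    while rng.random() >= p_target:  # each draw succeeds with probability p_target
        attempts += 1
    return attempts

rng = random.Random(0)
for p in (0.5, 0.1, 0.01):
    surprisal = -math.log(p)
    runs = [sample_until_match(p, rng) for _ in range(20000)]
    mean_attempts = sum(runs) / len(runs)
    # Mean attempts track 1/p = exp(surprisal), not a linear function of surprisal.
    print(f"p={p:<5} surprisal={surprisal:.2f} mean attempts={mean_attempts:.1f}")
```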
Affiliation(s)
- Jacob Louis Hoover
- McGill University, Montréal, Canada
- Mila Québec AI Institute, Montréal, Canada
- Timothy J. O’Donnell
- McGill University, Montréal, Canada
- Mila Québec AI Institute, Montréal, Canada
- Canada CIFAR AI Chair, Mila

19
Kauf C, Tuckute G, Levy R, Andreas J, Fedorenko E. Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network. bioRxiv 2023:2023.05.05.539646. [PMID: 37205405 PMCID: PMC10187317 DOI: 10.1101/2023.05.05.539646]
Abstract
Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI dataset of responses to n=627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we i) perturbed sentences' word order, ii) removed different subsets of words, or iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical semantic content of the sentence (largely carried by content words) rather than the sentence's syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, results are robust to whether the mapping model is trained on intact or perturbed stimuli, and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result - that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones - aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
Affiliation(s)
- Carina Kauf
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Roger Levy
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Jacob Andreas
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- McGovern Institute for Brain Research, Massachusetts Institute of Technology
- Program in Speech and Hearing Bioscience and Technology, Harvard University

20
Burleson AM, Souza PE. Cognitive and linguistic abilities and perceptual restoration of missing speech: Evidence from online assessment. Front Psychol 2022; 13:1059192. [PMID: 36571056 PMCID: PMC9773209 DOI: 10.3389/fpsyg.2022.1059192]
Abstract
When speech is clear, speech understanding is a relatively simple and automatic process. However, when the acoustic signal is degraded, top-down cognitive and linguistic abilities, such as working memory capacity, lexical knowledge (i.e., vocabulary), inhibitory control, and processing speed, can often support speech understanding. This study examined whether listeners aged 22-63 (mean age 42 years) with better cognitive and linguistic abilities would be better able to perceptually restore missing speech information than those with poorer scores. Additionally, the roles of context and everyday speech were investigated using high-context, low-context, and realistic speech corpora. Sixty-three adult participants with self-reported normal hearing completed a short cognitive and linguistic battery before listening to sentences interrupted by silent gaps or noise bursts. Results indicated that working memory was the most reliable predictor of perceptual restoration ability, followed by lexical knowledge, inhibitory control, and processing speed. Generally, silent gap conditions were related to and predicted by a broader range of cognitive abilities, whereas noise burst conditions were related to working memory capacity and inhibitory control. These findings suggest that higher-order cognitive and linguistic abilities facilitate the top-down restoration of missing speech information and contribute to individual variability in perceptual restoration.
22
Heilbron M, Armeni K, Schoffelen JM, Hagoort P, de Lange FP. A hierarchy of linguistic predictions during natural language comprehension. Proc Natl Acad Sci U S A 2022; 119:e2201968119. [PMID: 35921434 DOI: 10.1073/pnas.2201968119]
Abstract
Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analyzing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable neural signatures of predictions about syntactic category (parts of speech), phonemes, and semantics. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.
Affiliation(s)
- Micha Heilbron
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Kristijan Armeni
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Peter Hagoort
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Floris P de Lange
- Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands

23
Huettig F, Audring J, Jackendoff R. A parallel architecture perspective on pre-activation and prediction in language processing. Cognition 2022; 224:105050. [DOI: 10.1016/j.cognition.2022.105050]
24
Szewczyk JM, Federmeier KD. Context-based facilitation of semantic access follows both logarithmic and linear functions of stimulus probability. J Mem Lang 2022; 123:104311. [PMID: 36337731 PMCID: PMC9631957 DOI: 10.1016/j.jml.2021.104311]
Abstract
Stimuli are easier to process when context makes them predictable, but does context-based facilitation arise from preactivation of a limited set of relatively probable upcoming stimuli (with facilitation then linearly related to probability) or, instead, because the system maintains and updates a probability distribution across all items (with facilitation logarithmically related to probability)? We measured the N400, an index of semantic access, to words of varying probability, including unpredictable words. Word predictability was measured using both cloze probabilities and a state-of-the-art machine learning language model (GPT-2). We reanalyzed five datasets (n = 138) to demonstrate and then replicate that context-based facilitation on the N400 is graded, even among unpredictable words. Furthermore, we established that the relationship between word predictability and context-based facilitation combines linear and logarithmic functions. We argue that this composite function reveals properties of the mapping between words and semantic features and how feature- and word-related information is activated on-line.
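The composite function the abstract describes can be sketched in a few lines: facilitation modeled as a weighted sum of a linear term in word probability and a logarithmic term (i.e., surprisal). The weights and the smoothing constant below are illustrative assumptions, not the paper's fitted values; the point is only that the log term keeps facilitation graded even among very low-probability words.

```python
import math

def predicted_facilitation(p, w_linear=1.0, w_log=1.0, eps=1e-4):
    """Composite context-based facilitation: linear + logarithmic in probability.

    p: word probability in context (e.g., a cloze or GPT-2 estimate).
    eps smooths p = 0 so unpredictable words still get a finite surprisal term."""
    p = max(p, eps)
    surprisal = -math.log(p)                  # logarithmic component
    return w_linear * p - w_log * surprisal   # higher value = more facilitation

# Even among unpredictable words (p near 0), facilitation stays graded:
# the log term still differentiates p = 0.001 from p = 0.01.
for p in (0.9, 0.1, 0.01, 0.001):
    print(f"p={p:<6} facilitation={predicted_facilitation(p):.3f}")
```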
Affiliation(s)
- Jakub M. Szewczyk
- Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Corresponding author at: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, the Netherlands (J.M. Szewczyk)
- Kara D. Federmeier
- Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Program in Neuroscience, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Champaign, IL, USA

25
Hofmann MJ, Kleemann MA, Roelke-Wellmann A, Vorstius C, Radach R. Semantic feature activation takes time: longer SOA elicits earlier priming effects during reading. Cogn Process 2022; 23:309-318. [PMID: 35254545 PMCID: PMC9072456 DOI: 10.1007/s10339-022-01084-3]
Abstract
While most previous studies of "semantic" priming confound associative and semantic relations, here we use a simple co-occurrence-based approach to examine "pure" semantic priming, while experimentally controlling for associative relations. We define associative relations by the co-occurrence of words in the sentences of a large text corpus. Contextual-semantic feature overlap, in contrast, is defined by the number of common associates that the prime shares with the target. Then we revisit the spreading activation theory and examine whether a long vs. short time available for semantic feature activation leads to early vs. late viewing time effects on the target words of a sentence reading experiment. We independently manipulate contextual-semantic feature overlap of two primes with one target word in sentences of the form pronoun, verb prime, article, adjective prime, and target noun, e.g., "She rides the gray elephant." The results showed that long-SOA (verb-noun) overlap reduces early single and first fixation durations of the target noun, and short-SOA (adjective-noun) overlap reduces late go-past durations. This result pattern can be explained by the spreading activation theory: The semantic features of the prime words need some time to become sufficiently active before they can reliably affect target processing. Therefore, the verb can act on the target noun's early eye-movement measures presented three words later, while the adjective is presented immediately prior to the target; thus a difficult adjective-noun semantic integration leads to a late sentence re-examination of the preceding words.
Affiliation(s)
- Markus J Hofmann
- General and Biological Psychology, University of Wuppertal, Max-Horkheimer-Str. 20, 42119, Wuppertal, Germany.
- Mareike A Kleemann
- General and Biological Psychology, University of Wuppertal, Max-Horkheimer-Str. 20, 42119, Wuppertal, Germany
- André Roelke-Wellmann
- General and Biological Psychology, University of Wuppertal, Max-Horkheimer-Str. 20, 42119, Wuppertal, Germany
- Christian Vorstius
- General and Biological Psychology, University of Wuppertal, Max-Horkheimer-Str. 20, 42119, Wuppertal, Germany
- Ralph Radach
- General and Biological Psychology, University of Wuppertal, Max-Horkheimer-Str. 20, 42119, Wuppertal, Germany

26
Hofmann MJ, Remus S, Biemann C, Radach R, Kuchinke L. Language Models Explain Word Reading Times Better Than Empirical Predictability. Front Artif Intell 2022; 4:730570. [PMID: 35187472 PMCID: PMC8847793 DOI: 10.3389/frai.2021.730570]
Abstract
Though there is a strong consensus that word length and frequency are the most important single-word features determining visual-orthographic access to the mental lexicon, there is less agreement as to how best to capture syntactic and semantic factors. The traditional approach in cognitive reading research assumes that word predictability from sentence context is best captured by cloze completion probability (CCP) derived from human performance data. We review recent research suggesting that probabilistic language models provide deeper explanations for syntactic and semantic effects than CCP. Then we compare CCP with three probabilistic language models for predicting word viewing times in an English and a German eye-tracking sample: (1) Symbolic n-gram models consolidate syntactic and semantic short-range relations by computing the probability that a word occurs given the two preceding words. (2) Topic models rely on subsymbolic representations to capture long-range semantic similarity by word co-occurrence counts in documents. (3) In recurrent neural networks (RNNs), the subsymbolic units are trained to predict the next word, given all preceding words in the sentences. To examine lexical retrieval, these models were used to predict single-fixation durations and gaze durations to capture rapidly successful and standard lexical access, and total viewing time to capture late semantic integration. The linear item-level analyses showed greater correlations of all language models with all eye-movement measures than CCP. Then we examined non-linear relations between the different types of predictability and the reading times using generalized additive models. N-gram and RNN probabilities of the present word more consistently predicted reading performance than topic models or CCP. For the effects of last-word probability on current-word viewing times, we obtained the best results with n-gram models. Such count-based models seem to best capture short-range access that is still underway when the eyes move on to the subsequent word. The prediction-trained RNN models, in contrast, better predicted early preprocessing of the next word. In sum, our results demonstrate that the different language models account for differential cognitive processes during reading. We discuss these algorithmically concrete blueprints of lexical consolidation as theoretically deep explanations for human reading.
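The count-based trigram predictability described in point (1) above, and the surprisal that follows from it as the negative log probability, can be sketched as follows. This is a minimal illustration with an unsmoothed maximum-likelihood estimate over a toy corpus; the study's actual models were trained on large English and German corpora.

```python
from collections import Counter
import math

# Toy corpus for illustration only (assumption, not the study's data).
corpus = "the dog chased the cat the dog bit the cat".split()

# Count trigrams and their bigram contexts.
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def trigram_prob(w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2), unsmoothed."""
    context_count = bigrams[(w1, w2)]
    return trigrams[(w1, w2, w3)] / context_count if context_count else 0.0

# "the dog" occurs twice; it is followed by "chased" once.
p = trigram_prob("the", "dog", "chased")   # 0.5
surprisal = -math.log2(p)                  # 1.0 bit
```

A production n-gram model would add smoothing (e.g., Kneser-Ney) so that unseen trigrams do not receive zero probability.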
Affiliation(s)
- Markus J. Hofmann
- Department of Psychology, University of Wuppertal, Wuppertal, Germany
- *Correspondence: Markus J. Hofmann
- Steffen Remus
- Department of Informatics, Universität Hamburg, Hamburg, Germany
- Chris Biemann
- Department of Informatics, Universität Hamburg, Hamburg, Germany
- Ralph Radach
- Department of Psychology, University of Wuppertal, Wuppertal, Germany
- Lars Kuchinke
- International Psychoanalytic University, Berlin, Germany
27
Mirault J, Massol S, Grainger J. An algorithm for analyzing cloze test results. Methods in Psychology 2021. [DOI: 10.1016/j.metip.2021.100064]
28
Hörberg T, Jaeger TF. A Rational Model of Incremental Argument Interpretation: The Comprehension of Swedish Transitive Clauses. Front Psychol 2021; 12:674202. [PMID: 34721134; PMCID: PMC8554243; DOI: 10.3389/fpsyg.2021.674202]
Abstract
A central component of sentence understanding is verb-argument interpretation, determining how the referents in the sentence are related to the events or states expressed by the verb. Previous work has found that comprehenders change their argument interpretations incrementally as the sentence unfolds, based on morphosyntactic (e.g., case, agreement), lexico-semantic (e.g., animacy, verb-argument fit), and discourse cues (e.g., givenness). However, it is still unknown whether these cues have a privileged role in language processing, or whether their effects on argument interpretation originate in implicit expectations based on the joint distribution of these cues with argument assignments experienced in previous language input. We compare the former, linguistic account against the latter, expectation-based account, using data from production and comprehension of transitive clauses in Swedish. Based on a large corpus of Swedish, we develop a rational (Bayesian) model of incremental argument interpretation. This model predicts the processing difficulty experienced at different points in the sentence as a function of the Bayesian surprise associated with changes in expectations over possible argument interpretations. We then test the model against reading times from a self-paced reading experiment on Swedish. We find Bayesian surprise to be a significant predictor of reading times, complementing effects of word surprisal. Bayesian surprise also captures the qualitative effects of morphosyntactic and lexico-semantic cues. Additional model comparisons find that it, with a single degree of freedom, captures much, if not all, of the effects associated with these cues. This suggests that the effects of form- and meaning-based cues to argument interpretation are mediated through expectation-based processing.
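One standard way to formalize the Bayesian surprise used as a predictor here is the Kullback-Leibler divergence between the comprehender's belief distribution over argument interpretations before and after an incoming word. The sketch below is a hypothetical two-interpretation illustration with made-up numbers, not the paper's corpus-derived Swedish model:

```python
import math

def bayesian_surprise(prior, posterior):
    """KL divergence D(posterior || prior) in bits: how far an incoming
    word shifts beliefs over candidate argument interpretations."""
    return sum(q * math.log2(q / p)
               for q, p in zip(posterior, prior) if q > 0)

# Hypothetical beliefs over two interpretations of a clause-initial NP:
# (agent-first, patient-first). A disambiguating case cue then arrives
# that strongly favors the patient-first reading.
prior = [0.7, 0.3]
posterior = [0.05, 0.95]

surprise = bayesian_surprise(prior, posterior)  # ~1.39 bits
```

Note the distinction the abstract draws: word surprisal depends only on the probability of the word itself, whereas Bayesian surprise measures how much the word changes the distribution over interpretations, so the two can dissociate.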
Affiliation(s)
- Thomas Hörberg
- Department of Linguistics, Stockholm University, Stockholm, Sweden; Department of Computational Science and Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- T. Florian Jaeger
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States; Department of Computer Science, University of Rochester, Rochester, NY, United States
29
Dave S, Brothers T, Hoversten LJ, Traxler MJ, Swaab TY. Cognitive control mediates age-related changes in flexible anticipatory processing during listening comprehension. Brain Res 2021; 1768:147573. [PMID: 34216583; PMCID: PMC8403152; DOI: 10.1016/j.brainres.2021.147573]
Abstract
Effective listening comprehension not only requires processing local linguistic input, but also necessitates incorporating contextual cues available in the global communicative environment. Local sentence processing can be facilitated by pre-activation of likely upcoming input, or predictive processing. Recent evidence suggests that young adults can flexibly adapt local predictive processes based on cues provided by the global communicative environment, such as the reliability of specific speakers. Whether older comprehenders can also flexibly adapt to global contextual cues is currently unknown. Moreover, it is unclear whether the underlying mechanisms supporting local predictive processing differ from those supporting adaptation to global contextual cues. Critically, it is unclear whether these mechanisms change as a function of typical aging. We examined the flexibility of prediction in young and older adults by presenting sentences from speakers whose utterances were typically more or less predictable (i.e., reliable speakers who produced expected words 80% of the time, versus unreliable speakers who produced expected words 20% of the time). For young listeners, global speaker reliability cues modulated neural effects of local predictability on the N400. In contrast, older adults, on average, did not show global modulation of local processing. Importantly, however, cognitive control (i.e., Stroop interference effects) mediated age-related reductions in sensitivity to the reliability of the speaker. Both young and older adults with high cognitive control showed greater N400 effects of predictability during sentences produced by a reliable speaker, suggesting that cognitive control is required to regulate the strength of top-down predictions based on global contextual information. Critically, cognitive control predicted sensitivity to global speaker-specific information but not local predictability cues, suggesting that predictive processing in local sentence contexts may be supported by separable neural mechanisms from adaptation of prediction as a function of global context. These results have important implications for interpreting age-related change in predictive processing, and for drawing more generalized conclusions regarding domain-general versus language-specific accounts of prediction.
Affiliation(s)
- Shruti Dave
- Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
- Liv J Hoversten
- Department of Psychology, University of California Santa Cruz, Santa Cruz, CA, USA
- Matthew J Traxler
- Department of Psychology and Center for Mind and Brain, University of California Davis, Davis, CA, USA
- Tamara Y Swaab
- Department of Psychology and Center for Mind and Brain, University of California Davis, Davis, CA, USA
30
Kuperberg GR. Tea With Milk? A Hierarchical Generative Framework of Sequential Event Comprehension. Top Cogn Sci 2021; 13:256-298. [PMID: 33025701; PMCID: PMC7897219; DOI: 10.1111/tops.12518]
Abstract
To make sense of the world around us, we must be able to segment a continual stream of sensory inputs into discrete events. In this review, I propose that in order to comprehend events, we engage hierarchical generative models that "reverse engineer" the intentions of other agents as they produce sequential action in real time. By generating probabilistic predictions for upcoming events, generative models ensure that we are able to keep up with the rapid pace at which perceptual inputs unfold. By tracking our certainty about other agents' goals and the magnitude of prediction errors at multiple temporal scales, generative models enable us to detect event boundaries by inferring when a goal has changed. Moreover, by adapting flexibly to the broader dynamics of the environment and our own comprehension goals, generative models allow us to optimally allocate limited resources. Finally, I argue that we use generative models not only to comprehend events but also to produce events (carry out goal-relevant sequential action) and to continually learn about new events from our surroundings. Taken together, this hierarchical generative framework provides new insights into how the human brain processes events so effortlessly while highlighting the fundamental links between event comprehension, production, and learning.
Affiliation(s)
- Gina R. Kuperberg
- Department of Psychology and Center for Cognitive Science, Tufts University
- Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School