1. Li Z, Hong B, Nolte G, Engel AK, Zhang D. Speaker-listener neural coupling correlates with semantic and acoustic features of naturalistic speech. Soc Cogn Affect Neurosci 2024;19:nsae051. PMID: 39012092; PMCID: PMC11296674; DOI: 10.1093/scan/nsae051.
Abstract
Recent research has extensively documented the phenomenon of inter-brain neural coupling between speakers and listeners during speech communication. Yet the specific speech processes underlying this neural coupling remain elusive. To bridge this gap, this study estimated the correlation between the temporal dynamics of speaker-listener neural coupling and speech features, using two inter-brain datasets that varied in noise level and the listeners' language experience (native vs. non-native). We first derived time-varying speaker-listener neural coupling, extracted an acoustic feature (envelope) and semantic features (entropy and surprisal) from the speech, and then explored their correlational relationship. Our findings reveal that in clear conditions, speaker-listener neural coupling correlates with the semantic features. As noise increases, however, this correlation remains significant only for native listeners; for non-native listeners, neural coupling correlates predominantly with the acoustic feature rather than the semantic features. These results show how speaker-listener neural coupling is associated with acoustic and semantic features across scenarios, enriching our understanding of inter-brain neural mechanisms during natural speech communication. We therefore advocate more attention to the dynamic nature of speaker-listener neural coupling and to its modeling with multilevel speech features.
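The core of this analysis — correlating a time-varying speaker-listener coupling series with the speech envelope — can be sketched in a few lines. Everything below is a toy illustration on synthetic data, not the authors' pipeline: the rectify-and-smooth envelope is a crude stand-in for the (typically Hilbert-based) envelope, and the coupling series is simulated to partially track it.

```python
import numpy as np

def amplitude_envelope(signal, win=50):
    """Rectify-and-smooth envelope (a crude stand-in for a Hilbert envelope)."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

def pearson_r(a, b):
    """Pearson correlation between two 1-D series."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

# Synthetic data: a speech waveform, its envelope, and a coupling time course
# that partially tracks the envelope (plus unrelated fluctuations).
rng = np.random.default_rng(0)
speech = rng.standard_normal(2000)
env = amplitude_envelope(speech)
env_z = (env - env.mean()) / env.std()
coupling = 0.8 * env_z + 0.6 * rng.standard_normal(2000)

r = pearson_r(coupling, env)  # strength of the coupling-envelope relationship
```

A real analysis would repeat this per time window and per feature (envelope, entropy, surprisal), with appropriate statistics.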
Affiliation(s)
- Zhuoran Li
  - Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
  - Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA 52242, United States
  - Stead Family Department of Pediatrics, University of Iowa Carver College of Medicine, Iowa City, IA 52242, United States
- Bo Hong
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
  - Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Andreas K Engel
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
- Dan Zhang
  - Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
2. Cometa A, Battaglini C, Artoni F, Greco M, Frank R, Repetto C, Bottoni F, Cappa SF, Micera S, Ricciardi E, Moro A. Brain and grammar: revealing electrophysiological basic structures with competing statistical models. Cereb Cortex 2024;34:bhae317. PMID: 39098819; DOI: 10.1093/cercor/bhae317.
Abstract
Acoustic, lexical, and syntactic information are processed simultaneously in the brain, requiring complex strategies to distinguish their electrophysiological activity. Capitalizing on previous work that factors out acoustic information, we concentrated on the lexical and syntactic contributions to language processing by testing competing statistical models. We exploited electroencephalographic recordings and compared different surprisal models that selectively involve lexical information, part of speech, or syntactic structures in various combinations. Electroencephalographic responses were recorded in 32 participants while they listened to affirmative active declarative sentences, and we compared the activation corresponding to basic syntactic structures, such as noun phrases vs. verb phrases. Lexical and syntactic processing activate different frequency bands, partially different time windows, and different networks. Moreover, surprisal models based only on the part-of-speech inventory do not explain the electrophysiological data well, whereas those including syntactic information do. By disentangling acoustic, lexical, and syntactic information, we demonstrated differential brain sensitivity to syntactic information. These results confirm and extend previous measures obtained with intracranial recordings, supporting our hypothesis that syntactic structures are crucial in neural language processing. This study provides a detailed understanding of how the brain processes syntactic information, highlighting the importance of syntactic surprisal in shaping neural responses during language comprehension.
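The competing-models logic — regress the same electrophysiological signal on different surprisal predictors and ask which explains held-out data better — can be sketched with ordinary least squares. The predictor names, effect sizes, and train/test split below are illustrative assumptions, not the study's actual regressors or validation scheme.

```python
import numpy as np

def ols_r2(X_train, y_train, X_test, y_test):
    """Fit OLS on the training split; return R^2 on held-out data."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    pred = X_test @ beta
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Toy data: an EEG-like signal driven by both lexical and syntactic surprisal.
rng = np.random.default_rng(1)
n = 400
lexical = rng.standard_normal(n)     # lexical surprisal (invented)
syntactic = rng.standard_normal(n)   # syntactic surprisal (invented)
eeg = 0.3 * lexical + 0.7 * syntactic + 0.3 * rng.standard_normal(n)

half = n // 2
X_lex = np.c_[np.ones(n), lexical]                 # lexical-only model
X_full = np.c_[np.ones(n), lexical, syntactic]     # model with syntax
r2_lex = ols_r2(X_lex[:half], eeg[:half], X_lex[half:], eeg[half:])
r2_full = ols_r2(X_full[:half], eeg[:half], X_full[half:], eeg[half:])
```

The comparison of `r2_full` against `r2_lex` mirrors the paper's conclusion that models including syntactic information explain the data better.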
Affiliation(s)
- Andrea Cometa
  - MoMiLab, IMT School for Advanced Studies Lucca, Piazza S. Francesco 19, Lucca 55100, Italy
  - The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, Pontedera 56025, Italy
  - Cognitive Neuroscience (ICoN) Center, University School for Advanced Studies IUSS, Piazza Vittoria 15, Pavia 27100, Italy
- Chiara Battaglini
  - Neurolinguistics and Experimental Pragmatics (NEP) Lab, University School for Advanced Studies IUSS Pavia, Piazza della Vittoria 15, Pavia 27100, Italy
- Fiorenzo Artoni
  - Department of Clinical Neurosciences, Faculty of Medicine, University of Geneva, 1, rue Michel-Servet, Genève 1211, Switzerland
- Matteo Greco
  - Cognitive Neuroscience (ICoN) Center, University School for Advanced Studies IUSS, Piazza Vittoria 15, Pavia 27100, Italy
- Robert Frank
  - Department of Linguistics, Yale University, 370 Temple St, New Haven, CT 06511, United States
- Claudia Repetto
  - Department of Psychology, Università Cattolica del Sacro Cuore, Largo A. Gemelli 1, Milan 20123, Italy
- Franco Bottoni
  - Istituto Clinico Humanitas, IRCCS, Via Alessandro Manzoni 56, Rozzano 20089, Italy
- Stefano F Cappa
  - Cognitive Neuroscience (ICoN) Center, University School for Advanced Studies IUSS, Piazza Vittoria 15, Pavia 27100, Italy
  - Dementia Research Center, IRCCS Mondino Foundation National Institute of Neurology, Via Mondino 2, Pavia 27100, Italy
- Silvestro Micera
  - The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, Pontedera 56025, Italy
  - Bertarelli Foundation Chair in Translational NeuroEngineering, Center for Neuroprosthetics and School of Engineering, Ecole Polytechnique Federale de Lausanne, Campus Biotech, Chemin des Mines 9, Geneva, GE CH 1202, Switzerland
- Emiliano Ricciardi
  - MoMiLab, IMT School for Advanced Studies Lucca, Piazza S. Francesco 19, Lucca 55100, Italy
- Andrea Moro
  - Cognitive Neuroscience (ICoN) Center, University School for Advanced Studies IUSS, Piazza Vittoria 15, Pavia 27100, Italy
3. Zada Z, Goldstein A, Michelmann S, Simony E, Price A, Hasenfratz L, Barham E, Zadbood A, Doyle W, Friedman D, Dugan P, Melloni L, Devore S, Flinker A, Devinsky O, Nastase SA, Hasson U. A shared model-based linguistic space for transmitting our thoughts from brain to brain in natural conversations. Neuron 2024;S0896-6273(24)00460-4. PMID: 39096896; DOI: 10.1016/j.neuron.2024.06.025.
Abstract
Effective communication hinges on a mutual understanding of word meaning in different contexts. We recorded brain activity using electrocorticography during spontaneous, face-to-face conversations in five pairs of epilepsy patients. We developed a model-based coupling framework that aligns brain activity in both speaker and listener to a shared embedding space from a large language model (LLM). The context-sensitive LLM embeddings allow us to track the exchange of linguistic information, word by word, from one brain to another in natural conversations. Linguistic content emerges in the speaker's brain before word articulation and rapidly re-emerges in the listener's brain after word articulation. The contextual embeddings better capture word-by-word neural alignment between speaker and listener than syntactic and articulatory models. Our findings indicate that the contextual embeddings learned by LLMs can serve as an explicit numerical model of the shared, context-rich meaning space humans use to communicate their thoughts to one another.
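The model-based coupling framework — align both brains to a shared embedding space, then test whether a model fit on the speaker's activity transfers to the listener's — might be sketched roughly as below. All names and dimensions are toy assumptions: random vectors stand in for LLM embeddings, and the ECoG signals are simulated.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
n_words, dim = 300, 16
emb = rng.standard_normal((n_words, dim))  # per-word embeddings (toy LLM stand-in)
w = rng.standard_normal(dim)               # shared "meaning -> signal" mapping
speaker = emb @ w + 0.5 * rng.standard_normal(n_words)   # pre-articulation signal
listener = emb @ w + 0.5 * rng.standard_normal(n_words)  # post-articulation signal

# Fit an encoding model on the speaker's brain, then test whether the same
# shared embedding space predicts the *listener's* brain on held-out words.
half = n_words // 2
beta = ridge_fit(emb[:half], speaker[:half])
pred = emb[half:] @ beta
r = float(np.corrcoef(pred, listener[half:])[0, 1])  # speaker-to-listener transfer
```

A high `r` here is the toy analogue of word-by-word speaker-listener alignment through the shared embedding space; the real framework additionally models lags around word articulation.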
Affiliation(s)
- Zaid Zada
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Ariel Goldstein
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
  - Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190501, Israel
- Sebastian Michelmann
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Erez Simony
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
  - Faculty of Engineering, Holon Institute of Technology, Holon 5810201, Israel
- Amy Price
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Liat Hasenfratz
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Emily Barham
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Asieh Zadbood
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
  - Department of Psychology, Columbia University, New York, NY 10027, USA
- Werner Doyle
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Daniel Friedman
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Patricia Dugan
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Lucia Melloni
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Sasha Devore
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Adeen Flinker
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
  - Tandon School of Engineering, New York University, New York, NY 10016, USA
- Orrin Devinsky
  - Grossman School of Medicine, New York University, New York, NY 10016, USA
- Samuel A Nastase
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
- Uri Hasson
  - Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544, USA
4. Li R, Cao M, Fu D, Wei W, Wang D, Yuan Z, Hu R, Deng W. Deciphering language disturbances in schizophrenia: a study using fine-tuned language models. Schizophr Res 2024;271:120-128. PMID: 39024960; DOI: 10.1016/j.schres.2024.07.016.
Abstract
This research presents two stable language metrics, Successful Prediction Rate (SPR) and Disfluency (DF), to objectively quantify the linguistic disturbances associated with schizophrenia. These metrics capture both off-topic responses and incoherence in patients' speech by modeling speech information with fine-tuning techniques, exhibit cultural sensitivity, and provide a more comprehensive evaluation of linguistic abnormalities in schizophrenia. The study fine-tuned an ELECTRA pretrained language model on a 750 MB text corpus obtained from major Chinese mental health forums. The effectiveness of the fine-tuned model was verified on a group of 38 individuals diagnosed with schizophrenia and 25 carefully matched healthy controls, and the association between the derived linguistic features and Positive and Negative Syndrome Scale (PANSS) items was explored. The results show that SPR is higher in healthy controls, indicating that the pre-trained language model understands their language better, whereas DF is higher in individuals with schizophrenia, indicating a more inconsistent language structure. The relationship between linguistic features and P2 (conceptual disorganization) reveals that patients positive for P2 exhibit lower SPR and higher DF. Binary logistic regression using the combined SPR and DF features achieves 84.5% accuracy in classifying P2, exceeding the performance of traditional features by 20.5%. Moreover, the proposed linguistic features outperform traditional linguistic features in discriminating formal thought disorder (FTD), as demonstrated by multivariate linear regression analysis.
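SPR, as described, is the fraction of tokens a masked language model successfully recovers from context. A minimal sketch of that computation, with a trivial bigram lookup standing in for the fine-tuned ELECTRA model:

```python
def successful_prediction_rate(tokens, predict_masked):
    """SPR: fraction of tokens a masked language model recovers from context."""
    hits = 0
    for i in range(len(tokens)):
        # Mask one token at a time and ask the model to fill it in.
        context = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        if predict_masked(context, i) == tokens[i]:
            hits += 1
    return hits / len(tokens)

# Stub predictor standing in for a fine-tuned masked LM: it "knows" a tiny
# bigram table and guesses each masked token from the preceding token.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "down"}

def stub_predictor(context, i):
    prev = context[i - 1] if i > 0 else None
    return BIGRAMS.get(prev)

spr = successful_prediction_rate(["the", "cat", "sat", "down"], stub_predictor)
```

Here the stub recovers three of four tokens (it cannot predict the sentence-initial word), so `spr` is 0.75; a real SPR would use the fine-tuned model's top prediction at each masked position.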
Affiliation(s)
- Renyu Li
  - DAMO Academy, Alibaba Group, Hangzhou, China
- Minne Cao
  - Affiliated Mental Health Center, Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Dawei Fu
  - DAMO Academy, Alibaba Group, Hangzhou, China
- Wei Wei
  - Affiliated Mental Health Center, Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Dequan Wang
  - Affiliated Mental Health Center, Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhaoxia Yuan
  - Affiliated Mental Health Center, Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Ruofei Hu
  - DAMO Academy, Alibaba Group, Hangzhou, China
  - Lifestyle Supporting Technologies Group, Technical University of Madrid, Spain
- Wei Deng
  - Affiliated Mental Health Center, Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, China
5. Cong Y, LaCroix AN, Lee J. Clinical efficacy of pre-trained large language models through the lens of aphasia. Sci Rep 2024;14:15573. PMID: 38971898; PMCID: PMC11227580; DOI: 10.1038/s41598-024-66576-y.
Abstract
The rapid development of large language models (LLMs) motivates us to explore how such state-of-the-art natural language processing systems can inform aphasia research. What kinds of language indices can we derive from a pre-trained LLM? How do they differ from or relate to existing language features in aphasia? To what extent can LLMs serve as an interpretable and effective diagnostic and measurement tool in a clinical context? To investigate these questions, we constructed predictive and correlational models that use mean surprisals from LLMs as predictor variables. Using archived AphasiaBank data, we validated our models' efficacy in aphasia diagnosis, measurement, and prediction. We found that LLM surprisals can effectively detect the presence of aphasia and distinguish different natures of the disorder, that LLMs in conjunction with existing language indices improve models' efficacy in subtyping aphasia, and that LLM surprisals can capture common agrammatic deficits at both the word and sentence level. Overall, LLMs have the potential to advance automatic and precise aphasia prediction. A natural language processing pipeline can benefit greatly from integrating LLMs, enabling us to refine models of existing language disorders such as aphasia.
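Mean surprisal, the predictor variable used here, is simply the average negative log probability a model assigns to each token. A minimal sketch with hand-picked toy probabilities (a real pipeline would obtain them from an LLM's per-token output distribution):

```python
import math

def mean_surprisal(token_probs):
    """Mean surprisal in bits: -log2 p(token | context), averaged over tokens."""
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)

# Invented per-token probabilities, standing in for what an LLM might assign
# to a fluent transcript vs. an agrammatic one.
fluent = [0.5, 0.25, 0.5, 0.25]
disordered = [0.125, 0.0625, 0.125, 0.0625]
gap = mean_surprisal(disordered) - mean_surprisal(fluent)
```

With these toy numbers, the fluent transcript averages 1.5 bits per token and the disordered one 3.5 bits; that gap is the kind of signal the predictive models exploit.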
Affiliation(s)
- Yan Cong
  - School of Languages and Cultures, Purdue University, West Lafayette, USA
- Arianna N LaCroix
  - Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, USA
- Jiyeon Lee
  - Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, USA
6. Hong Z, Wang H, Zada Z, Gazula H, Turner D, Aubrey B, Niekerken L, Doyle W, Devore S, Dugan P, Friedman D, Devinsky O, Flinker A, Hasson U, Nastase SA, Goldstein A. Scale matters: large language models with billions (rather than millions) of parameters better match neural representations of natural language. bioRxiv 2024:2024.06.12.598513. PMID: 39005394; PMCID: PMC11244877; DOI: 10.1101/2024.06.12.598513.
Abstract
Recent research has used large language models (LLMs) to study the neural basis of naturalistic language processing in the human brain. LLMs have rapidly grown in complexity, leading to improved language processing capabilities, but neuroscience research has not kept pace with this rapid progress. Here, we used several families of transformer-based LLMs to investigate the relationship between model size and the models' ability to capture linguistic information in the human brain. Crucially, a subset of the LLMs were trained on a fixed training set, enabling us to dissociate model size from architecture and training set size. We used electrocorticography (ECoG) to measure neural activity in epilepsy patients while they listened to a 30-minute naturalistic audio story. We fit electrode-wise encoding models using contextual embeddings extracted from each hidden layer of the LLMs to predict word-level neural signals. In line with prior work, we found that larger LLMs better capture the structure of natural language and better predict neural activity. We also found a log-linear relationship whereby encoding performance peaks in relatively earlier layers as model size increases, and we observed variations in the best-performing layer across brain regions, corresponding to an organized language processing hierarchy.
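The layer-wise encoding analysis — fit an encoding model per hidden layer and locate the best-performing layer — can be sketched as follows. Random matrices stand in for the LLMs' contextual embeddings and the neural signal is simulated, so this only illustrates the selection logic, not the actual models or recordings.

```python
import numpy as np

def encoding_r(X, y):
    """Split-half OLS encoding model: correlation between held-out
    prediction and the neural signal."""
    half = len(y) // 2
    beta, *_ = np.linalg.lstsq(X[:half], y[:half], rcond=None)
    pred = X[half:] @ beta
    return float(np.corrcoef(pred, y[half:])[0, 1])

rng = np.random.default_rng(3)
n_words, dim, n_layers = 200, 8, 6
# Toy "hidden layers": random embeddings per word.
layers = [rng.standard_normal((n_words, dim)) for _ in range(n_layers)]
# Toy electrode signal driven by layer 2's representation.
w = rng.standard_normal(dim)
neural = layers[2] @ w + 0.5 * rng.standard_normal(n_words)

scores = [encoding_r(X, neural) for X in layers]
best_layer = int(np.argmax(scores))  # which layer best predicts this electrode
```

Repeating this per electrode and per model size yields the layer-preference maps the paper describes.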
Affiliation(s)
- Zhuoqiao Hong
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Haocheng Wang
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Zaid Zada
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Harshvardhan Gazula
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA
- David Turner
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Bobbi Aubrey
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Leonard Niekerken
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Werner Doyle
  - New York University Grossman School of Medicine, New York, NY
- Sasha Devore
  - New York University Grossman School of Medicine, New York, NY
- Patricia Dugan
  - New York University Grossman School of Medicine, New York, NY
- Daniel Friedman
  - New York University Grossman School of Medicine, New York, NY
- Orrin Devinsky
  - New York University Grossman School of Medicine, New York, NY
- Adeen Flinker
  - New York University Grossman School of Medicine, New York, NY
- Uri Hasson
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Samuel A Nastase
  - Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ
- Ariel Goldstein
  - Business School, Data Science Department and Cognitive Science Department, Hebrew University, Jerusalem, Israel
7. Kumar S, Sumers TR, Yamakoshi T, Goldstein A, Hasson U, Norman KA, Griffiths TL, Hawkins RD, Nastase SA. Shared functional specialization in transformer-based language models and the human brain. Nat Commun 2024;15:5523. PMID: 38951520; PMCID: PMC11217339; DOI: 10.1038/s41467-024-49173-5.
Abstract
When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations ("embeddings") generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally specialized "transformations" that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally specialized "attention heads" differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space.
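The "transformations" analyzed here are the outputs of individual attention heads — the contextual mixing step itself, as opposed to the resulting embeddings. A minimal sketch of a single causal attention head in NumPy (the dimensions are arbitrary toy choices, not those of any particular model):

```python
import numpy as np

def attention_head(X, Wq, Wk, Wv):
    """One causal attention head: the per-word 'transformation' that mixes
    contextual information from earlier words into each word."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Causal mask: each word attends only to itself and earlier words.
    scores[np.triu(np.ones_like(scores, dtype=bool), k=1)] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V  # the head's "transformation" output

rng = np.random.default_rng(4)
n_words, d_model, d_head = 10, 16, 4
X = rng.standard_normal((n_words, d_model))   # toy word representations
Wq = rng.standard_normal((d_model, d_head))
Wk = rng.standard_normal((d_model, d_head))
Wv = rng.standard_normal((d_model, d_head))
out = attention_head(X, Wq, Wk, Wv)
```

Because of the causal mask, the first word can attend only to itself, so its transformation reduces to its own value vector. In the paper, per-head outputs like `out` (extracted from a trained model), rather than the embeddings, serve as the encoding-model features.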
Affiliation(s)
- Sreejan Kumar
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Theodore R Sumers
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
- Takateru Yamakoshi
  - Faculty of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Ariel Goldstein
  - Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190401, Israel
- Uri Hasson
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
  - Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Kenneth A Norman
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
  - Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Thomas L Griffiths
  - Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
  - Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Robert D Hawkins
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
  - Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Samuel A Nastase
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
8. Ferrante M, Boccato T, Passamonti L, Toschi N. Retrieving and reconstructing conceptually similar images from fMRI with latent diffusion models and a neuro-inspired brain decoding model. J Neural Eng 2024;21:046001. PMID: 38885689; DOI: 10.1088/1741-2552/ad593c.
Abstract
Objective. Brain decoding is a field of computational neuroscience that aims to infer mental states or internal representations of perceptual inputs from measurable brain activity. This study proposes a novel approach to brain decoding that relies on semantic and contextual similarity. Approach. We use several functional magnetic resonance imaging (fMRI) datasets of natural images as stimuli and create a deep learning decoding pipeline inspired by the bottom-up and top-down processes in human vision. Our pipeline includes a linear brain-to-feature model that maps fMRI activity to semantic features of the visual stimuli. We assume that the brain projects visual information onto a space that is homeomorphic to the latent space of the last layer of a pretrained neural network, which summarizes and highlights similarities and differences between concepts. These features are categorized in the latent space using a nearest-neighbor strategy, and the results are used to retrieve images or to condition a generative latent diffusion model to create novel images. Main results. We demonstrate semantic classification and image retrieval on three fMRI datasets: Generic Object Decoding (visual perception and imagination), BOLD5000, and NSD. In all cases, a simple mapping between fMRI activity and a deep semantic representation of the visual stimulus produced meaningful classification and retrieved or generated images. We assessed quality using quantitative metrics and a human evaluation experiment that reproduces the multiplicity of conscious and unconscious criteria humans use to evaluate image similarity. Our method achieved correct evaluation in over 80% of the test set. Significance. The results demonstrate that measurable neural correlates can be linearly mapped onto the latent space of a neural network to synthesize images that match the original content. These findings have implications for both cognitive neuroscience and artificial intelligence.
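The retrieval half of the pipeline — a linear brain-to-feature map followed by nearest-neighbor lookup in the feature space — might be sketched as below, with synthetic "fMRI" generated from toy features. The real pipeline uses features from a pretrained network's last layer and additionally conditions a latent diffusion model on the decoded features.

```python
import numpy as np

def nearest_neighbor(query, gallery):
    """Index of the gallery feature vector closest to the decoded query."""
    return int(np.argmin(np.linalg.norm(gallery - query, axis=1)))

rng = np.random.default_rng(5)
n_images, n_voxels, n_feat = 50, 20, 8
features = rng.standard_normal((n_images, n_feat))  # deep features of stimuli
W = rng.standard_normal((n_feat, n_voxels))         # toy feature-to-voxel map
fmri = features @ W + 0.1 * rng.standard_normal((n_images, n_voxels))

# Linear brain-to-feature model fit on 40 "training" images, then used to
# decode the remaining 10 and retrieve each one by nearest neighbor.
B, *_ = np.linalg.lstsq(fmri[:40], features[:40], rcond=None)
decoded = fmri[40:] @ B
hits = sum(nearest_neighbor(decoded[i], features) == 40 + i for i in range(10))
```

Counting `hits` over held-out images is the toy analogue of the retrieval accuracy the paper reports.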
Affiliation(s)
- Matteo Ferrante
  - Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy
- Tommaso Boccato
  - Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy
- Luca Passamonti
  - CNR, Istituto di Bioimmagini e Fisiologia Molecolare, Milan, Italy
- Nicola Toschi
  - Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy
  - Martinos Center for Biomedical Imaging, MGH and Harvard Medical School, Boston, MA, United States of America
9. Kauf C, Kim HS, Lee EJ, Jhingan N, Selena She J, Taliaferro M, Gibson E, Fedorenko E. Linguistic inputs must be syntactically parsable to fully engage the language network. bioRxiv 2024:2024.06.21.599332. PMID: 38948870; PMCID: PMC11212959; DOI: 10.1101/2024.06.21.599332.
Abstract
Human language comprehension is remarkably robust to ill-formed inputs (e.g., word transpositions). This robustness has led some to argue that syntactic parsing is largely an illusion and that incremental comprehension is more heuristic, shallow, and semantics-based than is often assumed. However, the available data are also consistent with the possibility that humans always perform rule-like symbolic parsing and simply deploy error-correction mechanisms to reconstruct ill-formed inputs when needed. We put these hypotheses to a stringent new test by examining brain responses to (a) stimuli that should pose a challenge for syntactic reconstruction but allow complex meanings to be built within local contexts through associative/shallow processing (sentences presented in a backward word order), and (b) grammatically well-formed but semantically implausible sentences that should impede semantics-based heuristic processing. Using a novel behavioral syntactic reconstruction paradigm, we demonstrate that backward-presented sentences indeed impede the recovery of grammatical structure during incremental comprehension. Critically, these backward-presented stimuli elicit a relatively low response in the language areas, as measured with fMRI. In contrast, semantically implausible but grammatically well-formed sentences elicit a response in the language areas similar in magnitude to that for naturalistic (plausible) sentences. In other words, the ability to build syntactic structures during incremental language processing is both necessary and sufficient to fully engage the language network. Taken together, these results provide the strongest support to date for a generalized reliance of human language comprehension on syntactic parsing.
Affiliation(s)
- Carina Kauf
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Hee So Kim
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Elizabeth J. Lee
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Niharika Jhingan
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jingyuan Selena She
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Maya Taliaferro
  - Department of Psychology, New York University, New York, NY 10012, USA
- Edward Gibson
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Evelina Fedorenko
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - The Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, USA
10. Subramaniam V, Conwell C, Wang C, Kreiman G, Katz B, Cases I, Barbu A. Revealing vision-language integration in the brain with multimodal networks. arXiv 2024:arXiv:2406.14481v1. PMID: 38947929; PMCID: PMC11213144.
Abstract
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoelectroencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often differ in architecture, number of parameters, and training set (possibly obscuring differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR) that keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 of 1090 total sites, or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of neural activity in these sites.
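The operationalization in the opening sentences — a site counts as "multimodal" if the multimodal model out-predicts the unimodal and linearly integrated controls — reduces to a simple comparison of per-site encoding scores. A toy sketch (the scores are invented; the paper derives them from DNN-based encoding models of SEEG signals):

```python
import numpy as np

def identify_multimodal_sites(r_multi, r_lang, r_vision, r_concat):
    """Toy operationalization: a site is multimodal when the multimodal
    model's encoding score beats every unimodal and linearly-integrated
    control at that site."""
    best_control = np.maximum.reduce([r_lang, r_vision, r_concat])
    return np.where(r_multi > best_control)[0]

# Invented encoding scores for 6 recording sites.
r_multi  = np.array([0.40, 0.10, 0.50, 0.20, 0.30, 0.60])
r_lang   = np.array([0.30, 0.20, 0.20, 0.10, 0.40, 0.20])
r_vision = np.array([0.20, 0.30, 0.10, 0.30, 0.20, 0.30])
r_concat = np.array([0.35, 0.25, 0.30, 0.25, 0.35, 0.40])
sites = identify_multimodal_sites(r_multi, r_lang, r_vision, r_concat)
```

Here sites 0, 2, and 5 qualify; the real analysis additionally requires statistical tests and the controlled SLIP/SimCLR comparison before labeling a site.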
Affiliation(s)
- Colin Conwell: Department of Cognitive Science, Johns Hopkins University
11. Morgan AM, Devinsky O, Doyle WK, Dugan P, Friedman D, Flinker A. A low-activity cortical network selectively encodes syntax. bioRxiv 2024:2024.06.20.599931. [PMID: 38948730 PMCID: PMC11212956 DOI: 10.1101/2024.06.20.599931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Syntax, the abstract structure of language, is a hallmark of human cognition. Despite its importance, its neural underpinnings remain obscured by inherent limitations of non-invasive brain measures and a near total focus on comprehension paradigms. Here, we address these limitations with high-resolution neurosurgical recordings (electrocorticography) and a controlled sentence production experiment. We uncover three syntactic networks that are broadly distributed across traditional language regions, but with focal concentrations in middle and inferior frontal gyri. In contrast to previous findings from comprehension studies, these networks process syntax mostly to the exclusion of words and meaning, supporting a cognitive architecture with a distinct syntactic system. Most strikingly, our data reveal an unexpected property of syntax: it is encoded independent of neural activity levels. We propose that this "low-activity coding" scheme represents a novel mechanism for encoding information, reserved for higher-order cognition more broadly.
Affiliation(s)
- Adam M. Morgan: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Orrin Devinsky: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Werner K. Doyle: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Patricia Dugan: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Daniel Friedman: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA
- Adeen Flinker: Neurology Department, NYU Grossman School of Medicine, 550 1st Ave, New York, 10016, NY, USA; Biomedical Engineering Department, NYU Tandon School of Engineering, 6 MetroTech Center Ave, Brooklyn, 11201, NY, USA
12. Waldrop MM. Can ChatGPT help researchers understand how the human brain handles language? Proc Natl Acad Sci U S A 2024; 121:e2410196121. [PMID: 38875152 PMCID: PMC11194597 DOI: 10.1073/pnas.2410196121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024] Open Access
13. Wang R, Chen ZS. Large-scale foundation models and generative AI for BigData neuroscience. Neurosci Res 2024:S0168-0102(24)00075-0. [PMID: 38897235 DOI: 10.1016/j.neures.2024.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 04/15/2024] [Accepted: 05/15/2024] [Indexed: 06/21/2024]
Abstract
Recent advances in machine learning have led to revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscapes of neuroscience research and make a significant impact on the future. Here we present a mini-review on recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.
Affiliation(s)
- Ran Wang: Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Zhe Sage Chen: Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Neuroscience and Physiology, Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA
14. Mahowald K, Ivanova AA, Blank IA, Kanwisher N, Tenenbaum JB, Fedorenko E. Dissociating language and thought in large language models. Trends Cogn Sci 2024; 28:517-540. [PMID: 38508911 DOI: 10.1016/j.tics.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 03/22/2024]
Abstract
Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence (knowledge of linguistic rules and patterns) and functional linguistic competence (understanding and using language in the world). We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of separate mechanisms specialized for formal versus functional linguistic competence.
15. Fedorenko E, Piantadosi ST, Gibson EAF. Language is primarily a tool for communication rather than thought. Nature 2024; 630:575-586. [PMID: 38898296 DOI: 10.1038/s41586-024-07522-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2023] [Accepted: 05/03/2024] [Indexed: 06/21/2024]
Abstract
Language is a defining characteristic of our species, but the function, or functions, that it serves has been debated for centuries. Here we bring recent evidence from neuroscience and allied disciplines to argue that in modern humans, language is a tool for communication, contrary to a prominent view that we use language for thinking. We begin by introducing the brain network that supports linguistic ability in humans. We then review evidence for a double dissociation between language and thought, and discuss several properties of language that suggest that it is optimized for communication. We conclude that although the emergence of language has unquestionably transformed human culture, language does not appear to be a prerequisite for complex thought, including symbolic thought. Instead, language is a powerful tool for the transmission of cultural knowledge; it plausibly co-evolved with our thinking and reasoning capacities, and only reflects, rather than gives rise to, the signature sophistication of human cognition.
Affiliation(s)
- Evelina Fedorenko: Massachusetts Institute of Technology, Cambridge, MA, USA; Speech and Hearing in Bioscience and Technology Program at Harvard University, Boston, MA, USA
16. Young MJ, Kazazian K, Fischer D, Lissak IA, Bodien YG, Edlow BL. Disclosing Results of Tests for Covert Consciousness: A Framework for Ethical Translation. Neurocrit Care 2024; 40:865-878. [PMID: 38243150 PMCID: PMC11147696 DOI: 10.1007/s12028-023-01899-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/25/2023] [Accepted: 11/22/2023] [Indexed: 01/21/2024]
Abstract
The advent of neurotechnologies including advanced functional magnetic resonance imaging and electroencephalography to detect states of awareness not detectable by traditional bedside neurobehavioral techniques (i.e., covert consciousness) promises to transform neuroscience research and clinical practice for patients with brain injury. As these interventions progress from research tools into actionable, guideline-endorsed clinical tests, ethical guidance for clinicians on how to responsibly communicate the sensitive results they yield is crucial yet remains underdeveloped. Drawing on insights from empirical and theoretical neuroethics research and our clinical experience with advanced neurotechnologies to detect consciousness in behaviorally unresponsive patients, we critically evaluate ethical promises and perils associated with disclosing the results of clinical covert consciousness assessments and describe a semistructured approach to responsible data sharing to mitigate potential risks.
Affiliation(s)
- Michael J Young: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA, 02114, USA
- Karnig Kazazian: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA, 02114, USA; Western Institute of Neuroscience, Western University, London, ON, Canada
- David Fischer: University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- India A Lissak: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA, 02114, USA
- Yelena G Bodien: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA, 02114, USA
- Brian L Edlow: Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, 101 Merrimac Street, Suite 310, Boston, MA, 02114, USA; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA, USA
17. Yu S, Gu C, Huang K, Li P. Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension. Sci Adv 2024; 10:eadn7744. [PMID: 38781343 PMCID: PMC11114233 DOI: 10.1126/sciadv.adn7744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Accepted: 04/18/2024] [Indexed: 05/25/2024]
Abstract
Current large language models (LLMs) rely on word prediction as their backbone pretraining task. Although word prediction is an important mechanism underlying language processing, human language comprehension occurs at multiple levels, involving the integration of words and sentences to achieve a full understanding of discourse. This study models language comprehension by using the next sentence prediction (NSP) task to investigate mechanisms of discourse-level comprehension. We show that NSP pretraining enhanced a model's alignment with brain data especially in the right hemisphere and in the multiple demand network, highlighting the contributions of nonclassical language regions to high-level language understanding. Our results also suggest that NSP can enable the model to better capture human comprehension performance and to better encode contextual information. Our study demonstrates that the inclusion of diverse learning objectives in a model leads to more human-like representations, and investigating the neurocognitive plausibility of pretraining tasks in LLMs can shed light on outstanding questions in language neuroscience.
Affiliation(s)
- Shaoyun Yu: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Chanyuan Gu: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Kexin Huang: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Ping Li: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR, China; Centre for Immersive Learning and Metaverse in Education, The Hong Kong Polytechnic University, Hong Kong SAR, China
18. Almazroi AA, Ayub N. Enhancing aspect-based multi-labeling with ensemble learning for ethical logistics. PLoS One 2024; 19:e0295248. [PMID: 38771789 PMCID: PMC11108219 DOI: 10.1371/journal.pone.0295248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 11/20/2023] [Indexed: 05/23/2024] Open Access
Abstract
In the dynamic domain of logistics, effective communication is essential for streamlined operations. Our solution, the Multi-Labeling Ensemble (MLEn), tackles the intricate task of extracting multi-labeled data, employing the NLTK toolkit for accurate preprocessing of textual data tailored to the language of logistics communication. MLEn combines sentiment intensity analysis, Word2Vec, and Doc2Vec for comprehensive feature extraction, making it particularly suitable for e-commerce logistics, where capturing nuanced communication is essential for efficient operations. Ethical considerations are a cornerstone of logistics communication, and MLEn detects and categorizes inappropriate language, aligning with ethical norms. Leveraging TF-IDF and VADER for feature enhancement, MLEn discerns and labels ethically sensitive content in logistics communication. Across diverse datasets, including Emotions, MLEn consistently achieves accuracy levels ranging from 92% to 97%, establishing its superiority in the logistics context. In particular, our proposed method, DenseNet-EHO, outperforms BERT by 8% and surpasses other techniques in efficiency by 15-25%. A comprehensive analysis of precision, recall, F1-score, ranking loss, Jaccard similarity, AUC-ROC, sensitivity, and time complexity underscores DenseNet-EHO's efficiency, aligning with the practical demands of logistics. By integrating preprocessing, sentiment intensity analysis, and vectorization, MLEn emerges as a robust framework for multi-label datasets, consistently outperforming conventional approaches in precision, accuracy, and efficiency in the logistics field.
Affiliation(s)
- Abdulwahab Ali Almazroi: Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
- Nasir Ayub: Department of Creative Technologies, Air University Islamabad, Islamabad, Pakistan
19. Yu L, Dugan P, Doyle W, Devinsky O, Friedman D, Flinker A. A left-lateralized dorsolateral prefrontal network for naming. bioRxiv 2024:2024.05.15.594403. [PMID: 38798614 PMCID: PMC11118423 DOI: 10.1101/2024.05.15.594403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
The ability to connect the form and meaning of a concept, known as word retrieval, is fundamental to human communication. While various input modalities could lead to identical word retrieval, the exact neural dynamics supporting this convergence relevant to daily auditory discourse remain poorly understood. Here, we leveraged neurosurgical electrocorticographic (ECoG) recordings from 48 patients and dissociated two key language networks that highly overlap in time and space integral to word retrieval. Using unsupervised temporal clustering techniques, we found a semantic processing network located in the middle and inferior frontal gyri. This network was distinct from an articulatory planning network in the inferior frontal and precentral gyri, which was agnostic to input modalities. Functionally, we confirmed that the semantic processing network encodes word surprisal during sentence perception. Our findings characterize how humans integrate ongoing auditory semantic information over time, a critical linguistic function from passive comprehension to daily discourse.
Affiliation(s)
- Leyao Yu: Department of Biomedical Engineering, New York University, New York, 10016, NY, USA; Department of Neurology, School of Medicine, New York University, New York, 10016, NY, USA
- Patricia Dugan: Department of Neurology, School of Medicine, New York University, New York, 10016, NY, USA
- Werner Doyle: Department of Neurosurgery, School of Medicine, New York University, New York, 10016, NY, USA
- Orrin Devinsky: Department of Neurology, School of Medicine, New York University, New York, 10016, NY, USA
- Daniel Friedman: Department of Neurology, School of Medicine, New York University, New York, 10016, NY, USA
- Adeen Flinker: Department of Biomedical Engineering, New York University, New York, 10016, NY, USA; Department of Neurology, School of Medicine, New York University, New York, 10016, NY, USA
20. Riveland R, Pouget A. Natural language instructions induce compositional generalization in networks of neurons. Nat Neurosci 2024; 27:988-999. [PMID: 38499855 DOI: 10.1038/s41593-024-01607-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 02/15/2024] [Indexed: 03/20/2024]
Abstract
A fundamental human cognitive feat is to interpret linguistic instructions in order to perform novel tasks without explicit task experience. Yet, the neural computations that might be used to accomplish this remain poorly understood. We use advances in natural language processing to create a neural model of generalization based on linguistic instructions. Models are trained on a set of common psychophysical tasks, and receive instructions embedded by a pretrained language model. Our best models can perform a previously unseen task with an average performance of 83% correct based solely on linguistic instructions (that is, zero-shot learning). We found that language scaffolds sensorimotor representations such that activity for interrelated tasks shares a common geometry with the semantic representations of instructions, allowing language to cue the proper composition of practiced skills in unseen settings. We show how this model generates a linguistic description of a novel task it has identified using only motor feedback, which can subsequently guide a partner model to perform the task. Our models offer several experimentally testable predictions outlining how linguistic information must be represented to facilitate flexible and general cognition in the human brain.
Affiliation(s)
- Reidar Riveland: Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
- Alexandre Pouget: Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland
21. Fedorenko E, Ivanova AA, Regev TI. The language network as a natural kind within the broader landscape of the human brain. Nat Rev Neurosci 2024; 25:289-312. [PMID: 38609551 DOI: 10.1038/s41583-024-00802-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/23/2024] [Indexed: 04/14/2024]
Abstract
Language behaviour is complex, but neuroscientific evidence disentangles it into distinct components supported by dedicated brain areas or networks. In this Review, we describe the 'core' language network, which includes left-hemisphere frontal and temporal areas, and show that it is strongly interconnected, independent of input and output modalities, causally important for language and language-selective. We discuss evidence that this language network plausibly stores language knowledge and supports core linguistic computations related to accessing words and constructions from memory and combining them to interpret (decode) or generate (encode) linguistic messages. We emphasize that the language network works closely with, but is distinct from, both lower-level - perceptual and motor - mechanisms and higher-level systems of knowledge and reasoning. The perceptual and motor mechanisms process linguistic signals, but, in contrast to the language network, are sensitive only to these signals' surface properties, not their meanings; the systems of knowledge and reasoning (such as the system that supports social reasoning) are sometimes engaged during language use but are not language-selective. This Review lays a foundation both for in-depth investigations of these different components of the language processing pipeline and for probing inter-component interactions.
Affiliation(s)
- Evelina Fedorenko: Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The Program in Speech and Hearing in Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Anna A Ivanova: School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA
- Tamar I Regev: Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
22. Lin R, Meng X, Chen F, Li X, Jensen O, Theeuwes J, Wang B. Neural evidence for attentional capture by salient distractors. Nat Hum Behav 2024; 8:932-944. [PMID: 38538771 DOI: 10.1038/s41562-024-01852-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 02/21/2024] [Indexed: 04/17/2024]
Abstract
Salient objects often capture our attention, serving as distractors and hindering our current goals. It remains unclear when and how salient distractors interact with our goals, and our knowledge on the neural mechanisms responsible for attentional capture is limited to a few brain regions recorded from non-human primates. Here we conducted a multivariate analysis on human intracranial signals covering most brain regions and successfully dissociated distractor-specific representations from target-arousal signals in the high-frequency (60-100 Hz) activity. We found that salient distractors were processed rapidly around 220 ms, while target-tuning attention was attenuated simultaneously, supporting initial capture by distractors. Notably, neuronal activity specific to the distractor representation was strongest in the superior and middle temporal gyrus, amygdala and anterior cingulate cortex, while there were smaller contributions from the parietal and frontal cortices. These results provide neural evidence for attentional capture by salient distractors engaging a much larger network than previously appreciated.
Affiliation(s)
- Rongqi Lin: Key Laboratory of Brain, Cognition and Education Sciences, South China Normal University, Ministry of Education, Guangzhou, China; Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou, China; Center for Studies of Psychological Application, South China Normal University, Guangzhou, China; Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China; Department of Psychology, Zhejiang Normal University, Jinhua, China; Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Xianghong Meng: Department of Neurosurgery, Shenzhen University General Hospital, Shenzhen, China
- Fuyong Chen: Department of Neurosurgery, University of Hong Kong Shenzhen Hospital, Shenzhen, China
- Xinyu Li: Department of Psychology, Zhejiang Normal University, Jinhua, China
- Ole Jensen: Centre for Human Brain Health, School of Psychology, University of Birmingham, Birmingham, UK
- Jan Theeuwes: Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Benchi Wang: Key Laboratory of Brain, Cognition and Education Sciences, South China Normal University, Ministry of Education, Guangzhou, China; Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou, China; Center for Studies of Psychological Application, South China Normal University, Guangzhou, China; Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
23. Saddler MR, McDermott JH. Models optimized for real-world tasks reveal the necessity of precise temporal coding in hearing. bioRxiv 2024:2024.04.21.590435. [PMID: 38712054 PMCID: PMC11071365 DOI: 10.1101/2024.04.21.590435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance is uncertain. To investigate the role of this temporal coding, we optimized machine learning models to perform real-world hearing tasks with simulated cochlear input. We asked how precise auditory nerve spike timing needed to be to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. Degrading phase locking produced task-dependent effects, revealing how the use of fine-grained temporal information reflects both ecological task demands and neural implementation constraints. The results link neural coding to perception and clarify conditions in which prostheses that fail to restore high-fidelity temporal coding could in principle restore near-normal hearing.
Affiliation(s)
- Mark R Saddler: Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA; McGovern Institute for Brain Research, MIT, Cambridge, MA, USA; Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA
- Josh H McDermott: Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA; McGovern Institute for Brain Research, MIT, Cambridge, MA, USA; Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, MA, USA
24. Cai J, Hadjinicolaou AE, Paulk AC, Soper DJ, Xia T, Williams ZM, Cash SS. Natural language processing models reveal neural dynamics of human conversation. bioRxiv 2024:2023.03.10.531095. [PMID: 36945468 PMCID: PMC10028965 DOI: 10.1101/2023.03.10.531095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2023]
Abstract
Through conversation, humans relay complex information through the alternation of speech production and comprehension. The neural mechanisms that underlie these complementary processes or through which information is precisely conveyed by language, however, remain poorly understood. Here, we used pretrained deep learning natural language processing models in combination with intracranial neuronal recordings to discover neural signals that reliably reflect speech production, comprehension, and their transitions during natural conversation between individuals. Our findings indicate that neural activities that encoded linguistic information were broadly distributed throughout frontotemporal areas across multiple frequency bands. We also find that these activities were specific to the words and sentences being conveyed and that they were dependent on the word's specific context and order. Finally, we demonstrate that these neural patterns partially overlapped during language production and comprehension and that listener-speaker transitions were associated with specific, time-aligned changes in neural activity. Collectively, our findings reveal a dynamical organization of neural activities that subserve language production and comprehension during natural conversation and harness the use of deep learning models in understanding the neural mechanisms underlying human language.
Affiliation(s)
- Jing Cai: Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Alex E. Hadjinicolaou: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Angelique C. Paulk: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Daniel J. Soper: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Tian Xia: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Ziv M. Williams: Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA; Harvard-MIT Division of Health Sciences and Technology, Boston, MA; Harvard Medical School, Program in Neuroscience, Boston, MA (contributed equally)
- Sydney S. Cash: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; Harvard-MIT Division of Health Sciences and Technology, Boston, MA (contributed equally)
25. Rambelli G, Chersoni E, Testa D, Blache P, Lenci A. Neural Generative Models and the Parallel Architecture of Language: A Critical Review and Outlook. Top Cogn Sci 2024. [PMID: 38635667 DOI: 10.1111/tops.12733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 03/15/2024] [Accepted: 03/21/2024] [Indexed: 04/20/2024]
Abstract
According to the parallel architecture, syntactic and semantic information processing are two separate streams that interact selectively during language comprehension. While considerable effort is put into psycho- and neurolinguistics to understand the interchange of processing mechanisms in human comprehension, the nature of this interaction in recent neural Large Language Models remains elusive. In this article, we revisit influential linguistic and behavioral experiments and evaluate the ability of a large language model, GPT-3, to perform these tasks. The model can solve semantic tasks autonomously from syntactic realization in a manner that resembles human behavior. However, the outcomes present a complex and variegated picture, leaving open the question of how Language Models could learn structured conceptual representations.
Affiliation(s)
- Giulia Rambelli: Department of Modern Languages, Literatures, and Cultures, University of Bologna
- Emmanuele Chersoni: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University
- Alessandro Lenci: Department of Philology, Literature, and Linguistics, University of Pisa
26
Lyu B, Marslen-Wilson WD, Fang Y, Tyler LK. Finding structure during incremental speech comprehension. eLife 2024; 12:RP89311. PMID: 38577982. PMCID: PMC10997333. DOI: 10.7554/elife.89311.
Abstract
A core aspect of human speech comprehension is the ability to incrementally integrate consecutive words into a structured and coherent interpretation, aligning with the speaker's intended meaning. This rapid process is subject to multidimensional probabilistic constraints, including both linguistic knowledge and non-linguistic information within specific contexts, and it is their interpretative coherence that drives successful comprehension. To study the neural substrates of this process, we extract word-by-word measures of sentential structure from BERT, a deep language model, which effectively approximates the coherent outcomes of the dynamic interplay among various types of constraints. Using representational similarity analysis, we tested BERT parse depths and relevant corpus-based measures against the spatiotemporally resolved brain activity recorded by electro-/magnetoencephalography when participants were listening to the same sentences. Our results provide a detailed picture of the neurobiological processes involved in the incremental construction of structured interpretations. These findings show when and where coherent interpretations emerge through the evaluation and integration of multifaceted constraints in the brain, which engages bilateral brain regions extending beyond the classical fronto-temporal language system. Furthermore, this study provides empirical evidence supporting the use of artificial neural networks as computational models for revealing the neural dynamics underpinning complex cognitive processes in the brain.
Affiliation(s)
- William D Marslen-Wilson: Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Yuxing Fang: Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Lorraine K Tyler: Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
27
Hosseini EA, Schrimpf M, Zhang Y, Bowman S, Zaslavsky N, Fedorenko E. Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training. Neurobiol Lang 2024; 5:43-63. PMID: 38645622. PMCID: PMC11025646. DOI: 10.1162/nol_a_00137.
Abstract
Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models' ability to capture human fMRI responses to sentences is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion words against an fMRI benchmark. We consider the 100-million-word model to be developmentally plausible in terms of the amount of training data given that this amount is similar to what children are estimated to be exposed to during the first 10 years of life. Second, we test a GPT-2 model trained on a 9-billion-token dataset (to reach state-of-the-art next-word prediction performance) against the human benchmark at different stages during training. Across both approaches, we find that (i) the models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing fMRI responses to sentences. Further, (ii) lower perplexity, a measure of next-word prediction performance, is associated with stronger alignment with human data, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire representations of sentences that are predictive of human fMRI responses. In tandem, these findings establish that although some training is necessary for the models' predictive ability, a developmentally realistic amount of training (∼100 million words) may suffice.
Affiliation(s)
- Eghbal A. Hosseini: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf: The MIT Quest for Intelligence Initiative, Cambridge, MA, USA; Swiss Federal Institute of Technology, Lausanne, Switzerland
- Yian Zhang: Computer Science Department, Stanford University, Stanford, CA, USA
- Samuel Bowman: Center for Data Science, New York University, New York, NY, USA; Department of Linguistics, New York University, New York, NY, USA; Department of Computer Science, New York University, New York, NY, USA
- Noga Zaslavsky: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Language Science, University of California, Irvine, CA, USA
- Evelina Fedorenko: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; The MIT Quest for Intelligence Initiative, Cambridge, MA, USA; Speech and Hearing Bioscience and Technology Program, Harvard University, Boston, MA, USA
28
Kauf C, Tuckute G, Levy R, Andreas J, Fedorenko E. Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network. Neurobiol Lang 2024; 5:7-42. PMID: 38645614. PMCID: PMC11025651. DOI: 10.1162/nol_a_00116.
Abstract
Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences' word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence's syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN's embedding space and decrease the ANN's ability to predict upcoming tokens in those stimuli. Further, results are robust as to whether the mapping model is trained on intact or perturbed stimuli and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
Affiliation(s)
- Carina Kauf: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Greta Tuckute: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Roger Levy: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jacob Andreas: Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Evelina Fedorenko: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
29
Rosen ZP, Dale R. BERTs of a feather: Studying inter- and intra-group communication via information theory and language models. Behav Res Methods 2024; 56:3140-3160. PMID: 38030924. DOI: 10.3758/s13428-023-02267-2.
Abstract
When communicating, individuals alter their language to fulfill a myriad of social functions. In particular, linguistic convergence and divergence are fundamental in establishing and maintaining group identity. Quantitatively characterizing linguistic convergence is important when testing hypotheses surrounding language, including interpersonal and group communication. We provide a quantitative interpretation of linguistic convergence grounded in information theory. We then construct a computational model, built on top of a neural network model of language, that can be deployed to measure and test hypotheses about linguistic convergence in "big data." We demonstrate the utility of our convergence measurement in two case studies: (1) showing that our measurement is indeed sensitive to linguistic convergence across turns in dyadic conversation, and (2) showing that our convergence measurement is sensitive to social factors that mediate convergence in Internet-based communities (specifically, r/MensRights and r/MensLib). Our measurement also captures differences in which social factors influence web-based communities. We conclude by discussing methodological and theoretical implications of this semantic convergence analysis.
Affiliation(s)
- Zachary P Rosen: Communication Studies, Saddleback Community College, Mission Viejo, CA, USA
- Rick Dale: Department of Communication, UCLA, Los Angeles, CA, USA
30
Fitz H, Hagoort P, Petersson KM. Neurobiological Causal Models of Language Processing. Neurobiol Lang 2024; 5:225-247. PMID: 38645618. PMCID: PMC11025648. DOI: 10.1162/nol_a_00133.
Abstract
The language faculty is physically realized in the neurobiological infrastructure of the human brain. Despite significant efforts, an integrated understanding of this system remains a formidable challenge. What is missing from most theoretical accounts is a specification of the neural mechanisms that implement language function. Computational models that have been put forward generally lack an explicit neurobiological foundation. We propose a neurobiologically informed causal modeling approach which offers a framework for how to bridge this gap. A neurobiological causal model is a mechanistic description of language processing that is grounded in, and constrained by, the characteristics of the neurobiological substrate. It intends to model the generators of language behavior at the level of implementational causality. We describe key features and neurobiological component parts from which causal models can be built and provide guidelines on how to implement them in model simulations. Then we outline how this approach can shed new light on the core computational machinery for language, the long-term storage of words in the mental lexicon and combinatorial processing in sentence comprehension. In contrast to cognitive theories of behavior, causal models are formulated in the "machine language" of neurobiology which is universal to human cognition. We argue that neurobiological causal modeling should be pursued in addition to existing approaches. Eventually, this approach will allow us to develop an explicit computational neurobiology of language.
Affiliation(s)
- Hartmut Fitz: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Peter Hagoort: Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Karl Magnus Petersson: Neurobiology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Faculty of Medicine and Biomedical Sciences, University of Algarve, Faro, Portugal
31
Huber E, Sauppe S, Isasi-Isasmendi A, Bornkessel-Schlesewsky I, Merlo P, Bickel B. Surprisal From Language Models Can Predict ERPs in Processing Predicate-Argument Structures Only if Enriched by an Agent Preference Principle. Neurobiol Lang 2024; 5:167-200. PMID: 38645615. PMCID: PMC11025647. DOI: 10.1162/nol_a_00121.
Abstract
Language models based on artificial neural networks increasingly capture key aspects of how humans process sentences. Most notably, model-based surprisals predict event-related potentials such as N400 amplitudes during parsing. Assuming that these models represent realistic estimates of human linguistic experience, their success in modeling language processing raises the possibility that the human processing system relies on no other principles than the general architecture of language models and on sufficient linguistic input. Here, we test this hypothesis on N400 effects observed during the processing of verb-final sentences in German, Basque, and Hindi. By stacking Bayesian generalised additive models, we show that, in each language, N400 amplitudes and topographies in the region of the verb are best predicted when model-based surprisals are complemented by an Agent Preference principle that transiently interprets initial role-ambiguous noun phrases as agents, leading to reanalysis when this interpretation fails. Our findings demonstrate the need for this principle independently of usage frequencies and structural differences between languages. The principle has an unequal force, however. Compared to surprisal, its effect is weakest in German, stronger in Hindi, and still stronger in Basque. This gradient is correlated with the extent to which grammars allow unmarked NPs to be patients, a structural feature that boosts reanalysis effects. We conclude that language models gain more neurobiological plausibility by incorporating an Agent Preference. Conversely, theories of human processing profit from incorporating surprisal estimates in addition to principles like the Agent Preference, which arguably have distinct evolutionary roots.
Affiliation(s)
- Eva Huber: Department of Comparative Language Science, University of Zurich, Zurich, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
- Sebastian Sauppe: Department of Comparative Language Science, University of Zurich, Zurich, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland; Department of Psychology, University of Zurich, Zurich, Switzerland
- Arrate Isasi-Isasmendi: Department of Comparative Language Science, University of Zurich, Zurich, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
- Ina Bornkessel-Schlesewsky: Cognitive Neuroscience Laboratory, Australian Research Centre for Interactive and Virtual Environments, University of South Australia, Adelaide, Australia
- Paola Merlo: Department of Linguistics, University of Geneva, Geneva, Switzerland; University Center for Computer Science, University of Geneva, Geneva, Switzerland
- Balthasar Bickel: Department of Comparative Language Science, University of Zurich, Zurich, Switzerland; Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland
32
Goldstein A, Grinstein-Dabush A, Schain M, Wang H, Hong Z, Aubrey B, Nastase SA, Zada Z, Ham E, Feder A, Gazula H, Buchnik E, Doyle W, Devore S, Dugan P, Reichart R, Friedman D, Brenner M, Hassidim A, Devinsky O, Flinker A, Hasson U. Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns. Nat Commun 2024; 15:2768. PMID: 38553456. PMCID: PMC10980748. DOI: 10.1038/s41467-024-46631-y.
Abstract
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we densely record the neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. Using stringent zero-shot mapping we demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns. The common geometric patterns allow us to predict the brain embedding in IFG of a given left-out word based solely on its geometrical relationship to other non-overlapping words in the podcast. Furthermore, we show that contextual embeddings capture the geometry of IFG embeddings better than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain.
Affiliation(s)
- Ariel Goldstein: Business School, Data Science Department and Cognitive Department, Hebrew University, Jerusalem, Israel; Google Research, Tel Aviv, Israel
- Haocheng Wang: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Zhuoqiao Hong: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Bobbi Aubrey: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA; New York University Grossman School of Medicine, New York, NY, USA
- Samuel A Nastase: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Zaid Zada: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Eric Ham: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Harshvardhan Gazula: Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Werner Doyle: New York University Grossman School of Medicine, New York, NY, USA
- Sasha Devore: New York University Grossman School of Medicine, New York, NY, USA
- Patricia Dugan: New York University Grossman School of Medicine, New York, NY, USA
- Roi Reichart: Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa, Israel
- Daniel Friedman: New York University Grossman School of Medicine, New York, NY, USA
- Michael Brenner: Google Research, Tel Aviv, Israel; School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA
- Orrin Devinsky: New York University Grossman School of Medicine, New York, NY, USA
- Adeen Flinker: New York University Grossman School of Medicine, New York, NY, USA; New York University Tandon School of Engineering, Brooklyn, NY, USA
- Uri Hasson: Google Research, Tel Aviv, Israel; Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA
33
Marin Vargas A, Bisi A, Chiappa AS, Versteeg C, Miller LE, Mathis A. Task-driven neural network models predict neural dynamics of proprioception. Cell 2024; 187:1745-1761.e19. PMID: 38518772. DOI: 10.1016/j.cell.2024.02.036.
Abstract
Proprioception tells the brain the state of the body based on distributed sensory neurons. Yet, the principles that govern proprioceptive processing are poorly understood. Here, we employ a task-driven modeling approach to investigate the neural code of proprioceptive neurons in cuneate nucleus (CN) and somatosensory cortex area 2 (S1). We simulated muscle spindle signals through musculoskeletal modeling and generated a large-scale movement repertoire to train neural networks based on 16 hypotheses, each representing different computational goals. We found that the emerging, task-optimized internal representations generalize from synthetic data to predict neural dynamics in CN and S1 of primates. Computational tasks that aim to predict the limb position and velocity were the best at predicting the neural activity in both areas. Since task optimization develops representations that better predict neural activity during active than passive movements, we postulate that neural activity in the CN and S1 is top-down modulated during goal-directed movements.
Affiliation(s)
- Alessandro Marin Vargas: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Axel Bisi: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Alberto S Chiappa: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Chris Versteeg: Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA; Shirley Ryan AbilityLab, Chicago, IL 60611, USA
- Lee E Miller: Department of Neuroscience, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA; Shirley Ryan AbilityLab, Chicago, IL 60611, USA
- Alexander Mathis: Brain Mind Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; NeuroX Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
34
Jahn CI, Markov NT, Morea B, Daw ND, Ebitz RB, Buschman TJ. Learning attentional templates for value-based decision-making. Cell 2024; 187:1476-1489.e21. PMID: 38401541. DOI: 10.1016/j.cell.2024.01.041.
Abstract
Attention filters sensory inputs to enhance task-relevant information. It is guided by an "attentional template" that represents the stimulus features that are currently relevant. To understand how the brain learns and uses templates, we trained monkeys to perform a visual search task that required them to repeatedly learn new attentional templates. Neural recordings found that templates were represented across the prefrontal and parietal cortex in a structured manner, such that perceptually neighboring templates had similar neural representations. When the task changed, a new attentional template was learned by incrementally shifting the template toward rewarded features. Finally, we found that attentional templates transformed stimulus features into a common value representation that allowed the same decision-making mechanisms to deploy attention, regardless of the identity of the template. Altogether, our results provide insight into the neural mechanisms by which the brain learns to control attention and how attention can be flexibly deployed across tasks.
Affiliation(s)
- Caroline I Jahn: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Nikola T Markov: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Britney Morea: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Nathaniel D Daw: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- R Becket Ebitz: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Neurosciences, Université de Montréal, Montréal, QC H3C 3J7, Canada
- Timothy J Buschman: Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA; Department of Psychology, Princeton University, Princeton, NJ 08540, USA
35
Bzdok D, Thieme A, Levkovskyy O, Wren P, Ray T, Reddy S. Data science opportunities of large language models for neuroscience and biomedicine. Neuron 2024; 112:698-717. PMID: 38340718. DOI: 10.1016/j.neuron.2024.01.016.
Abstract
Large language models (LLMs) are a new asset class in the machine-learning landscape. Here we offer a primer on defining properties of these modeling techniques. We then reflect on new modes of investigation in which LLMs can be used to reframe classic neuroscience questions to deliver fresh answers. We reason that LLMs have the potential to (1) enrich neuroscience datasets by adding valuable meta-information, such as advanced text sentiment, (2) summarize vast information sources to overcome divides between siloed neuroscience communities, (3) enable previously unthinkable fusion of disparate information sources relevant to the brain, (4) help deconvolve which cognitive concepts most usefully grasp phenomena in the brain, and much more.
Affiliation(s)
- Danilo Bzdok: Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; The Neuro - Montreal Neurological Institute (MNI), Department of Biomedical Engineering, McGill University, Montreal, QC, Canada
- Paul Wren: Mindstate Design Labs, San Francisco, CA, USA
- Thomas Ray: Mindstate Design Labs, San Francisco, CA, USA
- Siva Reddy: Mila - Quebec Artificial Intelligence Institute, Montreal, QC, Canada; Facebook CIFAR AI Chair; ServiceNow Research
36
Shain C, Meister C, Pimentel T, Cotterell R, Levy R. Large-scale evidence for logarithmic effects of word predictability on reading time. Proc Natl Acad Sci U S A 2024; 121:e2307876121. PMID: 38422017. DOI: 10.1073/pnas.2307876121.
Abstract
During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words' contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.
Affiliation(s)
- Cory Shain
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Clara Meister
- Department of Computer Science, Institute for Machine Learning, ETH Zürich, Zürich 8092, Switzerland
- Tiago Pimentel
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, United Kingdom
- Ryan Cotterell
- Department of Computer Science, Institute for Machine Learning, ETH Zürich, Zürich 8092, Switzerland
- Roger Levy
- Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139

37
Tuckute G, Sathe A, Srikant S, Taliaferro M, Wang M, Schrimpf M, Kay K, Fedorenko E. Driving and suppressing the human language network using large language models. Nat Hum Behav 2024; 8:544-561. [PMID: 38172630 DOI: 10.1038/s41562-023-01783-7]
Abstract
Transformer models such as GPT generate human-like language and are predictive of human brain responses to language. Here, using functional-MRI-measured brain responses to 1,000 diverse sentences, we first show that a GPT-based encoding model can predict the magnitude of the brain response associated with each sentence. We then use the model to identify new sentences that are predicted to drive or suppress responses in the human language network. We show that these model-selected novel sentences indeed strongly drive and suppress the activity of human language areas in new individuals. A systematic analysis of the model-selected sentences reveals that surprisal and well-formedness of linguistic input are key determinants of response strength in the language network. These results establish the ability of neural network models to not only mimic human language but also non-invasively control neural activity in higher-level cortical areas, such as the language network.
Affiliation(s)
- Greta Tuckute
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aalok Sathe
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shashank Srikant
- Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- MIT-IBM Watson AI Lab, Cambridge, MA, USA
- Maya Taliaferro
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mingye Wang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Martin Schrimpf
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Quest for Intelligence, Massachusetts Institute of Technology, Cambridge, MA, USA
- Neuro-X Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kendrick Kay
- Center for Magnetic Resonance Research, University of Minnesota, Minneapolis, MN, USA
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA

38
Sievers B, Thornton MA. Deep social neuroscience: the promise and peril of using artificial neural networks to study the social brain. Soc Cogn Affect Neurosci 2024; 19:nsae014. [PMID: 38334747 PMCID: PMC10880882 DOI: 10.1093/scan/nsae014]
Abstract
This review offers an accessible primer to social neuroscientists interested in neural networks. It begins by providing an overview of key concepts in deep learning. It then discusses three ways neural networks can be useful to social neuroscientists: (i) building statistical models to predict behavior from brain activity; (ii) quantifying naturalistic stimuli and social interactions; and (iii) generating cognitive models of social brain function. These applications have the potential to enhance the clinical value of neuroimaging and improve the generalizability of social neuroscience research. We also discuss the significant practical challenges, theoretical limitations and ethical issues faced by deep learning. If the field can successfully navigate these hazards, we believe that artificial neural networks may prove indispensable for the next stage of the field's development: deep social neuroscience.
Affiliation(s)
- Beau Sievers
- Department of Psychology, Stanford University, 420 Jane Stanford Way, Stanford, CA 94305, USA
- Department of Psychology, Harvard University, 33 Kirkland St., Cambridge, MA 02138, USA
- Mark A Thornton
- Department of Psychological and Brain Sciences, Dartmouth College, 6207 Moore Hall, Hanover, NH 03755, USA

39
Pezzulo G, Parr T, Cisek P, Clark A, Friston K. Generating meaning: active inference and the scope and limits of passive AI. Trends Cogn Sci 2024; 28:97-112. [PMID: 37973519 DOI: 10.1016/j.tics.2023.10.002]
Abstract
Prominent accounts of sentient behavior depict brains as generative models of organismic interaction with the world, evincing intriguing similarities with current advances in generative artificial intelligence (AI). However, because they contend with the control of purposive, life-sustaining sensorimotor interactions, the generative models of living organisms are inextricably anchored to the body and world. Unlike the passive models learned by generative AI systems, they must capture and control the sensory consequences of action. This allows embodied agents to intervene upon their worlds in ways that constantly put their best models to the test, thus providing a solid bedrock that is - we argue - essential to the development of genuine understanding. We review the resulting implications and consider future directions for generative AI.
Affiliation(s)
- Giovanni Pezzulo
- Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
- Thomas Parr
- Nuffield Department of Clinical Neurosciences, University of Oxford
- Paul Cisek
- Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
- Andy Clark
- Department of Philosophy, University of Sussex, Brighton, UK; Department of Informatics, University of Sussex, Brighton, UK; Department of Philosophy, Macquarie University, Sydney, New South Wales, Australia
- Karl Friston
- Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology, University College London, London, UK; VERSES AI Research Lab, Los Angeles, CA, USA

40
Ohmae K, Ohmae S. Emergence of syntax and word prediction in an artificial neural circuit of the cerebellum. Nat Commun 2024; 15:927. [PMID: 38296954 PMCID: PMC10831061 DOI: 10.1038/s41467-024-44801-6]
Abstract
The cerebellum, interconnected with the cerebral neocortex, plays a vital role in human-characteristic cognition such as language processing; however, knowledge about the underlying circuit computation of the cerebellum remains very limited. To gain a better understanding of the computation underlying cerebellar language processing, we developed a biologically constrained cerebellar artificial neural network (cANN) model, which implements the recently identified cerebello-cerebellar recurrent pathway. We found that while the cANN acquires prediction of future words, a second function, syntactic recognition, emerges in the middle layer of the prediction circuit. The recurrent pathway of the cANN was essential for both language functions, and cANN variants with further biological constraints preserved them. Considering the uniform structure of cerebellar circuitry across all functional domains, the single-circuit computation that forms the common basis of the two language functions can be generalized to fundamental cerebellar functions of prediction and grammar-like rule extraction from sequences, which underpin a wide range of cerebellar motor and cognitive functions. This is a pioneering study of the circuit computation underlying human-characteristic cognition using biologically constrained ANNs.
Affiliation(s)
- Keiko Ohmae
- Neuroscience Department, Baylor College of Medicine, Houston, TX, USA
- Chinese Institute for Brain Research (CIBR), Beijing, China
- Shogo Ohmae
- Neuroscience Department, Baylor College of Medicine, Houston, TX, USA
- Chinese Institute for Brain Research (CIBR), Beijing, China

41
Pang R, Baker C, Murthy M, Pillow J. Inferring neural dynamics of memory during naturalistic social communication. bioRxiv [Preprint] 2024:2024.01.26.577404. [PMID: 38328156 PMCID: PMC10849655 DOI: 10.1101/2024.01.26.577404]
Abstract
Memory processes in complex behaviors like social communication require forming representations of the past that grow with time. The neural mechanisms that support such continually growing memory remain unknown. We address this gap in the context of fly courtship, a natural social behavior involving the production and perception of long, complex song sequences. To study female memory for male song history in unrestrained courtship, we present 'Natural Continuation' (NC), a general, simulation-based model comparison procedure to evaluate candidate neural codes for complex stimuli using naturalistic behavioral data. Applying NC to fly courtship revealed strong evidence for an adaptive population mechanism for how female auditory neural dynamics could convert long song histories into a rich mnemonic format. Song temporal patterning is continually transformed by heterogeneous nonlinear adaptation dynamics, then integrated into persistent activity, enabling common neural mechanisms to retain continuously unfolding information over long periods and yielding state-of-the-art predictions of female courtship behavior. At a population level this coding model produces multi-dimensional advection-diffusion-like responses that separate songs over a continuum of timescales and can be linearly transformed into flexible output signals, illustrating its potential to create a generic, scalable mnemonic format for extended input signals poised to drive complex behavioral responses. This work thus shows how naturalistic behavior can directly inform neural population coding models, revealing here a novel process for memory formation.
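A minimal, self-contained caricature of the proposed mechanism — heterogeneous adaptation followed by persistent integration — can be sketched as follows. All dynamics, timescales and stimuli here are invented for illustration; the actual model is fit to neural and behavioral data in the paper.

```python
import math

def adapt_then_integrate(stimulus, taus, k=2.0):
    """Toy version of the proposed coding scheme: each unit adapts to the
    input with its own timescale tau (heterogeneous nonlinear adaptation),
    and its saturating response is accumulated into persistent activity."""
    n = len(taus)
    adapt_state = [0.0] * n   # per-unit adaptation variable
    memory = [0.0] * n        # persistent integrator per unit
    for s in stimulus:
        for i, tau in enumerate(taus):
            # exponential adaptation toward the current input
            alpha = 1.0 - math.exp(-1.0 / tau)
            adapt_state[i] += alpha * (s - adapt_state[i])
            # nonlinear (saturating) response to the non-adapted residual
            response = math.tanh(k * (s - adapt_state[i]))
            memory[i] += response  # persistent integration
    return memory

# Two song-like pulse trains that differ only in their early patterning,
# followed by a long silence.
song_a = [1, 0, 1, 0] + [0] * 20
song_b = [0, 1, 0, 1] + [0] * 20
taus = [1.0, 5.0, 25.0]  # heterogeneous timescales
ma = adapt_then_integrate(song_a, taus)
mb = adapt_then_integrate(song_b, taus)
# The persistent traces still separate the two histories long after they end.
```

Because the integrators never reset, the trace keeps growing with song history, matching the idea of a continually growing memory for extended input signals.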
Affiliation(s)
- Rich Pang
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Center for the Physics of Biological Function, Princeton, NJ and New York, NY, USA
- Christa Baker
- Princeton Neuroscience Institute, Princeton, NJ, USA
- Present address: Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Mala Murthy
- Princeton Neuroscience Institute, Princeton, NJ, USA

42
Badrulhisham F, Pogatzki-Zahn E, Segelcke D, Spisak T, Vollert J. Machine learning and artificial intelligence in neuroscience: A primer for researchers. Brain Behav Immun 2024; 115:470-479. [PMID: 37972877 DOI: 10.1016/j.bbi.2023.11.005]
Abstract
Artificial intelligence (AI) is often used to describe the automation of complex tasks that we would attribute intelligence to. Machine learning (ML) is commonly understood as a set of methods used to develop an AI. Both have seen a recent boom in usage, in both scientific and commercial fields. For the scientific community, ML can solve bottlenecks created by complex, multi-dimensional data generated, for example, by functional brain imaging or *omics approaches. Here, ML can identify patterns that could not have been found using traditional statistical approaches. However, ML comes with serious limitations that need to be kept in mind: the tendency of ML models to optimise solutions for the input data means it is of crucial importance to externally validate any findings before considering them more than a hypothesis. Their black-box nature implies that their decisions usually cannot be understood, which renders their use in medical decision making problematic and can lead to ethical issues. Here, we present an introduction to the field of ML/AI for the curious. We explain the principles and commonly used methods, as well as recent methodological advancements, before we discuss risks and what we see as future directions of the field. Finally, we show practical examples from neuroscience to illustrate the use and limitations of ML.
Affiliation(s)
- Esther Pogatzki-Zahn
- Department of Anaesthesiology, Intensive Care and Pain Medicine, University Hospital Muenster, Muenster, Germany
- Daniel Segelcke
- Department of Anaesthesiology, Intensive Care and Pain Medicine, University Hospital Muenster, Muenster, Germany
- Tamas Spisak
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Medicine Essen, Essen, Germany; Center for Translational Neuro- and Behavioral Sciences, Department of Neurology, University Medicine Essen, Essen, Germany
- Jan Vollert
- Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom; Pain Research, Department of Surgery and Cancer, Imperial College London, London, United Kingdom

43
Bi Z. Cognition of Time and Thinking Beyond. Adv Exp Med Biol 2024; 1455:171-195. [PMID: 38918352 DOI: 10.1007/978-3-031-60183-5_10]
Abstract
A common research protocol in cognitive neuroscience is to train subjects to perform deliberately designed experiments while recording brain activity, with the aim of understanding the brain mechanisms underlying cognition. However, how the results of this protocol of research can be applied in technology is seldom discussed. Here, I review the studies on time processing of the brain as examples of this research protocol, as well as two main application areas of neuroscience (neuroengineering and brain-inspired artificial intelligence). Time processing is a fundamental dimension of cognition, and time is also an indispensable dimension of any real-world signal to be processed in technology. Therefore, one may expect that the studies of time processing in cognition profoundly influence brain-related technology. Surprisingly, I found that the results from cognitive studies on time processing are hardly helpful in solving practical problems. This awkward situation may be due to the lack of generalizability of the results of cognitive studies, which are obtained under well-controlled laboratory conditions, to real-life situations. This lack of generalizability may be rooted in the fundamental unknowability of the world (including cognition). Overall, this paper questions and criticizes the usefulness and prospect of the abovementioned research protocol of cognitive neuroscience. I then give three suggestions for future research. First, to improve the generalizability of research, it is better to study brain activity under real-life conditions instead of in well-controlled laboratory experiments. Second, to overcome the unknowability of the world, we can engineer an easily accessible surrogate of the object under investigation, so that we can predict the behavior of the object under investigation by experimenting on the surrogate. Third, the paper calls for technology-oriented research, with the aim of technology creation instead of knowledge discovery.
Affiliation(s)
- Zedong Bi
- Lingang Laboratory, Shanghai, China
- Institute for Future, Qingdao University, Qingdao, China
- School of Automation, Shandong Key Laboratory of Industrial Control Technology, Qingdao University, Qingdao, China

44
Gwilliams L, Flick G, Marantz A, Pylkkänen L, Poeppel D, King JR. Introducing MEG-MASC a high-quality magneto-encephalography dataset for evaluating natural speech processing. Sci Data 2023; 10:862. [PMID: 38049487 PMCID: PMC10695966 DOI: 10.1038/s41597-023-02752-5]
Abstract
The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers who listened to two hours of naturalistic stories. Each participant performed two identical sessions, involving listening to four fictional stories from the Manually Annotated Sub-Corpus (MASC) intermixed with random word lists and comprehension questions. We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS). This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally resolved brain responses to speech. We provide the Python code to replicate several validation analyses of the MEG evoked responses, such as the temporal decoding of phonetic features and word frequency. All code and MEG, audio and text data are publicly available, in keeping with best practices in transparent and reproducible research.
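As a loose illustration of the kind of temporal decoding analysis such a dataset supports, here is a numpy-only sketch on synthetic data standing in for epoched MEG. The real analyses use the released recordings and code; the array sizes, the binary word-frequency label and the effect window below are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for epoched MEG: trials x sensors x timepoints,
# with a label-dependent signal present only in a late time window.
n_trials, n_sensors, n_times = 200, 20, 50
y = rng.integers(0, 2, n_trials)           # e.g. high vs low word frequency
X = rng.normal(size=(n_trials, n_sensors, n_times))
pattern = rng.normal(size=n_sensors)       # fixed spatial topography
X[:, :, 30:40] += np.where(y[:, None, None] == 1, 1.0, -1.0) * pattern[None, :, None]

# Temporal decoding: fit and score a linear decoder independently at
# every timepoint, using a simple train/test split over trials.
train, test = np.arange(0, 100), np.arange(100, 200)
accuracy = np.empty(n_times)
for t in range(n_times):
    w, *_ = np.linalg.lstsq(X[train, :, t], y[train] * 2.0 - 1.0, rcond=None)
    accuracy[t] = np.mean((X[test, :, t] @ w > 0) == y[test])

# Accuracy hovers near chance early and rises inside the 30-40 window.
```

Scoring a decoder per timepoint traces when, relative to word onset, the label becomes linearly readable from the sensors, which is the logic behind the paper's temporal decoding validations.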
Affiliation(s)
- Laura Gwilliams
- Department of Psychology, Stanford University, Stanford, USA
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Graham Flick
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Department of Linguistics, New York University, New York, USA
- Rotman Research Institute, Baycrest Hospital, Toronto, Canada
- Alec Marantz
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Department of Linguistics, New York University, New York, USA
- Liina Pylkkänen
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, United Arab Emirates
- Department of Linguistics, New York University, New York, USA
- David Poeppel
- Department of Psychology, New York University, New York, USA
- Ernst Struengmann Institute for Neuroscience, Frankfurt, Germany
- Jean-Rémi King
- Department of Psychology, New York University, New York, USA
- LSP, École normale supérieure, PSL University, CNRS, 75005, Paris, France

45
van der Burght CL, Friederici AD, Maran M, Papitto G, Pyatigorskaya E, Schroën JAM, Trettenbrein PC, Zaccarella E. Cleaning up the Brickyard: How Theory and Methodology Shape Experiments in Cognitive Neuroscience of Language. J Cogn Neurosci 2023; 35:2067-2088. [PMID: 37713672 DOI: 10.1162/jocn_a_02058]
Abstract
The capacity for language is a defining property of our species, yet despite decades of research, evidence on its neural basis is still mixed and a generalized consensus is difficult to achieve. We suggest that this is partly caused by researchers defining "language" in different ways, with focus on a wide range of phenomena, properties, and levels of investigation. Accordingly, there is very little agreement among cognitive neuroscientists of language on the operationalization of fundamental concepts to be investigated in neuroscientific experiments. Here, we review chains of derivation in the cognitive neuroscience of language, focusing on how the hypothesis under consideration is defined by a combination of theoretical and methodological assumptions. We first attempt to disentangle the complex relationship between linguistics, psychology, and neuroscience in the field. Next, we focus on how conclusions that can be drawn from any experiment are inherently constrained by auxiliary assumptions, both theoretical and methodological, on which the validity of conclusions drawn rests. These issues are discussed in the context of classical experimental manipulations as well as study designs that employ novel approaches such as naturalistic stimuli and computational modeling. We conclude by proposing that a highly interdisciplinary field such as the cognitive neuroscience of language requires researchers to form explicit statements concerning the theoretical definitions, methodological choices, and other constraining factors involved in their work.
Affiliation(s)
- Angela D Friederici
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Matteo Maran
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Giorgio Papitto
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Elena Pyatigorskaya
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Joëlle A M Schroën
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- Patrick C Trettenbrein
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- International Max Planck Research School on Neuroscience of Communication, Leipzig, Germany
- University of Göttingen, Göttingen, Germany
- Emiliano Zaccarella
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

46
Tang J, Du M, Vo VA, Lal V, Huth AG. Brain encoding models based on multimodal transformers can transfer across language and vision. Adv Neural Inf Process Syst 2023; 36:29654-29666. [PMID: 39015152 PMCID: PMC11250991]
Abstract
Encoding models have been used to assess how the human brain represents concepts in language and vision. While language and vision rely on similar concept representations, current encoding models are typically trained and tested on brain responses to each modality in isolation. Recent advances in multimodal pretraining have produced transformers that can extract aligned representations of concepts in language and vision. In this work, we used representations from multimodal transformers to train encoding models that can transfer across fMRI responses to stories and movies. We found that encoding models trained on brain responses to one modality can successfully predict brain responses to the other modality, particularly in cortical regions that represent conceptual meaning. Further analysis of these encoding models revealed shared semantic dimensions that underlie concept representations in language and vision. Comparing encoding models trained using representations from multimodal and unimodal transformers, we found that multimodal transformers learn more aligned representations of concepts in language and vision. Our results demonstrate how multimodal transformers can provide insights into the brain's capacity for multimodal processing.
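The transfer logic can be illustrated with a small synthetic sketch. All sizes, the shared weight map and the noise level are invented; the paper uses fMRI responses and multimodal transformer features. If story and movie features live in a shared "concept" space that drives responses through one weight map, an encoding model fit on one modality should predict responses to the other.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_feat = 50, 10

# Ground truth: one semantic weight map shared across modalities,
# mimicking aligned concept representations from a multimodal transformer.
W_true = rng.normal(size=(n_feat, n_voxels))

def simulate(n_samples, noise=0.5):
    F = rng.normal(size=(n_samples, n_feat))        # stimulus features
    Y = F @ W_true + noise * rng.normal(size=(n_samples, n_voxels))
    return F, Y

F_story, Y_story = simulate(500)   # "language" training data
F_movie, Y_movie = simulate(200)   # "vision" test data

# Ridge regression fit on story responses only.
lam = 1.0
W_hat = np.linalg.solve(F_story.T @ F_story + lam * np.eye(n_feat),
                        F_story.T @ Y_story)

# Cross-modal transfer: predict movie responses from movie features
# and score each voxel by correlation with the held-out responses.
Y_pred = F_movie @ W_hat
r = np.array([np.corrcoef(Y_pred[:, v], Y_movie[:, v])[0, 1]
              for v in range(n_voxels)])
# Mean r well above zero indicates the fitted map transfers.
```

In the real analyses the features come from a multimodal transformer rather than being sampled, and transfer is evaluated voxel-wise on held-out stimuli, as the per-voxel correlations here sketch.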
47
Li Y, Anumanchipalli GK, Mohamed A, Chen P, Carney LH, Lu J, Wu J, Chang EF. Dissecting neural computations in the human auditory pathway using deep neural networks for speech. Nat Neurosci 2023; 26:2213-2225. [PMID: 37904043 PMCID: PMC10689246 DOI: 10.1038/s41593-023-01468-4]
Abstract
The human auditory system extracts rich linguistic abstractions from speech signals. Traditional approaches to understanding this complex process have used linear feature-encoding models, with limited success. Artificial neural networks excel in speech recognition tasks and offer promising computational models of speech processing. We used speech representations in state-of-the-art deep neural network (DNN) models to investigate neural coding from the auditory nerve to the speech cortex. Representations in hierarchical layers of the DNN correlated well with the neural activity throughout the ascending auditory system. Unsupervised speech models performed at least as well as other purely supervised or fine-tuned models. Deeper DNN layers were better correlated with the neural activity in the higher-order auditory cortex, with computations aligned with phonemic and syllabic structures in speech. Accordingly, DNN models trained on either English or Mandarin predicted cortical responses in native speakers of each language. These results reveal convergence between DNN model representations and the biological auditory pathway, offering new approaches for modeling neural coding in the auditory cortex.
Affiliation(s)
- Yuanning Li
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Gopala K Anumanchipalli
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Peili Chen
- School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
- Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Junfeng Lu
- Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
- Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
- Jinsong Wu
- Neurologic Surgery Department, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China
- Brain Function Laboratory, Neurosurgical Institute, Fudan University, Shanghai, China
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA

48
Antony JW, Van Dam J, Massey JR, Barnett AJ, Bennion KA. Long-term, multi-event surprise correlates with enhanced autobiographical memory. Nat Hum Behav 2023; 7:2152-2168. [PMID: 37322234 DOI: 10.1038/s41562-023-01631-8]
Abstract
Neurobiological and psychological models of learning emphasize the importance of prediction errors (surprises) for memory formation. This relationship has been shown for individual momentary surprising events; however, it is less clear whether surprise that unfolds across multiple events and timescales is also linked with better memory of those events. We asked basketball fans about their most positive and negative autobiographical memories of individual plays, games and seasons, allowing surprise measurements spanning seconds, hours and months. We used advanced analytics on National Basketball Association play-by-play data and betting odds spanning 17 seasons, more than 22,000 games and more than 5.6 million plays to compute and align the estimated surprise value of each memory. We found that surprising events were associated with better recall of positive memories on the scale of seconds and months and negative memories across all three timescales. Game and season memories could not be explained by surprise at shorter timescales, suggesting that long-term, multi-event surprise correlates with memory. These results expand notions of surprise in models of learning and reinforce its relevance in real-world domains.
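The surprise measure used in work of this kind is, in information-theoretic terms, the negative log probability of the observed outcome. A toy version with made-up win probabilities, not actual betting odds or play-by-play values from the study:

```python
import math

def surprise_bits(prob_outcome: float) -> float:
    """Surprise of an observed outcome, in bits, given its prior probability."""
    return -math.log2(prob_outcome)

# A team given a 90% pre-game win probability loses: the upset (p = 0.10)
# carries far more surprise than the expected win would have (p = 0.90).
upset = surprise_bits(0.10)     # ~3.32 bits
expected = surprise_bits(0.90)  # ~0.15 bits

# Surprise that unfolds over multiple events (a game, a season) can be
# accumulated by summing per-event surprise over the observed outcomes.
season_probs = [0.9, 0.8, 0.3]  # prior probability of each observed result
season_surprise = sum(surprise_bits(p) for p in season_probs)
```

Aligning such values, derived from play-by-play win probabilities and betting odds, with recalled plays, games and seasons is what lets surprise at multiple timescales be related to memory.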
Affiliation(s)
- James W Antony
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA
- Jacob Van Dam
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA
- Jarett R Massey
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA
- Kelly A Bennion
- Department of Psychology and Child Development, California Polytechnic State University, San Luis Obispo, CA, USA

49
Bruera A, Tao Y, Anderson A, Çokal D, Haber J, Poesio M. Modeling Brain Representations of Words' Concreteness in Context Using GPT-2 and Human Ratings. Cogn Sci 2023; 47:e13388. [PMID: 38103208 DOI: 10.1111/cogs.13388]
Abstract
The meaning of most words in language depends on their context. Understanding how the human brain extracts contextualized meaning, and identifying where in the brain this takes place, remain important scientific challenges. But technological and computational advances in neuroscience and artificial intelligence now provide unprecedented opportunities to study the human brain in action as language is read and understood. Recent contextualized language models seem able to capture homonymic meaning variation ("bat", in a baseball vs. a vampire context), as well as more nuanced differences of meaning; for example, polysemous words such as "book" can be interpreted in distinct but related senses ("explain a book", information, vs. "open a book", object) whose differences are fine-grained. We study these subtle differences in lexical meaning along the concrete/abstract dimension, as they are triggered by verb-noun semantic composition. We analyze functional magnetic resonance imaging (fMRI) activations elicited by Italian verb phrases containing nouns whose interpretation is affected by the verb to different degrees. Using a contextualized language model and human concreteness ratings, we shed light on where in the brain such fine-grained meaning variation takes place and how it is coded. Our results show that phrase concreteness judgments and the contextualized model can predict BOLD activation associated with semantic composition within the language network. Importantly, representations derived from a complex, nonlinear composition process consistently outperform simpler composition approaches. This is compatible with a holistic view of semantic composition in the brain, where semantic representations are modified by the process of composition itself. When looking at individual brain areas, we find that encoding performance is statistically significant in the posterior superior temporal sulcus, inferior frontal gyrus, anterior temporal lobe, and motor areas previously associated with the processing of concreteness/abstractness, although with differing patterns of results, suggesting differential involvement.
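The voxelwise encoding analysis summarized in this abstract (model-derived features predicting BOLD activation, evaluated per voxel) can be sketched with a standard ridge-regression pipeline. This is a minimal numpy illustration of the general technique, not the authors' code; the embedding matrix and "voxels" below are random stand-ins and all names are hypothetical.

```python
import numpy as np

def ridge_encoding(X_train, Y_train, X_test, alpha=1.0):
    """Closed-form ridge regression: map stimulus features X to voxel responses Y."""
    n_feat = X_train.shape[1]
    W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(n_feat),
                        X_train.T @ Y_train)
    return X_test @ W

def voxelwise_correlation(Y_pred, Y_true):
    """Pearson r between predicted and observed responses, one value per voxel."""
    Yp = Y_pred - Y_pred.mean(axis=0)
    Yt = Y_true - Y_true.mean(axis=0)
    return (Yp * Yt).sum(axis=0) / (
        np.linalg.norm(Yp, axis=0) * np.linalg.norm(Yt, axis=0))

# Toy data: 100 phrases x 12 embedding dimensions -> 5 synthetic "voxels"
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 12))        # stand-in for contextual embeddings
W_true = rng.standard_normal((12, 5))
Y = X @ W_true + 0.1 * rng.standard_normal((100, 5))

# Fit on the first 80 phrases, evaluate on the held-out 20
Y_pred = ridge_encoding(X[:80], Y[:80], X[80:])
r = voxelwise_correlation(Y_pred, Y[80:])
```

In a real analysis, `X` would hold contextualized-model representations of each phrase, `Y` the measured BOLD responses, and held-out correlation (often with nested cross-validation to choose `alpha`) would quantify encoding performance per voxel or region.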
Collapse
Affiliation(s)
- Andrea Bruera
- School of Electronic Engineering and Computer Science, Cognitive Science Research Group, Queen Mary University of London
- Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences
- Yuan Tao
- Department of Cognitive Science, Johns Hopkins University
- Derya Çokal
- Department of German Language and Literature I-Linguistics, University of Cologne
- Janosch Haber
- School of Electronic Engineering and Computer Science, Cognitive Science Research Group, Queen Mary University of London
- Chattermill, London
- Massimo Poesio
- School of Electronic Engineering and Computer Science, Cognitive Science Research Group, Queen Mary University of London
- Department of Information and Computing Sciences, University of Utrecht
Collapse
50
Zou J, Zhang Y, Li J, Tian X, Ding N. Human attention during goal-directed reading comprehension relies on task optimization. eLife 2023; 12:RP87197. [PMID: 38032825 PMCID: PMC10688971 DOI: 10.7554/elife.87197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2023] Open
Abstract
The computational principles underlying attention allocation in complex goal-directed tasks remain elusive. Goal-directed reading, that is, reading a passage to answer a question in mind, is a common real-world task that strongly engages attention. Here, we investigate what computational models can explain attention distribution in this complex task. We show that the reading time on each word is predicted by the attention weights in transformer-based deep neural networks (DNNs) optimized to perform the same reading task. Eye tracking further reveals that readers separately attend to basic text features and question-relevant information during first-pass reading and rereading, respectively. Similarly, text features and question relevance separately modulate attention weights in shallow and deep DNN layers. Furthermore, when readers scan a passage without a question in mind, their reading time is predicted by DNNs optimized for a word prediction task. Therefore, we offer a computational account of how task optimization modulates attention distribution during real-world reading.
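The core analysis described here, relating per-word reading times to attention weights from a task-optimized transformer, amounts to a regression or correlation between the two word-level measures. A minimal numpy sketch of that correlation step, using synthetic stand-in data (not the authors' pipeline; `attn` and `rt` are hypothetical):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Synthetic stand-ins for 200 words: per-word DNN attention weights and
# eye-tracking reading times that partly depend on them (plus noise).
rng = np.random.default_rng(1)
attn = rng.random(200)                                 # attention weight per word
rt = 180.0 + 400.0 * attn + 20.0 * rng.standard_normal(200)  # reading time in ms

r = pearson_r(attn, rt)
```

In the actual study design, separate predictors (e.g. text features for shallow layers, question relevance for deep layers) would enter a multiple-regression model rather than a single bivariate correlation.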
Collapse
Affiliation(s)
- Jiajie Zou
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
- Yuran Zhang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Jialu Li
- Division of Arts and Sciences, New York University Shanghai, Shanghai, China
- Xing Tian
- Division of Arts and Sciences, New York University Shanghai, Shanghai, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
Collapse