1. Bhatia S, Galesic M, Mitchell M. Editorial for the Special Issue on Algorithms in Our Lives. Perspect Psychol Sci 2024; 19:707-710. [PMID: 38165782] [DOI: 10.1177/17456916231214452]
Affiliations:
- Sudeep Bhatia: Department of Psychology, University of Pennsylvania; Wharton School, University of Pennsylvania
- Mirta Galesic: Santa Fe Institute, Santa Fe, New Mexico; Complexity Science Hub Vienna, Vienna, Austria; Vermont Complex Systems Center, University of Vermont; Harding Center for Risk Literacy, University of Potsdam
2. Strachan JWA, Albergo D, Borghini G, Pansardi O, Scaliti E, Gupta S, Saxena K, Rufo A, Panzeri S, Manzi G, Graziano MSA, Becchio C. Testing theory of mind in large language models and humans. Nat Hum Behav 2024; 8:1285-1295. [PMID: 38769463] [PMCID: PMC11272575] [DOI: 10.1038/s41562-024-01882-z]
Abstract
At the core of what defines us as humans is the concept of theory of mind: the ability to track other people's mental states. The recent development of large language models (LLMs) such as ChatGPT has led to intense debate about the possibility that these models exhibit behaviour that is indistinguishable from human behaviour in theory of mind tasks. Here we compare human and LLM performance on a comprehensive battery of tests measuring different theory of mind abilities, from understanding false beliefs to interpreting indirect requests and recognizing irony and faux pas. We tested two families of LLMs (GPT and LLaMA2) repeatedly against these measures and compared their performance with that of a sample of 1,907 human participants. Across the battery of theory of mind tests, we found that GPT-4 models performed at, or even sometimes above, human levels at identifying indirect requests, false beliefs and misdirection, but struggled with detecting faux pas. Faux pas, however, was the only test where LLaMA2 outperformed humans. Follow-up manipulations of the belief likelihood revealed that the superiority of LLaMA2 was illusory, possibly reflecting a bias towards attributing ignorance. By contrast, the poor performance of GPT originated from a hyperconservative approach towards committing to conclusions rather than from a genuine failure of inference. These findings not only demonstrate that LLMs exhibit behaviour that is consistent with the outputs of mentalistic inference in humans but also highlight the importance of systematic testing to ensure a non-superficial comparison between human and artificial intelligences.
Affiliations:
- James W A Strachan: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Dalila Albergo: Cognition, Motion and Neuroscience, Italian Institute of Technology, Genoa, Italy; Center for Mind/Brain Sciences, University of Trento, Rovereto, Italy
- Giulia Borghini: Cognition, Motion and Neuroscience, Italian Institute of Technology, Genoa, Italy
- Oriana Pansardi: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Cognition, Motion and Neuroscience, Italian Institute of Technology, Genoa, Italy; Department of Psychology, University of Turin, Turin, Italy
- Eugenio Scaliti: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Cognition, Motion and Neuroscience, Italian Institute of Technology, Genoa, Italy; Department of Management 'Valter Cantino', University of Turin, Turin, Italy; Human Science and Technologies, University of Turin, Turin, Italy
- Stefano Panzeri: Institute for Neural Information Processing, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Cristina Becchio: Department of Neurology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Cognition, Motion and Neuroscience, Italian Institute of Technology, Genoa, Italy
3. Mahowald K, Ivanova AA, Blank IA, Kanwisher N, Tenenbaum JB, Fedorenko E. Dissociating language and thought in large language models. Trends Cogn Sci 2024; 28:517-540. [PMID: 38508911] [DOI: 10.1016/j.tics.2024.01.011]
Abstract
Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence (knowledge of linguistic rules and patterns) and functional linguistic competence (understanding and using language in the world). We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of separate mechanisms specialized for formal versus functional linguistic competence.
4. Palmer A, Smith NA, Spirling A. Using proprietary language models in academic research requires explicit justification. Nat Comput Sci 2024; 4:2-3. [PMID: 38177494] [DOI: 10.1038/s43588-023-00585-1]
Affiliations:
- Noah A Smith: University of Washington and Allen Institute for AI, Seattle, WA, USA