1
Marjieh R, Sucholutsky I, van Rijn P, Jacoby N, Griffiths TL. Large language models predict human sensory judgments across six modalities. Sci Rep 2024; 14:21445. [PMID: 39271909 PMCID: PMC11399123 DOI: 10.1038/s41598-024-72071-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 09/03/2024] [Indexed: 09/15/2024] Open
Abstract
Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality, and provides highly correlated predictions with human data irrespective of whether direct visual input is provided or purely textual descriptors. To study the impact of specific languages, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian, illuminating the interaction of language and perception.
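The comparison between model and human similarity judgments described in this abstract can be illustrated with a minimal sketch. It assumes a Pearson correlation over the upper triangle of two similarity matrices; the abstract does not say which correlation measure the authors used. The function names and the three-stimulus ratings are hypothetical and are not data from the paper.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square similarity matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up pairwise similarity ratings for three stimuli (illustration only).
human = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]]
model = [[1.0, 0.7, 0.1], [0.7, 1.0, 0.4], [0.1, 0.4, 1.0]]

r = pearson(upper_triangle(human), upper_triangle(model))
print(f"model-human correlation: {r:.3f}")
```

Only the off-diagonal pairs enter the correlation, since self-similarity is fixed by construction and would inflate the agreement.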
Affiliation(s)
- Raja Marjieh
  - Department of Psychology, Princeton University, Princeton, USA
- Ilia Sucholutsky
  - Department of Computer Science, Princeton University, Princeton, USA
- Pol van Rijn
  - Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Nori Jacoby
  - Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
  - Department of Psychology, Cornell University, Ithaca, USA
- Thomas L Griffiths
  - Department of Psychology, Princeton University, Princeton, USA
  - Department of Computer Science, Princeton University, Princeton, USA

2
Sia MY, Mather E, Crocker MW, Mani N. The role of systematicity in early referent selection. Dev Sci 2024; 27:e13444. [PMID: 37667460 DOI: 10.1111/desc.13444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 07/02/2023] [Accepted: 08/13/2023] [Indexed: 09/06/2023]
Abstract
Previous studies showed that word learning is affected by children's existing knowledge. For instance, knowledge of semantic category aids word learning, whereas a dense phonological neighbourhood impedes learning of similar-sounding words. Here, we examined to what extent children associate similar-sounding words (e.g., rat and cat) with objects of the same semantic category (e.g., both are animals), that is, to what extent children assume meaning overlap given form overlap between two words. We tested this by first presenting children (N = 93, mean age = 22.4 months) with novel word-object associations. Then, we examined the extent to which children assume that a similar-sounding novel label, that is, a phonological neighbour, refers to a similar-looking object, that is, a likely semantic neighbour, as opposed to a dissimilar-looking object. Were children to preferentially fixate the similar-looking novel object, it would suggest that systematic word form-meaning relations aid referent selection in young children. While we did not find any evidence for such word form-meaning systematicity, we demonstrated that children showed robust learning for the trained novel word-object associations, and were able to discriminate between similar-sounding labels and also similar-looking objects. Thus, unlike for iconicity, which appears early in vocabulary development, we find no evidence for systematicity in early referent selection.
Affiliation(s)
- Ming Yean Sia
  - Department for Psychology of Language, University of Göttingen, Göttingen, Germany
  - Leibniz Science Campus Primate Cognition, Göttingen, Germany
- Emily Mather
  - School of Psychology and Social Work, University of Hull, Hull, UK
- Matthew W Crocker
  - Department of Language Science & Technology, Saarland University, Saarbrücken, Germany
- Nivedita Mani
  - Department for Psychology of Language, University of Göttingen, Göttingen, Germany
  - Leibniz Science Campus Primate Cognition, Göttingen, Germany

3
Hoang NL, Taniguchi T, Hagiwara Y, Taniguchi A. Emergent communication of multimodal deep generative models based on Metropolis-Hastings naming game. Front Robot AI 2024; 10:1290604. [PMID: 38356917 PMCID: PMC10864618 DOI: 10.3389/frobt.2023.1290604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/18/2023] [Indexed: 02/16/2024] Open
Abstract
Deep generative models (DGM) are increasingly employed in emergent communication systems. However, their application in multimodal data contexts is limited. This study proposes a novel model that combines multimodal DGM with the Metropolis-Hastings (MH) naming game, enabling two agents to focus jointly on a shared subject and develop common vocabularies. The model proves able to handle multimodal data, even in cases of missing modalities. Integrating the MH naming game with multimodal variational autoencoders (VAE) allows agents to form perceptual categories and exchange signs within multimodal contexts. Moreover, fine-tuning the weight ratio to favor a modality that the model could learn and categorize more readily improved communication. Our evaluation of three multimodal approaches, namely mixture-of-experts (MoE), product-of-experts (PoE), and mixture-of-product-of-experts (MoPoE), suggests an impact on the creation of latent spaces, the internal representations of agents. Our results from experiments with the MNIST + SVHN and Multimodal165 datasets indicate that combining the Gaussian mixture model (GMM), PoE multimodal VAE, and MH naming game substantially improved information sharing, knowledge formation, and data reconstruction.
Affiliation(s)
- Nguyen Le Hoang
  - Graduate School of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
- Tadahiro Taniguchi
  - College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan
- Yoshinobu Hagiwara
  - Research Organization of Science and Technology, Ritsumeikan University, Kusatsu, Shiga, Japan
- Akira Taniguchi
  - College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan

4
van de Pol I, Lodder P, van Maanen L, Steinert-Threlkeld S, Szymanik J. Quantifiers satisfying semantic universals have shorter minimal description length. Cognition 2023; 232:105150. [PMID: 36563568 DOI: 10.1016/j.cognition.2022.105150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 12/24/2022]
Abstract
Despite wide variation among natural languages, there are linguistic properties thought to be universal to all or nearly all languages. Here, we consider universals at the semantic level, in the domain of quantifiers, which are given by the properties of monotonicity, quantity, and conservativity, and we investigate whether these universals might be explained by differences in complexity. First, we use a minimal pair methodology and compare the complexities of individual quantifiers using approximate Kolmogorov complexity. Second, we use a simple yet expressive grammar to generate a large collection of quantifiers and we investigate their complexities at an aggregate level in terms of both their minimal description lengths and their approximate Kolmogorov complexities. For minimal description length we find that quantifiers satisfying semantic universals are simpler: they have a shorter minimal description length. For approximate Kolmogorov complexity we find that monotone quantifiers have a lower Kolmogorov complexity than non-monotone quantifiers and for quantity and conservativity we find that approximate Kolmogorov complexity does not scale robustly. These results suggest that the simplicity of quantifier meanings, in terms of their minimal description length, partially explains the presence of semantic universals in the domain of quantifiers.
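The idea of approximating Kolmogorov complexity, mentioned in this abstract, can be sketched with a general-purpose compressor. This sketch makes two assumptions that the abstract does not confirm: it uses zlib's compressed length as the complexity proxy, and it encodes a quantifier as a truth-table bit string. The helper names and the two example quantifiers are hypothetical.

```python
import random
import zlib


def approx_complexity(bits: str) -> int:
    """Compressed length in bytes: a crude upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(bits.encode("ascii"), 9))


def truth_table(quantifier, max_size: int) -> str:
    """Encode a quantifier Q(A, B) as a bit string.

    Models are enumerated by the size of A (0..max_size) and, within each
    size, by how many elements of A are also in B.
    """
    bits = []
    for size_a in range(max_size + 1):
        for in_both in range(size_a + 1):
            only_a = size_a - in_both
            bits.append("1" if quantifier(in_both, only_a) else "0")
    return "".join(bits)


def at_least_three(in_both, only_a):
    """Monotone example: 'at least three As are Bs'."""
    return in_both >= 3


def an_even_number(in_both, only_a):
    """Non-monotone example: 'an even number of As are Bs'."""
    return in_both % 2 == 0


for name, q in [("at least three", at_least_three), ("an even number", an_even_number)]:
    table = truth_table(q, max_size=40)
    print(name, len(table), approx_complexity(table))

# Sanity check of the proxy: a constant string compresses far better than noise.
rng = random.Random(0)
noise = "".join(rng.choice("01") for _ in range(1000))
print(approx_complexity("0" * 1000), approx_complexity(noise))
```

Compressed length depends on the compressor and on the encoding, so values from a sketch like this are only comparable with each other when both are held fixed.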
Affiliation(s)
- Iris van de Pol
  - Institute for Logic, Language and Computation, University of Amsterdam, the Netherlands
- Paul Lodder
  - Institute for Logic, Language and Computation, University of Amsterdam, the Netherlands
- Jakub Szymanik
  - Center for Mind/Brain Sciences and Dept. of Information Engineering and Computer Science, University of Trento, Italy

5
Ohmer X, Marino M, Franke M, König P. Mutual influence between language and perception in multi-agent communication games. PLoS Comput Biol 2022; 18:e1010658. [PMID: 36315590 PMCID: PMC9648844 DOI: 10.1371/journal.pcbi.1010658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Revised: 11/10/2022] [Accepted: 10/14/2022] [Indexed: 11/12/2022] Open
Abstract
Language interfaces with many other cognitive domains. This paper explores how interactions at these interfaces can be studied with deep learning methods, focusing on the relation between language emergence and visual perception. To model the emergence of language, a sender and a receiver agent are trained on a reference game. The agents are implemented as deep neural networks, with dedicated vision and language modules. Motivated by the mutual influence between language and perception in cognition, we apply systematic manipulations to the agents’ (i) visual representations, to analyze the effects on emergent communication, and (ii) communication protocols, to analyze the effects on visual representations. Our analyses show that perceptual biases shape semantic categorization and communicative content. Conversely, if the communication protocol partitions object space along certain attributes, agents learn to represent visual information about these attributes more accurately, and the representations of communication partners align. Finally, an evolutionary analysis suggests that visual representations may be shaped in part to facilitate the communication of environmentally relevant distinctions. Aside from accounting for co-adaptation effects between language and perception, our results point out ways to modulate and improve visual representation learning and emergent communication in artificial agents.
Language is grounded in the world and used to coordinate and achieve common objectives. We simulate grounded, interactive language use with a communication game. A sender refers to an object in the environment and if the receiver selects the correct object both agents are rewarded. By practicing the game, the agents develop their own communication protocol. We use this setup to study interactions between emerging language and visual perception. Agents are implemented as neural networks with dedicated vision modules to process images of objects. By manipulating their visual representations we can show how variations in perception are reflected in linguistic variations. Conversely, we demonstrate that differences in language are reflected in the agents’ visual representations. Our simulations mirror several empirically observed phenomena: labels for concrete objects and properties (e.g., “striped”, “bowl”) group together visually similar objects, object representations adapt to the categories imposed by language, and representational spaces between communication partners align. In addition, an evolutionary analysis suggests that visual representations may be shaped, in part, to facilitate communication about environmentally relevant information. In sum, we use communication games with neural network agents to model co-adaptation effects between language and visual perception. Future work could apply this computational framework to other interfaces between language and cognition.
Affiliation(s)
- Xenia Ohmer
  - Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Michael Marino
  - Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
- Michael Franke
  - Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
  - Department of Linguistics, University of Tübingen, Tübingen, Germany
- Peter König
  - Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
  - Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

6
de Vries JP, Akbarinia A, Flachot A, Gegenfurtner KR. Emergent color categorization in a neural network trained for object recognition. eLife 2022; 11:76472. [PMID: 36511778 PMCID: PMC9797187 DOI: 10.7554/elife.76472] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/17/2021] [Accepted: 12/11/2022] [Indexed: 12/14/2022] Open
Abstract
Color is a prime example of categorical perception, yet it is unclear why and how color categories emerge. On the one hand, prelinguistic infants and several animals treat color categorically. On the other hand, recent modeling endeavors have successfully utilized communicative concepts as the driving force for color categories. Rather than modeling categories directly, we investigate the potential emergence of color categories as a result of acquiring visual skills. Specifically, we asked whether color is represented categorically in a convolutional neural network (CNN) trained to recognize objects in natural images. We systematically trained new output layers to the CNN for a color classification task and, probing novel colors, found borders that are largely invariant to the training colors. The border locations were confirmed using an evolutionary algorithm that relies on the principle of categorical perception. A psychophysical experiment on human observers, analogous to our primary CNN experiment, shows that the borders agree to a large degree with human category boundaries. These results provide evidence that the development of basic visual skills can contribute to the emergence of a categorical representation of color.
Affiliation(s)
- Alban Flachot
  - Experimental Psychology, Giessen University, Giessen, Germany
  - Center for Vision Research, Department of Psychology, York University, Toronto, Canada

7
Karjus A, Blythe RA, Kirby S, Wang T, Smith K. Conceptual Similarity and Communicative Need Shape Colexification: An Experimental Study. Cogn Sci 2021; 45:e13035. [PMID: 34491584 PMCID: PMC9285023 DOI: 10.1111/cogs.13035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2021] [Revised: 06/18/2021] [Accepted: 07/19/2021] [Indexed: 11/28/2022]
Abstract
Colexification refers to the phenomenon of multiple meanings sharing one word in a language. Cross-linguistic lexification patterns have been shown to be largely predictable, as similar concepts are often colexified. We test a recent claim that, beyond this general tendency, communicative needs play an important role in shaping colexification patterns. We approach this question by means of a series of human experiments, using an artificial language communication game paradigm. Our results across four experiments match the previous cross-linguistic findings: all other things being equal, speakers do prefer to colexify similar concepts. However, we also find evidence supporting the communicative need hypothesis: when faced with a frequent need to distinguish similar pairs of meanings, speakers adjust their colexification preferences to maintain communicative efficiency and avoid colexifying those similar meanings which need to be distinguished in communication. This research provides further evidence to support the argument that languages are shaped by the needs and preferences of their speakers.
Affiliation(s)
- Andres Karjus
  - ERA Chair for Cultural Data Analytics, Tallinn University
  - School of Humanities, Tallinn University
  - Centre for Language Evolution, School of Philosophy, Psychology and Language Sciences, University of Edinburgh
- Richard A Blythe
  - Centre for Language Evolution, School of Philosophy, Psychology and Language Sciences, University of Edinburgh
  - School of Physics and Astronomy, University of Edinburgh
- Simon Kirby
  - Centre for Language Evolution, School of Philosophy, Psychology and Language Sciences, University of Edinburgh
- Kenny Smith
  - Centre for Language Evolution, School of Philosophy, Psychology and Language Sciences, University of Edinburgh

8
Abstract
Cognition is often defined as a dual process of physical and non-physical mechanisms. This duality originated from past theory on the constituent parts of the natural world. Even though material causation is not an explanation for all natural processes, phenomena at the cellular level of life are modeled by physical causes. These phenomena include explanations for the function of organ systems, including the nervous system and information processing in the cerebrum. This review restricts the definition of cognition to a mechanistic process and enlists studies that support an abstract set of proximate mechanisms. Specifically, this process is approached from a large-scale perspective, the flow of information in a neural system. Study at this scale further constrains the possible explanations for cognition since the information flow is amenable to theory, unlike a lower-level approach where the problem becomes intractable. These possible hypotheses include stochastic processes for explaining the processes of cognition along with principles that support an abstract format for the encoded information.

9
Chou PH, Chien TW, Yang TY, Yeh YT, Chou W, Yeh CH. Predicting Active NBA Players Most Likely to Be Inducted into the Basketball Hall of Famers Using Artificial Neural Networks in Microsoft Excel: Development and Usability Study. Int J Environ Res Public Health 2021; 18:ijerph18084256. [PMID: 33923846 PMCID: PMC8072800 DOI: 10.3390/ijerph18084256] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/26/2021] [Revised: 03/18/2021] [Accepted: 03/25/2021] [Indexed: 12/11/2022]
Abstract
The prediction of whether active NBA players can be inducted into the Hall of Fame (HOF) is interesting and important. However, no such research has been published in the literature, particularly using the artificial neural network (ANN) technique. The aim of this study is to build an ANN model with an app for automatic prediction and classification of HOF for NBA players. We downloaded 4728 NBA players’ data of career stats and accolades from the website at basketball-reference.com. The training sample was collected from 85 HOF members and 113 retired Non-HOF players based on completed data and a longer career length (≥15 years). Featured variables were taken from the higher correlation coefficients (<0.1) with HOF and significant differences between the HOF and Non-HOF groups using logistic regression. Two models (i.e., ANN and convolutional neural network, CNN) were compared in model accuracy (e.g., sensitivity, specificity, area under the receiver operating characteristic curve, AUC). An app predicting HOF was then developed involving the model’s parameters. We observed that (1) 20 feature variables in the ANN model yielded a higher AUC of 0.93 (95% CI 0.93–0.97) based on the 198-case training sample, (2) the ANN performed better than CNN on the accuracy of AUC (= 0.91, 95% CI 0.87–0.95), and (3) a ready-to-use app for predicting HOF was successfully developed. A 20-variable ANN model with 53 estimated parameters was developed to improve the accuracy of HOF prediction. The app can help NBA fans predict which players are likely to be inducted into the HOF, and it is not limited to active NBA players.
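The accuracy measures named in this abstract (sensitivity, specificity, AUC) can be illustrated with a small sketch. It is not the authors' Excel implementation. The 0.5 decision threshold, the function names, and the four labelled scores are assumptions made up for illustration. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels at a score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = inducted, 0 = not inducted; scores are model outputs.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

print(sensitivity_specificity(labels, scores))
print(auc(labels, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.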
Affiliation(s)
- Po-Hsin Chou
  - Department of Orthopedics and Traumatology, Taipei Veterans General Hospital, Taipei 112, Taiwan
  - School of Medicine, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
- Tsair-Wei Chien
  - Department of Medical Research, Chi-Mei Medical Center, Tainan 700, Taiwan
- Ting-Ya Yang
  - Medical Education Center, Chi-Mei Medical Center, Tainan 700, Taiwan
  - School of Medicine, College of Medicine, China Medical University, Taichung 400, Taiwan
- Yu-Tsen Yeh
  - Medical School, St. George’s University of London, London SW17 0RE, UK
- Willy Chou
  - Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan 700, Taiwan
  - Correspondence: (W.C.); (C.-H.Y.); Tel.: +886-6291-2811 (C.-H.Y.)
- Chao-Hung Yeh
  - Department of Neurosurgery, Chi Mei Medical Center, Tainan 700, Taiwan
  - Correspondence: (W.C.); (C.-H.Y.); Tel.: +886-6291-2811 (C.-H.Y.)