1
Kearney E, McMahon KL, Guenther F, Arciuli J, de Zubicaray GI. Revisiting the concreteness effect: Non-arbitrary mappings between form and concreteness of English words influence lexical processing. Cognition 2024; 254:105972. [PMID: 39388784 DOI: 10.1016/j.cognition.2024.105972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Revised: 08/08/2024] [Accepted: 09/28/2024] [Indexed: 10/12/2024]
Abstract
How do we represent and process abstract and concrete concepts? The "concreteness effect", in which words with more concrete meanings are processed more quickly and accurately across a range of language tasks compared to abstract ones, suggests a differential conceptual organization of these words in the brain. However, concrete words tend to be marked by specific phonotactic features, such as having fewer syllables and more phonological neighbours. It is unclear whether these non-arbitrary form-meaning relationships that systematically denote the concreteness of a word impact language processing. In the current study, we first establish the extent of systematic mappings between phonological/phonetic features and concreteness ratings in a large set of monosyllabic and polysyllabic English words (i.e., concreteness form typicality), then demonstrate that they significantly influence lexical processing using behavioural megastudy datasets. Surface form features predicted a significant proportion of variance in concreteness ratings of monomorphemic words (25 %) which increased with the addition of polymorphemic forms (43 %). In addition, concreteness form typicality was a significant predictor of performance on visual and auditory lexical decision, naming, and semantic (concrete/abstract) decision tasks, after controlling for a range of psycholinguistic variables and concreteness ratings. Overall, our results provide the first evidence that concreteness form typicality influences lexical processing. We discuss theoretical implications for interpretations of the concreteness effect and models of language processing that have yet to incorporate non-arbitrary relationships between form and meaning into their feature sets.
Affiliation(s)
- Elaine Kearney
- School of Psychology and Counselling, Faculty of Health, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia.
- Katie L McMahon
- School of Clinical Sciences, Centre for Biomedical Technologies, QUT, Kelvin Grove, QLD 4059, Australia; Herston Imaging Research Facility, Royal Brisbane & Women's Hospital, Herston, QLD 4029, Australia
- Frank Guenther
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Joanne Arciuli
- College of Nursing and Health Sciences, Flinders University, Bedford Park, SA 5042, Australia
- Greig I de Zubicaray
- School of Psychology and Counselling, Faculty of Health, Queensland University of Technology (QUT), Kelvin Grove, QLD 4059, Australia
2
Trott S. Can large language models help augment English psycholinguistic datasets? Behav Res Methods 2024; 56:6082-6100. [PMID: 38261264 PMCID: PMC11335796 DOI: 10.3758/s13428-024-02337-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/05/2024] [Indexed: 01/24/2024]
Abstract
Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue of scale is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large, psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these judgments against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ from human-generated norms systematically. I also perform several "substitution analyses", which demonstrate that replacing human-generated norms with LLM-generated norms in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing the considerations and limitations associated with LLM-generated norms in general, including concerns of data contamination, the choice of LLM, external validity, construct validity, and data quality. Additionally, all of GPT-4's judgments (over 30,000 in total) are made available online for further analysis.
Affiliation(s)
- Sean Trott
- Department of Cognitive Science, UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093-0515, USA.
3
Scheffler T, Nenchev I. Affective, semantic, frequency, and descriptive norms for 107 face emojis. Behav Res Methods 2024:10.3758/s13428-024-02444-x. [PMID: 39147946 DOI: 10.3758/s13428-024-02444-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/13/2024] [Indexed: 08/17/2024]
Abstract
We introduce a novel dataset of affective, semantic, and descriptive norms for all facial emojis at the point of data collection. We gathered and examined subjective ratings of emojis from 138 German speakers along five essential dimensions: valence, arousal, familiarity, clarity, and visual complexity. Additionally, we provide absolute frequency counts of emoji use, drawn from an extensive Twitter corpus, as well as a much smaller WhatsApp database. Our results replicate the well-established quadratic relationship between arousal and valence of lexical items, also known for words. We also report associations among the variables: for example, the subjective familiarity of an emoji is strongly correlated with its usage frequency, and positively associated with its emotional valence and clarity of meaning. We establish the meanings associated with face emojis, by asking participants for up to three descriptions for each emoji. Using this linguistic data, we computed vector embeddings for each emoji, enabling an exploration of their distribution within the semantic space. Our description-based emoji vector embeddings not only capture typical meaning components of emojis, such as their valence, but also surpass simple definitions and direct emoji2vec models in reflecting the semantic relationship between emojis and words. Our dataset stands out due to its robust reliability and validity. This new semantic norm for face emojis impacts the future design of highly controlled experiments focused on the cognitive processing of emojis, their lexical representation, and their linguistic properties.
Affiliation(s)
- Tatjana Scheffler
- Department for German Language and Literature, Ruhr University Bochum, Universitätsstraße 150, 44801, Bochum, Germany.
- Ivan Nenchev
- Department of Psychiatry and Psychotherapy, Charité Campus Mitte, Charité Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health at Charité - Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Digital Clinician Scientist Program, Berlin, Germany
4
Trott S. Large Language Models and the Wisdom of Small Crowds. Open Mind (Camb) 2024; 8:723-738. [PMID: 38828431 PMCID: PMC11142632 DOI: 10.1162/opmi_a_00144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 04/01/2024] [Indexed: 06/05/2024] [Open Access]
Abstract
Recent advances in Large Language Models (LLMs) have raised the question of replacing human subjects with LLM-generated data. While some believe that LLMs capture the "wisdom of the crowd" (due to their vast training data), empirical evidence for this hypothesis remains scarce. We present a novel methodological framework to test this: the "number needed to beat" (NNB), which measures how many humans are needed for a sample's quality to rival the quality achieved by GPT-4, a state-of-the-art LLM. In a series of pre-registered experiments, we collect novel human data and demonstrate the utility of this method for four psycholinguistic datasets for English. We find that NNB > 1 for each dataset, but also that NNB varies across tasks (and in some cases is quite small, e.g., 2). We also introduce two "centaur" methods for combining LLM and human data, which outperform both stand-alone LLMs and human samples. Finally, we analyze the trade-offs in data cost and quality for each approach. While clear limitations remain, we suggest that this framework could guide decision-making about whether and how to integrate LLM-generated data into the research pipeline.
Affiliation(s)
- Sean Trott
- Department of Cognitive Science, University of California, San Diego, San Diego, CA, USA
5
Calvillo-Torres R, Haro J, Ferré P, Poch C, Hinojosa JA. Sound symbolic associations in Spanish emotional words: affective dimensions and discrete emotions. Cogn Emot 2024:1-17. [PMID: 38660751 DOI: 10.1080/02699931.2024.2345377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 04/16/2024] [Indexed: 04/26/2024]
Abstract
Sound symbolism refers to non-arbitrary associations between word forms and meaning, such as those observed for some properties of sounds and size or shape. Recent evidence suggests that these connections extend to emotional concepts. Here we investigated two types of non-arbitrary relationships. Study 1 examined whether iconicity scores (i.e. resemblance-based mapping between aspects of a word's form and its meaning) for words can be predicted from ratings in the affective dimensions of valence and arousal and/or the discrete emotions of happiness, anger, fear, disgust and sadness. Words denoting negative concepts were more likely to have more iconic word forms. Study 2 explored whether statistical regularities in single phonemes (i.e. systematicity) predicted ratings in affective dimensions and/or discrete emotions. Voiceless (/p/, /t/) and voiced plosives (/b/, /d/, /g/) were related to high arousing words, whereas high arousing negative words tended to include fricatives (/s/, /z/). Hissing consonants were also more likely to occur in words denoting all negative discrete emotions. Additionally, words conveying certain discrete emotions included specific phonemes. Overall, our data suggest that emotional features might explain variations in iconicity and provide new insight about phonemic patterns showing sound symbolic associations with the affective properties of words.
Affiliation(s)
- Rocío Calvillo-Torres
- Departamento de Psicología Experimental, Procesos Cognitivos y Logopedia, Universidad Complutense de Madrid, Madrid, Spain
- Juan Haro
- Departament de Psicologia and CRAMC, Universitat Rovira i Virgili, Tarragona, Spain
- Pilar Ferré
- Departament de Psicologia and CRAMC, Universitat Rovira i Virgili, Tarragona, Spain
- Claudia Poch
- Centro de Investigación Nebrija en Cognición (CINC), Universidad Nebrija, Madrid, Spain
- Departamento de Educación, Universidad de Nebrija, Madrid, Spain
- José A Hinojosa
- Departamento de Psicología Experimental, Procesos Cognitivos y Logopedia, Universidad Complutense de Madrid, Madrid, Spain
- Centro de Investigación Nebrija en Cognición (CINC), Universidad Nebrija, Madrid, Spain
- Instituto Pluridisciplinar, Universidad Complutense de Madrid, Madrid, Spain
6
Sasaki K, Nishikawa J, Morita J. Evaluation of co-speech gestures grounded in word-distributed representation. Front Robot AI 2024; 11:1362463. [PMID: 38726067 PMCID: PMC11079185 DOI: 10.3389/frobt.2024.1362463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 03/25/2024] [Indexed: 05/12/2024] [Open Access]
Abstract
The condition for artificial agents to possess perceivable intentions can be considered that they have resolved a form of the symbol grounding problem. Here, the symbol grounding is considered an achievement of the state where the language used by the agent is endowed with some quantitative meaning extracted from the physical world. To achieve this type of symbol grounding, we adopt a method for characterizing robot gestures with quantitative meaning calculated from word-distributed representations constructed from a large corpus of text. In this method, a "size image" of a word is generated by defining an axis (index) that discriminates the "size" of the word in the word-distributed vector space. The generated size images are converted into gestures generated by a physical artificial agent (robot). The robot's gesture can be set to reflect either the size of the word in terms of the amount of movement or in terms of its posture. To examine the perception of communicative intention in the robot that performs the gestures generated as described above, the authors examine human ratings on "the naturalness" obtained through an online survey, yielding results that partially validate our proposed method. Based on the results, the authors argue for the possibility of developing advanced artifacts that achieve human-like symbolic grounding.
Affiliation(s)
- Kosuke Sasaki
- Department of Informatics, Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Jumpei Nishikawa
- Department of Information Science and Technology, Graduate School of Science and Technology, Shizuoka University, Shizuoka, Japan
- Junya Morita
- Department of Informatics, Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Department of Information Science and Technology, Graduate School of Science and Technology, Shizuoka University, Shizuoka, Japan
- Department of Behavior Informatics, Faculty of Informatics, Shizuoka University, Hamamatsu, Japan
7
Haslett DA, Cai ZG. Systematic mappings of sound to meaning: A theoretical review. Psychon Bull Rev 2024; 31:627-648. [PMID: 37803232 DOI: 10.3758/s13423-023-02395-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/19/2023] [Indexed: 10/08/2023]
Abstract
The form of a word sometimes conveys semantic information. For example, the iconic word gurgle sounds like what it means, and busy is easy to identify as an English adjective because it ends in -y. Such links between form and meaning matter because they help people learn and use language. But gurgle also sounds like gargle and burble, and the -y in busy is morphologically and etymologically unrelated to the -y in crazy and watery. Whatever processing effects gurgle and busy have in common likely stem not from iconic, morphological, or etymological relationships but from systematicity more broadly: the phenomenon whereby semantically related words share a phonological or orthographic feature. In this review, we evaluate corpus evidence that spoken languages are systematic (even when controlling for iconicity, morphology, and etymology) and experimental evidence that systematicity impacts word processing (even in lieu of iconic, morphological, and etymological relationships). We conclude by drawing attention to the relationship between systematicity and low-frequency words and, consequently, the role that systematicity plays in natural language processing.
Affiliation(s)
- David A Haslett
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, Hong Kong.
- Zhenguang G Cai
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, Hong Kong
- Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong
8
Erben Johansson N. Prominence effects in vocal iconicity: Implications for lexical access and language change. J Acoust Soc Am 2024; 155:8-17. [PMID: 38169522 DOI: 10.1121/10.0024240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
This paper explores how three cognitive and perceptual cues, vocal iconicity, resemblance-based mappings between form and meaning, and segment position and lexical stress, interact to affect word formation and language processing. The study combines an analysis of the word-internal positions that iconic segments occur in based on data from 245 language families with an experimental study in which participants representing more than 30 languages rated iconic and non-iconic pseudowords. The pseudowords were designed to systematically vary segment and stress placement across syllables. The results for study 1 indicate that segments used iconically appear approximately 0.26 segment positions closer toward the beginning of words compared to non-iconic segments. In study 2, it was found that iconic segments occurring in stressed syllables and non-iconic segments occurring in the second syllable were rated as significantly more fitting. These findings suggest that the interplay between vocal iconicity and prominence effects increases the predictive function of iconic segments by foregrounding sounds, which intrinsically carry semantic information. Consequently, these results contribute to the understanding of the widespread occurrence of vocal iconicity in human languages.
9
Imai M, Akita K. The Iconicity Ring Hypothesis Bridges the Gap Between Symbol Grounding and Linguistic Relativity. Top Cogn Sci 2023; 15:676-682. [PMID: 37331018 DOI: 10.1111/tops.12671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 05/15/2023] [Accepted: 05/22/2023] [Indexed: 06/20/2023]
Abstract
Kemmerer captured the drastic change in theories of word meaning representations, contrasting the view that word meaning representations are amodal and universal, with the view that they are grounded and language-specific. However, he does not address how language can be simultaneously grounded and language-specific. Here, we approach this question from the perspective of language acquisition and evolution. We argue that adding a new element, iconicity, is critically beneficial and offer the iconicity ring hypothesis, which explains how language-specific, secondary iconicity might emerge from biologically grounded and universally shared iconicity in the course of language acquisition and evolution.
Affiliation(s)
- Mutsumi Imai
- Faculty of Environment and Information Sciences, Keio University