1
Wang T, Xu X. The good, the bad, and the ambivalent: Extrapolating affective values for 38,000+ Chinese words via a computational model. Behav Res Methods 2024; 56:5386-5405. [PMID: 37968560 DOI: 10.3758/s13428-023-02274-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/16/2023] [Indexed: 11/17/2023]
Abstract
Word affective ratings are important tools in psycholinguistic research, natural language processing, and many other fields. However, even for well-studied languages, such norms are usually limited in scale. To extrapolate affective (i.e., valence and arousal) values for words in the SUBTLEX-CH database (Cai & Brysbaert, 2010, PLoS ONE, 5(6):e10729), we implemented a computational neural network which captured how words' vector-based semantic representations corresponded to the probability densities of their valence and arousal. Based on these probability density functions, we predicted not only a word's affective values, but also their respective degrees of variability that could characterize individual differences in human affective ratings. The resulting estimates of affective values largely converged with human ratings for both valence and arousal, and the estimated degrees of variability also captured important features of the variability in human ratings. We released the extrapolated affective values, together with their corresponding degrees of variability, for over 38,000 Chinese words in the Open Science Framework ( https://osf.io/s9zmd/ ). We also discussed how the view of embodied cognition could be illuminated by this computational model.
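As a rough illustration of how a predicted probability density can yield both a point estimate and a variability estimate, the sketch below takes a discrete distribution over rating-scale points and computes its mean and standard deviation. The five-point scale, the probabilities, and the function name `summarize_density` are invented for illustration; they are not taken from the paper, and the authors' network is not reproduced here.

```python
import math

def summarize_density(scale_points, probabilities):
    """Return (mean, standard deviation) of a discrete rating distribution."""
    total = sum(probabilities)
    probs = [p / total for p in probabilities]  # normalise defensively
    mean = sum(x * p for x, p in zip(scale_points, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(scale_points, probs))
    return mean, math.sqrt(variance)

# Hypothetical predicted density for one word (assumed values, not real data).
scale = [1, 2, 3, 4, 5]
density = [0.1, 0.2, 0.4, 0.2, 0.1]
mean_valence, sd_valence = summarize_density(scale, density)
print(f"mean = {mean_valence:.2f}, sd = {sd_valence:.2f}")
```

Under this reading, the mean plays the role of the extrapolated affective value, and the standard deviation stands in for the degree of variability across raters.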
Affiliation(s)
- Tianqi Wang
- School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China
- Speech Science Laboratory, The University of Hong Kong, Hong Kong, China
- Academic Unit of Human Communication, Development, and Information Sciences, The University of Hong Kong, Hong Kong, China
- Xu Xu
- School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, 200240, China.
2
Plisiecki H, Sobieszek A. Extrapolation of affective norms using transformer-based neural networks and its application to experimental stimuli selection. Behav Res Methods 2024; 56:4716-4731. [PMID: 37749424 PMCID: PMC11289359 DOI: 10.3758/s13428-023-02212-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/30/2023] [Indexed: 09/27/2023]
Abstract
Data on the emotionality of words is important for the selection of experimental stimuli and sentiment analysis on large bodies of text. While norms for valence and arousal have been thoroughly collected in English, most languages do not have access to such large datasets. Moreover, theoretical developments lead to new dimensions being proposed, the norms for which are only partially available. In this paper, we propose a transformer-based neural network architecture for semantic and emotional norms extrapolation that predicts a whole ensemble of norms at once while achieving state-of-the-art correlations with human judgements on each. We improve on the previous approaches with regards to the correlations with human judgments by Δr = 0.1 on average. We precisely discuss the limitations of norm extrapolation as a whole, with a special focus on the introduced model. Further, we propose a unique practical application of our model by proposing a method of stimuli selection which performs unsupervised control by picking words that match in their semantic content. As the proposed model can easily be applied to different languages, we provide norm extrapolations for English, Polish, Dutch, German, French, and Spanish. To aid researchers, we also provide access to the extrapolation networks through an accessible web application.
Affiliation(s)
- Hubert Plisiecki
- Institute of Psychology, Polish Academy of Sciences, SWPS University of Warsaw, Warsaw, Poland.
- Adam Sobieszek
- Faculty of Psychology, University of Warsaw, Warsaw, Poland
3
de Zubicaray GI, Hinojosa JA. Statistical Relationships Between Phonological Form, Emotional Valence and Arousal of Spanish Words. J Cogn 2024; 7:42. [PMID: 38737820 PMCID: PMC11086587 DOI: 10.5334/joc.366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 04/23/2024] [Indexed: 05/14/2024] Open
Abstract
A number of studies have provided evidence of limited non-arbitrary associations between the phonological forms and meanings of affective words, a finding referred to as affective sound symbolism. Here, we explored whether the affective connotations of Spanish words might have more extensive statistical relationships with phonological/phonetic features, or affective form typicality. After eliminating words with poor affective rating agreement and morphophonological redundancies (e.g., negating prefixes), we found evidence of significant form typicality for emotional valence, emotionality, and arousal in a large sample of monosyllabic and polysyllabic words. These affective form-meaning mappings remained significant even when controlling for a range of lexico-semantic variables. We show that affective variables and their corresponding form typicality measures are able to significantly predict lexical decision performance using a megastudy dataset. Overall, our findings provide new evidence that affective form typicality is a statistical property of the Spanish lexicon.
Affiliation(s)
- Greig I. de Zubicaray
- School of Psychology and Counselling, Faculty of Health, Queensland University of Technology (QUT), Brisbane, Australia
- José A. Hinojosa
- Departamento de Psicología Experimental, Procesos Cognitivos y Logopedia, Universidad Complutense de Madrid, Madrid, Spain
- Instituto Pluridisciplinar, Universidad Complutense de Madrid, Madrid, Spain
- Centro de Investigación Nebrija en Cognición (CINC), Universidad Nebrija, Madrid, Spain
4
Hollander J, Olney A. Raising the Roof: Situating Verbs in Symbolic and Embodied Language Processing. Cogn Sci 2024; 48:e13442. [PMID: 38655894 DOI: 10.1111/cogs.13442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 02/05/2024] [Accepted: 03/29/2024] [Indexed: 04/26/2024]
Abstract
Recent investigations on how people derive meaning from language have focused on task-dependent shifts between two cognitive systems. The symbolic (amodal) system represents meaning as the statistical relationships between words. The embodied (modal) system represents meaning through neurocognitive simulation of perceptual or sensorimotor systems associated with a word's referent. A primary finding of literature in this field is that the embodied system is only dominant when a task necessitates it, but in certain paradigms, this has only been demonstrated using nouns and adjectives. The purpose of this paper is to study whether similar effects hold with verbs. Experiment 1 evaluated a novel task in which participants rated a selection of verbs on their implied vertical movement. Ratings correlated well with distributional semantic models, establishing convergent validity, though some variance was unexplained by language statistics alone. Experiment 2 replicated previous noun-based location-cue congruency experimental paradigms with verbs and showed that the ratings obtained in Experiment 1 predicted reaction times more strongly than language statistics. Experiment 3 modified the location-cue paradigm by adding movement to create an animated, temporally decoupled, movement-verb judgment task designed to examine the relative influence of symbolic and embodied processing for verbs. Results were generally consistent with linguistic shortcut hypotheses of symbolic-embodied integrated language processing; location-cue congruence elicited processing facilitation in some conditions, and perceptual information accounted for reaction times and accuracy better than language statistics alone. These studies demonstrate novel ways in which embodied and linguistic information can be examined while using verbs as stimuli.
Affiliation(s)
- John Hollander
- Department of Psychology, Institute for Intelligent Systems, University of Memphis
- Andrew Olney
- Department of Psychology, Institute for Intelligent Systems, University of Memphis
5
Vankrunkelsven H, Yang Y, Brysbaert M, De Deyne S, Storms G. Semantic gender: Norms for 24,000 Dutch words and its role in word meaning. Behav Res Methods 2024; 56:113-125. [PMID: 36471212 DOI: 10.3758/s13428-022-02032-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/19/2022] [Indexed: 12/12/2022]
Abstract
Semantic gender norms are presented for 24,037 Dutch words. Eighty participants rated 6017 words each on a five-point Likert scale ranging from feminine to masculine. Each word was rated by ten male and ten female participants. The collected norms show high reliability and correlate well with similar norms in English. We show that semantic gender is distinct from other lexical dimensions such as valence, arousal, dominance, concreteness, and age of acquisition. Semantic gender is not the same as the grammatical gender of words, either. The collected norms can be predicted accurately using a semantic space based on word association data. A dimension explaining a good amount of variance is present in this space, indicating that semantic gender is an important component of the human meaning system.
Affiliation(s)
- Hendrik Vankrunkelsven
- Faculty of Psychology and Educational Sciences, University of Leuven, Tiensestraat 102, 3000, Leuven, Belgium.
- Yang Yang
- Department of Psychology and Educational Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Marc Brysbaert
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Simon De Deyne
- School of Psychological Sciences, University of Melbourne, Melbourne, Australia
- Gert Storms
- Faculty of Psychology and Educational Sciences, University of Leuven, Tiensestraat 102, 3000, Leuven, Belgium
6
Botarleanu RM, Dascalu M, Watanabe M, Crossley SA, McNamara DS. Age of Exposure 2.0: Estimating word complexity using iterative models of word embeddings. Behav Res Methods 2022; 54:3015-3042. [PMID: 35167112 DOI: 10.3758/s13428-022-01797-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/12/2022] [Indexed: 12/16/2022]
Abstract
Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase the cost and effort to produce. In this paper, we introduce Age of Exposure (AoE) version 2, a proxy for human exposure to new vocabulary terms that expands AoA word lists through training regressors to predict AoA scores. Word2vec word embeddings are trained on cumulatively increasing corpora of texts, word exposure trajectories are generated by aligning the word2vec vector spaces, and features of words are derived for modeling AoA scores. Our prediction models achieve low errors (from 13% with a corresponding R² of .35 up to 7% with an R² of .74), can be uniformly applied to different AoA word lists, and generalize to the entire vocabulary of a language. Our method benefits from using existing readability indices to define the order of texts in the corpora, while the performed analyses confirm that the generated AoA scores accurately predicted the difficulty of texts (R² of .84, surpassing related previous work). Further, we provide evidence of the internal reliability of our word trajectory features, demonstrate the effectiveness of the word trajectory features when contrasted with simple lexical features, and show that the exclusion of features that rely on external resources does not significantly impact performance.
Affiliation(s)
- Mihai Dascalu
- University Politehnica of Bucharest, Bucharest, Romania.
- Academy of Romanian Scientists, Bucharest, Romania.
7
Extrapolation of Human Estimates of the Concreteness/Abstractness of Words by Neural Networks of Various Architectures. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12094750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
In a great deal of theoretical and applied cognitive and neurophysiological research, it is essential to have more vocabularies with concreteness/abstractness ratings. Since creating such dictionaries by interviewing informants is labor-intensive, considerable effort has been made to machine-extrapolate human rankings. The purpose of the article is to study the possibility of the fast construction of high-quality machine dictionaries. In this paper, state-of-the-art deep learning neural networks are involved for the first time to solve this problem. For the English language, the BERT model has achieved a record result for the quality of a machine-generated dictionary. It is known that the use of multilingual models makes it possible to transfer ratings from one language to another. However, this approach is understudied so far and the results achieved so far are rather weak. Microsoft’s Multilingual-MiniLM-L12-H384 model also obtained the best result to date in transferring ratings from one language to another. Thus, the article demonstrates the advantages of transformer-type neural networks in this task. Their use will allow the generation of good-quality dictionaries in low-resource languages. Additionally, we study the dependence of the result on the amount of initial data and the number of languages in the multilingual case. The possibilities of transferring into a certain language from one language and from several languages together are compared. The influence of the volume of training and test data has been studied. It has been found that an increase in the amount of training data in a multilingual case does not improve the result.
8
Schulte im Walde S, Frassinelli D. Distributional Measures of Semantic Abstraction. Front Artif Intell 2022; 4:796756. [PMID: 35252847 PMCID: PMC8892386 DOI: 10.3389/frai.2021.796756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022] Open
Abstract
This article provides an in-depth study of distributional measures for distinguishing between degrees of semantic abstraction. Abstraction is considered a “central construct in cognitive science” and a “process of information reduction that allows for efficient storage and retrieval of central knowledge”. Relying on the distributional hypothesis, computational studies have successfully exploited measures of contextual co-occurrence and neighbourhood density to distinguish between conceptual semantic categorisations. So far, these studies have modeled semantic abstraction across lexical-semantic tasks such as ambiguity; diachronic meaning changes; abstractness vs. concreteness; and hypernymy. Yet, the distributional approaches target different conceptual types of semantic relatedness, and as to our knowledge not much attention has been paid to apply, compare or analyse the computational abstraction measures across conceptual tasks. The current article suggests a novel perspective that exploits variants of distributional measures to investigate semantic abstraction in English in terms of the abstract–concrete dichotomy (e.g., glory–banana) and in terms of the generality–specificity distinction (e.g., animal–fish), in order to compare the strengths and weaknesses of the measures regarding categorisations of abstraction, and to determine and investigate conceptual differences. In a series of experiments we identify reliable distributional measures for both instantiations of lexical-semantic abstraction and reach a precision higher than 0.7, but the measures clearly differ for the abstract–concrete vs. abstract–specific distinctions and for nouns vs. verbs. 
Overall, we identify two groups of measures, (i) frequency and word entropy when distinguishing between more and less abstract words in terms of the generality–specificity distinction, and (ii) neighbourhood density variants (especially target–context diversity) when distinguishing between more and less abstract words in terms of the abstract–concrete dichotomy. We conclude that more general words are used more often and are less surprising than more specific words, and that abstract words establish themselves empirically in semantically more diverse contexts than concrete words. Finally, our experiments once more point out that distributional models of conceptual categorisations need to take word classes and ambiguity into account: results for nouns vs. verbs differ in many respects, and ambiguity hinders fine-tuning empirical observations.
Affiliation(s)
- Sabine Schulte im Walde
- Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany
- *Correspondence: Sabine Schulte im Walde
- Diego Frassinelli
- Department of Linguistics, University of Konstanz, Konstanz, Germany
9
Utsumi A. Exploring What Is Encoded in Distributional Word Vectors: A Neurobiologically Motivated Analysis. Cogn Sci 2021; 44:e12844. [PMID: 32458523 DOI: 10.1111/cogs.12844] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/06/2019] [Revised: 12/27/2019] [Accepted: 03/21/2020] [Indexed: 11/27/2022]
Abstract
The pervasive use of distributional semantic models or word embeddings for both cognitive modeling and practical application is because of their remarkable ability to represent the meanings of words. However, relatively little effort has been made to explore what types of information are encoded in distributional word vectors. Knowing the internal knowledge embedded in word vectors is important for cognitive modeling using distributional semantic models. Therefore, in this paper, we attempt to identify the knowledge encoded in word vectors by conducting a computational experiment using Binder et al.'s (2016) featural conceptual representations based on neurobiologically motivated attributes. In an experiment, these conceptual vectors are predicted from text-based word vectors using a neural network and linear transformation, and prediction performance is compared among various types of information. The analysis demonstrates that abstract information is generally predicted more accurately by word vectors than perceptual and spatiotemporal information, and specifically, the prediction accuracy of cognitive and social information is higher. Emotional information is also found to be successfully predicted for abstract words. These results indicate that language can be a major source of knowledge about abstract attributes, and they support the recent view that emphasizes the importance of language for abstract concepts. Furthermore, we show that word vectors can capture some types of perceptual and spatiotemporal information about concrete concepts and some relevant word categories. This suggests that language statistics can encode more perceptual knowledge than often expected.
Affiliation(s)
- Akira Utsumi
- Department of Informatics & Artificial Intelligence eXploration Research Center, The University of Electro-Communications
10
Kenett YN, Ungar L, Chatterjee A. Beauty and Wellness in the Semantic Memory of the Beholder. Front Psychol 2021; 12:696507. [PMID: 34421747 PMCID: PMC8376150 DOI: 10.3389/fpsyg.2021.696507] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Accepted: 07/12/2021] [Indexed: 11/13/2022] Open
Abstract
Beauty and wellness are terms used often in common parlance; however, their meaning and relation to each other are unclear. To probe their meaning, we applied network science methods to estimate and compare the semantic networks associated with beauty and wellness in different age generation cohorts (Generation Z, Millennials, Generation X, and Baby Boomers) and in women and men. These mappings were achieved by estimating group-based semantic networks from free association responses to a list of 47 words, either related to Beauty, Wellness, or Beauty + Wellness. Beauty was consistently related to Elegance, Feminine, Gorgeous, Lovely, Sexy, and Stylish. Wellness was consistently related to Aerobics, Fitness, Health, Holistic, Lifestyle, Medical, Nutrition, and Thrive. In addition, older cohorts had semantic networks that were less connected and more segregated from each other. Finally, we found that women compared to men had more segregated and organized concepts of Beauty and Wellness. In contemporary societies that are pre-occupied by the pursuit of beauty and a healthy lifestyle, our findings shed novel light on how people think about beauty and wellness and how they are related across different age generations and by sex.
Affiliation(s)
- Yoed N. Kenett
- Penn Center for Neuroaesthetics, University of Pennsylvania, Philadelphia, PA, United States
- Faculty of Industrial Engineering & Management, Technion–Israel Institute of Technology, Haifa, Israel
- Lyle Ungar
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Anjan Chatterjee
- Penn Center for Neuroaesthetics, University of Pennsylvania, Philadelphia, PA, United States
11
Colexification Networks Encode Affective Meaning. AFFECTIVE SCIENCE 2021; 2:99-111. [PMID: 36043166 PMCID: PMC9382918 DOI: 10.1007/s42761-021-00033-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/30/2020] [Accepted: 01/27/2021] [Indexed: 11/30/2022]
Abstract
Colexification is a linguistic phenomenon that occurs when multiple concepts are expressed in a language with the same word. Colexification patterns are frequently used to estimate the meaning similarity between words, but the hypothesis that these are related is still missing direct empirical validation at scale. Here, we show for the first time that words linked by colexification patterns capture similar affective meanings. Using pre-existing translation data, we extend colexification databases to cover much longer word lists. We achieve this with an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. We find positive correlations between network-based estimates and empirical affective ratings, which suggest that colexification networks contain information related to affective meanings. Finally, we compare our network method with state-of-the-art machine learning, trained on a large corpus, and show that our simple linguistics-informed unsupervised algorithm yields comparable performance with high explainability. These results show that it is possible to automatically expand affective norms lexica to cover exhaustive word lists when additional data are available, such as in colexification networks.
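The network-based extension of an affective lexicon described above can be sketched as a simple neighbour average: an unrated word receives the mean rating of the rated words it is linked to. This is only an assumed, minimal interpolation rule chosen for illustration; the abstract does not specify the authors' algorithm, and the function name, edge list, and ratings below are hypothetical.

```python
def interpolate_ratings(edges, known_ratings):
    """Estimate ratings for unrated words from their rated network neighbours."""
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    estimates = {}
    for word, linked in neighbours.items():
        if word in known_ratings:
            continue  # already in the original lexicon
        rated = [known_ratings[n] for n in linked if n in known_ratings]
        if rated:  # skip words with no rated neighbour
            estimates[word] = sum(rated) / len(rated)
    return estimates

# Hypothetical colexification links and valence ratings (not real data).
edges = [("joy", "happiness"), ("joy", "delight"), ("grief", "sorrow")]
known = {"happiness": 8.0, "delight": 7.0, "sorrow": 2.0}
print(interpolate_ratings(edges, known))
```

A rule this simple keeps each estimate traceable to specific neighbours, which is one sense in which a network method can be called explainable.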
12
Unger L, Fisher AV. The Emergence of Richly Organized Semantic Knowledge from Simple Statistics: A Synthetic Review. DEVELOPMENTAL REVIEW 2021; 60:100949. [PMID: 33840880 PMCID: PMC8026144 DOI: 10.1016/j.dr.2021.100949] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]
Abstract
As adults, we draw upon our ample knowledge about the world to support such vital cognitive feats as using language, reasoning, retrieving knowledge relevant to our current goals, planning for the future, adapting to unexpected events, and navigating through the environment. Our knowledge readily supports these feats because it is not merely a collection of stored facts, but rather functions as an organized, semantic network of concepts connected by meaningful relations. How do the relations that fundamentally organize semantic concepts emerge with development? Here, we cast a spotlight on a potentially powerful but often overlooked driver of semantic organization: Rich statistical regularities that are ubiquitous in both language and visual input. In this synthetic review, we show that a driving role for statistical regularities is convergently supported by evidence from diverse fields, including computational modeling, statistical learning, and semantic development. Finally, we identify a number of key avenues of future research into how statistical regularities may drive the development of semantic organization.
Affiliation(s)
- Layla Unger
- Department of Psychology, Ohio State University, Columbus OH
- Anna V Fisher
- Department of Psychology, Carnegie Mellon University, Pittsburgh PA
13
The Croatian psycholinguistic database: Estimates for 6000 nouns, verbs, adjectives and adverbs. Behav Res Methods 2021; 53:1799-1816. [PMID: 33904142 PMCID: PMC8367916 DOI: 10.3758/s13428-020-01533-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 12/21/2020] [Indexed: 11/08/2022]
Abstract
Psycholinguistic databases containing ratings of concreteness, imageability, age of acquisition, and subjective frequency are used in psycholinguistic and neurolinguistic studies which require words as stimuli. Linguistic characteristics (e.g. word length, corpus frequency) are frequently coded, but word class is seldom systematically treated, although there are indications of its significance for imageability and concreteness. This paper presents the Croatian Psycholinguistic Database (CPD; available at: https://doi.org/10.17234/megahr.2019.hpb ), containing 6000 Croatian nouns, verbs, adjectives and adverbs, rated for concreteness, imageability, age of acquisition, and subjective frequency. Moreover, we present computationally obtained extrapolations of concreteness and imageability to the remainder of the Croatian lexicon (available at: https://github.com/megahr/lexicon/blob/master/predictions/hr_c_i.predictions.txt ). In the two studies presented here, we explore the significance of word class for concreteness and imageability in human and computationally obtained ratings. The observed correlations in the CPD indicate correspondences between psycholinguistic measures expected from the literature. Word classes exhibit differences in subjective frequency, age of acquisition, concreteness and imageability, with significant differences between nouns, verbs, adjectives and adverbs. In the computational study which focused on concreteness and imageability, concreteness obtained higher correlations with human ratings than imageability, and the system underpredicted the concreteness of nouns, and overpredicted the concreteness of adjectives and adverbs. Overall, this suggests that word class contains schematic conceptual and distributional information. Schematic conceptual content seems to be more significant in human ratings of concreteness and less significant in computationally obtained ratings, where distributional information seems to play a more significant role. 
This suggests that word class differences should be theoretically explored.
14
He L, Kenett YN, Zhuang K, Liu C, Zeng R, Yan T, Huo T, Qiu J. The relation between semantic memory structure, associative abilities, and verbal and figural creativity. THINKING & REASONING 2020. [DOI: 10.1080/13546783.2020.1819415] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 10/23/2022]
Affiliation(s)
- Li He
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Yoed N. Kenett
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- William Davidson Faculty of Industrial Engineering and Management, Technion—Israel Institute of Technology, Haifa, Israel
- Kaixiang Zhuang
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Cheng Liu
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Rongcan Zeng
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Tingrui Yan
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Tengbin Huo
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Jiang Qiu
- Key Laboratory of Cognition and Personality of the Ministry of Education, Faculty of Psychology, Southwest University, Chongqing, China
- Southwest University Branch, Collaborative Innovation Center of Assessment Toward Basic Education Quality, Beijing Normal University, Beijing, China
15
Abstract
This paper introduces a novel collection of word embeddings, numerical representations of lexical semantics, in 55 languages, trained on a large corpus of pseudo-conversational speech transcriptions from television shows and movies. The embeddings were trained on the OpenSubtitles corpus using the fastText implementation of the skipgram algorithm. Performance comparable with (and in some cases exceeding) embeddings trained on non-conversational (Wikipedia) text is reported on standard benchmark evaluation datasets. A novel evaluation method of particular relevance to psycholinguists is also introduced: prediction of experimental lexical norms in multiple languages. The models, as well as code for reproducing the models and all analyses reported in this paper (implemented as a user-friendly Python package), are freely available at: https://github.com/jvparidon/subs2vec.
16
Informational content of cosine and other similarities calculated from high-dimensional Conceptual Property Norm data. Cogn Process 2020; 21:601-614. [PMID: 32647948 DOI: 10.1007/s10339-020-00985-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2019] [Accepted: 07/01/2020] [Indexed: 10/23/2022]
Abstract
To study concepts that are coded in language, researchers often collect lists of conceptual properties produced by human subjects. From these data, different measures can be computed. In particular, inter-concept similarity is an important variable used in experimental studies. Among possible similarity measures, the cosine of conceptual property frequency vectors seems to be a de facto standard. However, there is a lack of comparative studies that test the merit of different similarity measures when computed from property frequency data. The current work compares four different similarity measures (cosine, correlation, Euclidean and Chebyshev) and five different types of data structures. To that end, we compared the informational content (i.e., entropy) delivered by each of those 4 × 5 = 20 combinations, and used a clustering procedure as a concrete example of how informational content affects statistical analyses. Our results lead us to conclude that similarity measures computed from lower-dimensional data fare better than those calculated from higher-dimensional data, and suggest that researchers should be more aware of data sparseness and dimensionality, and their consequences for statistical analyses.
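As a minimal sketch of the four measures named in this abstract, the Python below computes cosine, correlation, Euclidean, and Chebyshev values for two property-frequency vectors. The vectors `dog` and `cat` are invented for illustration and are not taken from the study. Euclidean and Chebyshev are distances, so smaller values indicate greater similarity. The functions assume non-zero, non-constant vectors of equal length.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two frequency vectors (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])

def euclidean(u, v):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def chebyshev(u, v):
    """Chebyshev distance: the largest absolute difference on any dimension."""
    return max(abs(a - b) for a, b in zip(u, v))

# Hypothetical property-frequency vectors: how often each of five
# properties was listed for each concept (illustrative numbers only).
dog = [10, 8, 0, 0, 5]
cat = [9, 7, 0, 1, 4]

for name, fn in [("cosine", cosine), ("correlation", correlation),
                 ("euclidean", euclidean), ("chebyshev", chebyshev)]:
    print(name, round(fn(dog, cat), 3))
```

Real property-norm vectors are typically much longer and sparser than this toy pair, which is the dimensionality issue the abstract raises.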
|
17
|
Jacobs AM. Sentiment Analysis for Words and Fiction Characters From the Perspective of Computational (Neuro-)Poetics. Front Robot AI 2019; 6:53. [PMID: 33501068 PMCID: PMC7805775 DOI: 10.3389/frobt.2019.00053] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/10/2018] [Accepted: 06/24/2019] [Indexed: 11/13/2022] [Open Access]
Abstract
Two computational studies provide different sentiment analyses for text segments (e.g., "fearful" passages) and figures (e.g., "Voldemort") from the Harry Potter books (Rowling, 1997, 1998, 1999, 2000, 2003, 2005, 2007) based on a novel simple tool called SentiArt. The tool uses vector space models together with theory-guided, empirically validated label lists to compute the valence of each word in a text by locating its position in a 2d emotion potential space spanned by the words of the vector space model. After testing the tool's accuracy with empirical data from a neurocognitive poetics study, it was applied to compute emotional figure and personality profiles (inspired by the so-called "big five" personality theory) for main characters from the book series. The results of comparative analyses using different machine-learning classifiers (e.g., AdaBoost, Neural Net) show that SentiArt performs very well in predicting the emotion potential of text passages. It also produces plausible predictions regarding the emotional and personality profile of fiction characters which are correctly identified on the basis of eight character features, and it achieves a good cross-validation accuracy in classifying 100 figures into "good" vs. "bad" ones. The results are discussed with regard to potential applications of SentiArt in digital literary, applied reading and neurocognitive poetics studies such as the quantification of the hybrid hero potential of figures.
Affiliation(s)
- Arthur M Jacobs
- Department of Experimental and Neurocognitive Psychology, Freie Universität Berlin, Berlin, Germany; Center for Cognitive Neuroscience Berlin, Berlin, Germany
|
18
|
Vankrunkelsven H, Verheyen S, Storms G, De Deyne S. Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models. J Cogn 2018; 1:45. [PMID: 31517218 PMCID: PMC6634333 DOI: 10.5334/joc.50] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/01/2018] [Accepted: 11/06/2018] [Indexed: 11/20/2022] [Open Access]
Abstract
In two studies we compare a distributional semantic model derived from word co-occurrences and a word association based model in their ability to predict properties that affect lexical processing. We focus on age of acquisition, concreteness, and three affective variables, namely valence, arousal, and dominance, since all these variables have been shown to be fundamental in word meaning. In both studies we use a model based on data obtained in a continued free word association task to predict these variables. In Study 1 we directly compare this model to a word co-occurrence model based on syntactic dependency relations to see which model is better at predicting the variables under scrutiny in Dutch. In Study 2 we replicate our findings in English and compare our results to those reported in the literature. In both studies we find the word association-based model fit to predict diverse word properties. Especially in the case of predicting affective word properties, we show that the association model is superior to the distributional model.
Affiliation(s)
- Gert Storms
- Laboratory of Experimental Psychology, KU Leuven, BE
- Simon De Deyne
- Laboratory of Experimental Psychology, KU Leuven, BE
- Computational Cognitive Science Lab, University of Melbourne, AU
|
19
|
Hofmann MJ, Biemann C, Westbury C, Murusidze M, Conrad M, Jacobs AM. Simple Co‐Occurrence Statistics Reproducibly Predict Association Ratings. Cogn Sci 2018; 42:2287-2312. [DOI: 10.1111/cogs.12662] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/12/2016] [Revised: 05/11/2018] [Accepted: 05/17/2018] [Indexed: 11/26/2022]
Affiliation(s)
- Chris Biemann
- Department of Computer Science, University of Hamburg
|
20
|
|
21
|
When is best-worst best? A comparison of best-worst scaling, numeric estimation, and rating scales for collection of semantic norms. Behav Res Methods 2018; 50:115-133. [PMID: 29322399 DOI: 10.3758/s13428-017-1009-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/08/2022]
Abstract
Large-scale semantic norms have become both prevalent and influential in recent psycholinguistic research. However, little attention has been directed towards understanding the methodological best practices of such norm collection efforts. We compared the quality of semantic norms obtained through rating scales, numeric estimation, and a less commonly used judgment format called best-worst scaling. We found that best-worst scaling usually produces norms with higher predictive validities than other response formats, and does so requiring less data to be collected overall. We also found evidence that the various response formats may be producing qualitatively, rather than just quantitatively, different data. This raises the issue of potential response format bias, which has not been addressed by previous efforts to collect semantic norms, likely because of previous reliance on a single type of response format for a single type of semantic judgment. We have made available software for creating best-worst stimuli and scoring best-worst data. We also made available new norms for age of acquisition, valence, arousal, and concreteness collected using best-worst scaling. These norms include entries for 1,040 words, of which 1,034 are also contained in the ANEW norms (Bradley & Lang, Affective norms for English words (ANEW): Instruction manual and affective ratings (pp. 1-45). Technical report C-1, the center for research in psychophysiology, University of Florida, 1999).
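To make the best-worst response format concrete, the sketch below scores a handful of trials with a simple counting rule: for each word, the number of times it was chosen as best minus the number of times it was chosen as worst, divided by the number of trials in which it appeared. This is one common scoring rule and is not necessarily the method implemented in the authors' software. The function name `best_worst_scores` and the trial data are hypothetical.

```python
from collections import defaultdict

def best_worst_scores(trials):
    """Score items from best-worst trials by (best - worst) / appearances.

    Each trial is a tuple (items_shown, best_choice, worst_choice).
    Scores range from -1 (always chosen worst) to +1 (always chosen best).
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, best_choice, worst_choice in trials:
        for item in items:
            seen[item] += 1
        best[best_choice] += 1
        worst[worst_choice] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

# Hypothetical valence trials: four words shown per trial; the rater picks
# the most positive ("best") and the most negative ("worst") word.
trials = [
    (("joy", "war", "table", "gift"), "joy", "war"),
    (("joy", "table", "tax", "gift"), "gift", "tax"),
    (("war", "tax", "table", "joy"), "joy", "war"),
]

scores = best_worst_scores(trials)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:+.2f}")
```

Because each trial yields information about several words at once, this format can produce a usable ranking from fewer judgments than one-word-at-a-time rating scales, which is consistent with the efficiency claim in the abstract.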
|
22
|
Corcoran CM, Carrillo F, Fernández‐Slezak D, Bedi G, Klim C, Javitt DC, Bearden CE, Cecchi GA. Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry 2018; 17:67-75. [PMID: 29352548 PMCID: PMC5775133 DOI: 10.1002/wps.20491] [Citation(s) in RCA: 211] [Impact Index Per Article: 35.2] [Indexed: 01/30/2023] [Open Access]
Abstract
Language and speech are the primary source of data for psychiatrists to diagnose and treat mental disorders. In psychosis, the very structure of language can be disturbed, including semantic coherence (e.g., derailment and tangentiality) and syntactic complexity (e.g., concreteness). Subtle disturbances in language are evident in schizophrenia even prior to first psychosis onset, during prodromal stages. Using computer-based natural language processing analyses, we previously showed that, among English-speaking clinical (e.g., ultra) high-risk youths, baseline reduction in semantic coherence (the flow of meaning in speech) and in syntactic complexity could predict subsequent psychosis onset with high accuracy. Herein, we aimed to cross-validate these automated linguistic analytic methods in a second larger risk cohort, also English-speaking, and to discriminate speech in psychosis from normal speech. We identified an automated machine-learning speech classifier - comprising decreased semantic coherence, greater variance in that coherence, and reduced usage of possessive pronouns - that had an 83% accuracy in predicting psychosis onset (intra-protocol), a cross-validated accuracy of 79% of psychosis onset prediction in the original risk cohort (cross-protocol), and a 72% accuracy in discriminating the speech of recent-onset psychosis patients from that of healthy individuals. The classifier was highly correlated with previously identified manual linguistic predictors. Our findings support the utility and validity of automated natural language processing methods to characterize disturbances in semantics and syntax across stages of psychotic disorder. The next steps will be to apply these methods in larger risk cohorts to further test reproducibility, also in languages other than English, and identify sources of variability. 
This technology has the potential to improve prediction of psychosis outcome among at-risk youths and identify linguistic targets for remediation and preventive intervention. More broadly, automated linguistic analysis can be a powerful tool for diagnosis and treatment across neuropsychiatry.
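The abstract treats semantic coherence as the flow of meaning in speech and reports that both its level and its variance were informative. A minimal sketch of that idea, assuming each sentence has already been mapped to a numeric vector, is to take the cosine similarity between consecutive sentence vectors and summarize the resulting series. The function `coherence_profile` and the toy two-dimensional vectors are hypothetical. The abstract does not specify the study's embedding method or feature definitions, so this is not a reproduction of them.

```python
import math
from statistics import mean, pvariance

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coherence_profile(sentence_vectors):
    """Summarize similarity between each sentence and the next one.

    Requires at least two sentence vectors. Returns the mean, population
    variance, and minimum of the consecutive-sentence cosine similarities.
    """
    sims = [cosine(a, b) for a, b in zip(sentence_vectors, sentence_vectors[1:])]
    return {"mean": mean(sims), "variance": pvariance(sims), "min": min(sims)}

# Toy sentence vectors (illustrative only): the third sentence shifts topic.
transcript = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
profile = coherence_profile(transcript)
print(profile)
```

In this toy transcript the abrupt topic shift lowers the mean and raises the variance, which is the direction of change the abstract associates with psychosis risk.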
Affiliation(s)
- Cheryl M. Corcoran
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA; New York State Psychiatric Institute, New York, NY, USA
- Facundo Carrillo
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina; Instituto de Investigación en Ciencias de la Computación, Universidad de Buenos Aires, Buenos Aires, Argentina
- Diego Fernández‐Slezak
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina; Instituto de Investigación en Ciencias de la Computación, Universidad de Buenos Aires, Buenos Aires, Argentina
- Gillinder Bedi
- New York State Psychiatric Institute, New York, NY, USA; Department of Psychiatry, Columbia University Medical Center, New York, NY, USA; Centre for Youth Mental Health, University of Melbourne, and Orygen National Centre of Excellence in Youth Mental Health, Melbourne, Australia
- Casimir Klim
- New York State Psychiatric Institute, New York, NY, USA; Department of Psychiatry, Columbia University Medical Center, New York, NY, USA
- Daniel C. Javitt
- New York State Psychiatric Institute, New York, NY, USA; Department of Psychiatry, Columbia University Medical Center, New York, NY, USA
- Carrie E. Bearden
- Department of Psychiatry and Biobehavioral Sciences and Psychology, University of California Los Angeles; Semel Institute for Neuroscience and Human Behavior, Los Angeles, CA, USA
- Guillermo A. Cecchi
- Computational Biology Center ‐ Neuroscience, IBM T.J. Watson Research Center, Ossining, NY, USA
|
23
|
Hollis G, Westbury C, Lefsrud L. Extrapolating human judgments from skip-gram vector representations of word meaning. Q J Exp Psychol (Hove) 2017; 70:1603-1619. [DOI: 10.1080/17470218.2016.1195417] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 10/21/2022]
Abstract
There is a growing body of research in psychology that attempts to extrapolate human lexical judgments from computational models of semantics. This research can be used to help develop comprehensive norm sets for experimental research, it has applications to large-scale statistical modelling of lexical access and has broad value within natural language processing and sentiment analysis. However, the value of extrapolated human judgments has recently been questioned within psychological research. Of primary concern is the fact that extrapolated judgments may not share the same pattern of statistical relationship with lexical and semantic variables as do actual human judgments; often the error component in extrapolated judgments is not psychologically inert, making such judgments problematic to use for psychological research. We present a new methodology for extrapolating human judgments that partially addresses prior concerns of validity. We use this methodology to extrapolate human judgments of valence, arousal, dominance, and concreteness for 78,286 words. We also provide resources for users to extrapolate these human judgments for three million English words and short phrases. Applications for large sets of extrapolated human judgments are demonstrated and discussed.
Affiliation(s)
- Geoff Hollis
- Department of Psychology, University of Alberta, Edmonton, AB, Canada
- Chris Westbury
- Department of Psychology, University of Alberta, Edmonton, AB, Canada
- Lianne Lefsrud
- Department of Material & Chemicals Engineering, University of Alberta, Edmonton, AB, Canada
|
24
|
Scoring best-worst data in unbalanced many-item designs, with applications to crowdsourcing semantic judgments. Behav Res Methods 2017; 50:711-729. [DOI: 10.3758/s13428-017-0898-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/08/2022]
|
25
|
Warriner AB, Shore DI, Schmidt LA, Imbault CL, Kuperman V. Sliding into happiness: A new tool for measuring affective responses to words. Can J Exp Psychol 2017; 71:71-88. [PMID: 28252996 DOI: 10.1037/cep0000112] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
Reliable measurement of affective responses is critical for research into human emotion. Affective evaluation of words is most commonly gauged on multiple dimensions, including valence (positivity) and arousal, using a rating scale. Despite its popularity, this scale is open to criticism: It generates ordinal data that is often misinterpreted as interval, it does not provide the fine resolution that is essential according to recent theoretical accounts of emotion, and its extremes may not be properly calibrated. In 5 experiments, the authors introduce a new slider tool for affective evaluation of words on a continuous, well-calibrated and high-resolution scale. In Experiment 1, participants were shown a word and asked to move a manikin representing themselves closer to or farther away from the word. The manikin's distance from the word strongly correlated with the word's valence. In Experiment 2, individual differences in shyness and sociability elicited reliable differences in distance from the words. Experiment 3 validated the results of Experiments 1 and 2 using a demographically more diverse population of responders. Finally, Experiment 4 (along with Experiment 2) suggested that task demand is not a potential cause for scale recalibration. In Experiment 5, men and women placed a manikin closer to or farther from words that showed sex differences in valence, highlighting the sensitivity of this measure to group differences. These findings shed new light on interactions among affect, language, and individual differences, and demonstrate the utility of a new tool for measuring word affect.
Affiliation(s)
- Amy Beth Warriner
- Department of Psychology, Neuroscience & Behaviour, McMaster University
- David I Shore
- Department of Psychology, Neuroscience & Behaviour, McMaster University
- Louis A Schmidt
- Department of Psychology, Neuroscience & Behaviour, McMaster University
- Victor Kuperman
- Department of Psychology, Neuroscience & Behaviour, McMaster University
|
26
|
The principals of meaning: Extracting semantic dimensions from co-occurrence models of semantics. Psychon Bull Rev 2016; 23:1744-1756. [DOI: 10.3758/s13423-016-1053-2] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Indexed: 11/08/2022]
|
27
|
|