1
Rühlemann C, Barthel M. Word frequency and cognitive effort in turns-at-talk: turn structure affects processing load in natural conversation. Front Psychol 2024; 15:1208029. [PMID: 38899128 PMCID: PMC11186443 DOI: 10.3389/fpsyg.2024.1208029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 05/13/2024] [Indexed: 06/21/2024] Open
Abstract
Frequency distributions are known to widely affect psycholinguistic processes. The effects of word frequency in turns-at-talk, the nucleus of social action in conversation, have, by contrast, been largely neglected. This study probes into this gap by applying corpus-linguistic methods on the conversational component of the British National Corpus (BNC) and the Freiburg Multimodal Interaction Corpus (FreMIC). The latter includes continuous pupil size measures of participants of the recorded conversations, allowing for a systematic investigation of patterns in the contained speech and language on the one hand and their relation to concurrent processing costs they may incur in speakers and recipients on the other hand. We test a first hypothesis in this vein, analyzing whether word frequency distributions within turns-at-talk are correlated with interlocutors' processing effort during the production and reception of these turns. Turns are found to generally show a regular distribution pattern of word frequency, with highly frequent words in turn-initial positions, mid-range frequency words in turn-medial positions, and low-frequency words in turn-final positions. Speakers' pupil size is found to tend to increase during the course of a turn at talk, reaching a climax toward the turn end. Notably, the observed decrease in word frequency within turns is inversely correlated with the observed increase in pupil size in speakers, but not in recipients, with steeper decreases in word frequency going along with steeper increases in pupil size in speakers. We discuss the implications of these findings for theories of speech processing, turn structure, and information packaging. Crucially, we propose that the intensification of processing effort in speakers during a turn at talk is owed to an informational climax, which entails a progression from high-frequency, low-information words through intermediate levels to low-frequency, high-information words. 
At least in English conversation, interlocutors seem to make use of this pattern as one way to achieve efficiency in conversational interaction, creating a regularly recurring distribution of processing load across speaking turns, which aids smooth turn transitions, content prediction, and effective information transfer.
Affiliation(s)
- Mathias Barthel
- Pragmatics Department, Leibniz Institute for the German Language (IDS), Mannheim, Germany
2
Hulme RC, Begum A, Nation K, Rodd JM. Diversity of narrative context disrupts the early stage of learning the meanings of novel words. Psychon Bull Rev 2023; 30:2338-2350. [PMID: 37369974 PMCID: PMC10728247 DOI: 10.3758/s13423-023-02316-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/23/2023] [Indexed: 06/29/2023]
Abstract
High quality lexical representations develop through repeated exposures to words in different contexts. This preregistered experiment investigated how diversity of narrative context affects the earliest stages of word learning via reading. Adults (N = 100) learned invented meanings for eight pseudowords, which each occurred in five written paragraphs either within a single coherent narrative context or five different narrative contexts. The words' semantic features were controlled across conditions to avoid influences from polysemy (lexical ambiguity). Posttests included graded measures of word-form recall (spelling accuracy) and recognition (multiple choice), and word-meaning recall (number of semantic features). Diversity of narrative context did not affect word-form learning, but more semantic features were correctly recalled for words trained in a single context. These findings indicate that learning the meanings of novel words is initially boosted by anchoring them to a single coherent narrative discourse.
Affiliation(s)
- Rachael C Hulme
- Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK.
- Anisha Begum
- Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
- Kate Nation
- Department of Experimental Psychology, University of Oxford, Oxford, UK
- Jennifer M Rodd
- Department of Experimental Psychology, Division of Psychology and Language Sciences, University College London, 26 Bedford Way, London, WC1H 0AP, UK
3
Johns BT, Taler V, Jones MN. Contextual dynamics in lexical encoding across the ageing spectrum: A simulation study. Q J Exp Psychol (Hove) 2023; 76:2164-2182. [PMID: 36458499 PMCID: PMC10466941 DOI: 10.1177/17470218221145685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 10/18/2022] [Accepted: 10/19/2022] [Indexed: 08/23/2023]
Abstract
The field of psycholinguistics has recently questioned the primacy of word frequency (WF) in influencing word recognition and production, instead focusing on the importance of a word's contextual diversity (CD). WF is operationalised by counting the number of occurrences of a word in a corpus, while a word's CD is a count of the number of contexts that a word occurs in, with repetitions within a context being ignored. Numerous studies have converged on the conclusion that CD is a better predictor of word recognition latency and accuracy than frequency. These findings support a cognitive mechanism based on the principle of likely need over the principle of repetition in lexical organisation. In the current study, we trained the semantic distinctiveness model on communication patterns in social media platforms consisting of over 55 billion word tokens and examined the ability of theoretically distinct models to explain word recognition latency and accuracy data from over 1 million participants from the Mandera et al. English Crowdsourcing Project norms, consisting of approximately 59,000 words across six age bands ranging from ages 10 to 60 years. There was a clear quantitative trend across the age bands, with a shift from a social environment-based attention mechanism in the "younger" models to a clear dominance of a discourse-based attention mechanism as models "aged." This pattern suggests that there is a dynamical interaction between the cognitive mechanisms of lexical organisation and environmental information that emerges across ageing.
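The difference between WF and CD as operationalised in this abstract can be illustrated with a minimal sketch. The toy corpus, the whitespace tokenisation, and the function name `wf_and_cd` are assumptions made for illustration; they are not taken from the study, whose semantic distinctiveness model goes well beyond a raw context count.

```python
from collections import Counter


def wf_and_cd(contexts):
    """Return (word frequency, contextual diversity) counters.

    contexts: list of strings, each treated as one context
    (hypothetical toy data, split on whitespace).
    """
    wf = Counter()
    cd = Counter()
    for context in contexts:
        tokens = context.lower().split()
        wf.update(tokens)       # WF: every occurrence counts
        cd.update(set(tokens))  # CD: a word counts once per context
    return wf, cd


# Illustrative contexts only; not data from the cited work.
toy_contexts = [
    "the dog chased the ball",
    "the ball rolled away",
    "a dog barked",
]
wf, cd = wf_and_cd(toy_contexts)
print("the:", wf["the"], cd["the"])
print("dog:", wf["dog"], cd["dog"])
```

In this toy case "the" occurs three times but in only two contexts, so its WF exceeds its CD, whereas "dog" has the same value on both measures because it is never repeated within a context.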
Affiliation(s)
- Brendan T Johns
- Department of Psychology, McGill University, Montreal, Quebec, Canada
4
Dossey E, Jones Z, Clopper CG. Relative Contributions of Social, Contextual, and Lexical Factors in Speech Processing. LANGUAGE AND SPEECH 2023; 66:322-353. [PMID: 35787020 DOI: 10.1177/00238309221107870] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
This exploratory study examined the simultaneous interactions and relative contributions of bottom-up social information (regional dialect, speaking style), top-down contextual information (semantic predictability), and the internal dynamics of the lexicon (neighborhood density, lexical frequency) to lexical access and word recognition. Cross-modal matching and intelligibility in noise tasks were conducted with a community sample of adults at a local science museum. Each task featured one condition in which keywords were presented in isolation and one condition in which they were presented within a multiword phrase. Lexical processing was slower and more accurate when keywords were presented in their phrasal context, and was both faster and more accurate for auditory stimuli produced in the local Midland dialect. In both tasks, interactions were observed among stimulus dialect, speaking style, semantic predictability, phonological neighborhood density, and lexical frequency. These interactions revealed that bottom-up social information and top-down contextual information contribute more to speech processing than the internal dynamics of the lexicon. Moreover, the relatively stronger bottom-up social effects were observed in both the isolated word and multiword phrase conditions, suggesting that social variation is central to speech processing, even in non-interactive laboratory tasks. At the same time, the specific interactions observed differed between the two experiments, reflecting task-specific demands related to processing time constraints and signal degradation.
Affiliation(s)
- Ellen Dossey
- Department of Linguistics, The Ohio State University, USA
- Zack Jones
- Department of Linguistics, The Ohio State University, USA
5
Context Availability and Sentence Availability Ratings for 3,000 English Words and their Association with Lexical Processing. J Cogn 2022; 5:20. [PMID: 36072106 PMCID: PMC9400651 DOI: 10.5334/joc.211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 02/18/2022] [Indexed: 11/20/2022] Open
Abstract
Words that can be easily placed in contexts are more easily processed, yet norms for context availability are limited. Here, participants rated 3,000 words for context availability and sentence availability, a new metric predicted to capture information relating to textual variation. Both variables were investigated alongside other word-level characteristics to explore lexical-semantic space. Analyses demonstrated that context availability and sentence availability are distinct. Context availability covaries with concreteness and imageability, while sentence availability captures information relating to contextual variation, frequency and ambiguity. Analyses of megastudy data showed that both context availability and sentence availability uniquely facilitated lexical decision performance.
6
Vickery B, Fogerty D, Dubno JR. Phonological and semantic similarity of misperceived words in babble: Effects of sentence context, age, and hearing loss. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:650. [PMID: 35105039 PMCID: PMC8807001 DOI: 10.1121/10.0009367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/26/2021] [Revised: 01/03/2022] [Accepted: 01/08/2022] [Indexed: 05/29/2023]
Abstract
This study investigated how age and hearing loss influence the misperceptions made when listening to sentences in babble. Open-set responses to final words in sentences with low and high context were analyzed for younger adults with normal hearing and older adults with normal or impaired hearing. All groups performed similarly in overall accuracy but differed in error type. Misperceptions for all groups were analyzed according to phonological and semantic properties. Comparisons between groups indicated that misperceptions for older adults were more influenced by phonological factors. Furthermore, older adults with hearing loss omitted more responses. Overall, across all groups, results suggest that phonological confusions most explain misperceptions in low context sentences. In high context sentences, the meaningful sentence context appears to provide predictive cues that reduce misperceptions. When misperceptions do occur, responses tend to have greater semantic similarity and lesser phonological similarity to the target, compared to low context sentences. In this way, semantic similarity may index a postdictive process by which ambiguities due to phonological confusions are resolved to conform to the semantic context of the sentence. These patterns demonstrate that context, age, and hearing loss affect the misperceptions, and potential sentence interpretation, made when listening to sentences in babble.
Affiliation(s)
- Blythe Vickery
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, South Carolina 29208, USA
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, Illinois 61801, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
7
Accounting for item-level variance in recognition memory: Comparing word frequency and contextual diversity. Mem Cognit 2021; 50:1013-1032. [PMID: 34811640 DOI: 10.3758/s13421-021-01249-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/04/2021] [Indexed: 11/08/2022]
Abstract
Contextual diversity modifies word frequency by ignoring the repetition of words in context (Adelman, Brown, & Quesada, 2006, Psychological Science, 17(9), 814-823). Semantic diversity modifies contextual diversity by taking into account the uniqueness of the contexts that a word occurs in when calculating lexical strength (Jones, Johns, & Recchia, 2012, Canadian Journal of Experimental Psychology, 66, 115-124). Recent research has demonstrated that measures based on contextual and semantic diversity provide a considerable improvement over word frequency when accounting for lexical organization data (Johns, 2021, Psychological Review, 128, 525-557; Johns, Dye, & Jones, 2020a, Quarterly Journal of Experimental Psychology, 73, 841-855). The article demonstrates that these same findings generalize to word-level episodic recognition rates, using the previously released data of Cortese, Khanna, and Hacker (Cortese et al., 2010, Memory, 18, 595-609) and Cortese, McCarty, and Schock (Cortese et al., 2015, Quarterly Journal of Experimental Psychology, 68, 1489-1501). It was found that including the best fitting contextual diversity model allowed for a very large increase in variance accounted for over previously used variables, such as word frequency, signalling commonality with results from the lexical organization literature. The findings of this article suggest that current trends in the collection of megadata sets of human behavior (e.g., Balota et al., 2007, Behavior Research Methods, 39(3), 445-459) provide a promising avenue to develop new theoretically oriented models of word-level episodic recognition data.
8
Morgan AM, Ferreira VS. Beyond input: Language learners produce novel relative clause types without exposure. JOURNAL OF COGNITIVE PSYCHOLOGY 2021; 33:483-517. [PMID: 34484658 DOI: 10.1080/20445911.2021.1928678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Syntax famously consists of abstract hierarchical representations, essentially instructions for combining words into larger units like sentences. Less famously, most theories of syntax also assume a higher level of abstract representation. Representations at this level comprise instructions for creating the hierarchical representations used to create sentences. To date, however, there is no experimental evidence for this additional level of abstraction. Here, we explain why the existence of such representations would imply that, under certain circumstances, speakers should be able to produce structures they have never been exposed to, and we test this prediction directly. We ask: Given the right type of input, can speakers learn a syntactic structure without direct exposure? In particular, different types of relative clauses have different surface word orders. These may be represented in two ways: with many individual representations or one general representation. If the latter, then learning one type of relative clause amounts to learning all types. We teach participants a novel grammar for only some relative clause types (e.g., just subject relative clauses) and test their knowledge of other types (e.g., object relative clauses). Across experiments, participants consistently produced untrained types, implicating the existence of this higher level of abstract syntactic knowledge.
Affiliation(s)
- Adam M Morgan
- NYU School of Medicine, Department of Neurology, 227 E 30th St, 8th Floor, New York NY 10016 USA
- Victor S Ferreira
- UC San Diego, Department of Psychology, 9500 Gilman Dr., La Jolla CA 92093 USA
9
Schwering SC, Ghaffari-Nikou NM, Zhao F, Niedenthal PM, MacDonald MC. Exploring the Relationship Between Fiction Reading and Emotion Recognition. AFFECTIVE SCIENCE 2021; 2:178-186. [PMID: 36043173 PMCID: PMC9382981 DOI: 10.1007/s42761-021-00034-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/13/2020] [Accepted: 01/27/2021] [Indexed: 05/15/2023]
Abstract
Fiction reading experience affects emotion recognition abilities, yet the causal link remains underspecified. Current theory suggests fiction reading promotes the simulation of fictional minds, which supports emotion recognition skills. We examine the extent to which contextualized statistical experience with emotion category labels in language is associated with emotion recognition. Using corpus analyses, we demonstrate fiction texts reliably use emotion category labels in an emotive sense (e.g., cry of relief), whereas other genres often use alternative senses (e.g., hurricane relief fund). Furthermore, fiction texts were shown to be a particularly reliable source of information about complex emotions. The extent to which these patterns affect human emotion concepts was analyzed in two behavioral experiments. In experiment 1 (n = 134), experience with fiction text predicted recognition of emotions employed in an emotive sense in fiction texts. In experiment 2 (n = 387), fiction reading experience predicted emotion recognition abilities, overall. These results suggest that long-term language experience, and fiction reading, in particular, supports emotion concepts through exposure to these emotions in context.
Affiliation(s)
- Fangyun Zhao
- Department of Psychology, University of Wisconsin-Madison, Madison, WI USA
10
Abstract
Previous research has speculated that semantic diversity and lexical ambiguity may be closely related constructs. Our research sought to test this claim in respect of the semantic diversity measure proposed by Hoffman et al. (2013). To this end, we replicated the procedure described by Hoffman et al., Behavior Research Methods, 45(3), 718–730 (2013) for computing multidimensional representations of contextual information using Latent Semantic Analysis, and from these we derived semantic diversity values for 28,555 words. We then replicated the facilitatory effect of semantic diversity on word recognition using existing data resources and observed this effect to be greater for low-frequency words. Yet, we found no relationship between this measure and lexical ambiguity effects in word recognition. Further analysis of the LSA-based contextual representations used to compute Hoffman et al.'s (2013) measure of semantic diversity revealed that they do not capture the distinct meanings of ambiguous words. Instead, these contextual representations appear to capture general information about the topics and types of written material in which words occur. These analyses suggest that the semantic diversity metric previously proposed by Hoffman et al. (2013) facilitates word recognition because high-diversity words are likely to have been encountered no matter what one has read, whereas many participants may not have encountered lower-diversity words simply because the topics and types of written material in which they occur are more restricted.
11
Semantic diversity in paired-associate learning: Further evidence for the information accumulation perspective of cognitive aging. Psychon Bull Rev 2020; 27:114-121. [PMID: 31823297 DOI: 10.3758/s13423-019-01691-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022]
Abstract
Normal aging is often associated with a performance decline on various cognitive tests, including paired associate learning (PAL), where participants are asked to learn and recall arbitrary word pairs. While many studies have taken this as evidence to support the notion of age-related deficits in cognitive processing, Ramscar, Hendrix, Shaoul, Milin, and Baayen (Topics in Cognitive Science, 6(1), 5-42) and Ramscar, Sun, Hendrix, and Baayen (Psychological Science, 28(8), 1171-1179, 2017) posit that the decline in performance on various cognitive tasks can be explained by the accumulation of linguistic knowledge over time. To demonstrate this, Ramscar et al. (2017) found that older bilingual participants outperformed monolingual counterparts on a verbal PAL task, proposed to be due to bilinguals having accumulated less information about the words used in the study. However, comparing bilinguals to monolinguals introduces confounding factors. For example, bilinguals' better performance may be due to superior executive functioning. To minimize these between-subject confounds, the current study used a within-subject design in order to examine the influence of linguistic experience on paired associate learning in younger and older adults. Linguistic experience was modeled using a semantic diversity measure of word strength (Jones, Johns, & Recchia, 2012). When frequency is controlled for, high semantic diversity words are associated with a greater number of words and have a higher average strength of association. In the current study, PAL performance of older adults was significantly lower for word pairs involving high semantic diversity words, while their performance did not differ for low semantic diversity words, consistent with the information accumulation perspective of aging.
12
The effects of contextual diversity on incidental vocabulary learning in the native and a foreign language. Sci Rep 2020; 10:13967. [PMID: 32811966 PMCID: PMC7435265 DOI: 10.1038/s41598-020-70922-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/11/2020] [Accepted: 08/03/2020] [Indexed: 11/09/2022] Open
Abstract
Vocabulary learning occurs throughout the lifespan, often implicitly. For foreign language learners, this is particularly challenging as they must acquire a large number of new words with little exposure. In the present study, we explore the effects of contextual diversity—namely, the number of texts a word appears in—on native and foreign language word learning. Participants read several texts that had novel pseudowords replacing high-frequency words. The total number of encounters with the novel words was held constant, but they appeared in 1, 2, 4, or 8 texts. In addition, some participants read the texts in Spanish (their native language) and others in English (their foreign language). We found that increasing contextual diversity improved recall and recognition of the word, as well as the ability to match the word with its meaning while keeping comprehension unimpaired. Using a foreign language only affected performance in the matching task, where participants had to quickly identify the meaning of the word. Results are discussed in the greater context of the word learning and foreign language literature as well as their importance as a teaching tool.
13
Johns BT, Dye M, Jones MN. Estimating the prevalence and diversity of words in written language. Q J Exp Psychol (Hove) 2020; 73:841-855. [PMID: 31826715 DOI: 10.1177/1747021819897560] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Recently, a new crowd-sourced language metric has been introduced, entitled word prevalence, which estimates the proportion of the population that knows a given word. This measure has been shown to account for unique variance in large sets of lexical performance. This article aims to build on the work of Brysbaert et al. and Keuleers et al. by introducing new corpus-based metrics that estimate how likely a word is to be an active member of the natural language environment, and hence known by a larger subset of the general population. This metric is derived from an analysis of a newly collected corpus of over 25,000 fiction and non-fiction books and is shown to be capable of accounting for significantly more variance than past corpus-based measures.
Affiliation(s)
- Brendan T Johns
- Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, USA
- Melody Dye
- University of California, Berkeley, CA, USA
14
The influence of place and time on lexical behavior: A distributional analysis. Behav Res Methods 2019; 51:2438-2453. [DOI: 10.3758/s13428-019-01289-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
15
16
Johns BT. Mining a Crowdsourced Dictionary to Understand Consistency and Preference in Word Meanings. Front Psychol 2019; 10:268. [PMID: 30833917 PMCID: PMC6387934 DOI: 10.3389/fpsyg.2019.00268] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/09/2018] [Accepted: 01/28/2019] [Indexed: 12/02/2022] Open
Abstract
Big data approaches to psychology have become increasingly popular (Jones, 2017). Two of the main developments of this line of research are the advent of distributional models of semantics (e.g., Landauer and Dumais, 1997), which learn the meaning of words from large text corpora, and the collection of mega datasets of human behavior (e.g., The English lexicon project; Balota et al., 2007). The current article combines these two approaches, with the goal being to understand the consistency and preference that people have for word meanings. This was accomplished by mining a large amount of data from an online, crowdsourced dictionary and analyzing this data with a distributional model. Overall, it was found that even for words that are not an active part of the language environment, there is a large amount of consistency in the word meanings that different people have. Additionally, it was demonstrated that users of a language have strong preferences for word meanings, such that definitions of words that do not conform to people’s conceptions are rejected by a community of language users. The results of this article provide insights into the cultural evolution of word meanings, and shed light on alternative methodologies that can be used to understand lexical behavior.
Affiliation(s)
- Brendan T Johns
- Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, United States
17
On the predictive validity of various corpus-based frequency norms in L2 English lexical processing. Behav Res Methods 2018; 50:1-25. [PMID: 29340969 DOI: 10.3758/s13428-017-1001-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
The predictive validity of various corpus-based frequency norms in first-language lexical processing has been intensively investigated in previous research, but less attention has been paid to this issue in second-language (L2) processing. To bridge the gap, in the present study we took English as a case in point and compared the predictive power of a large set of corpus-based frequency norms for the performance of an L2 English visual lexical decision task (LDT). Our results showed that, in general, the frequency norms from SUBTLEX-US and WorldLex-Blog tended to predict L2 performance better in reaction times, whereas the frequency norms from corpora with a mixture of written and spoken genres (CELEX, WorldLex-Blog, BNC, ANC, and COCA) tended to predict L2 accuracy better. Although replicated in both low- and high-proficiency L2 English learners, these patterns were not exactly the same as those found in LDT data from native English speakers. In addition, we only observed some limited advantages of the lemma frequency and contextual diversity measures over the wordform frequency measure in predicting L2 lexical processing. The results of the present study, especially the detailed comparisons among the different corpora, provide methodological implications for future L2 lexical research.
18
19
Abstract
In a series of analyses over mega datasets, Jones, Johns, and Recchia (Canadian Journal of Experimental Psychology, 66(2), 115-124, 2012) and Johns et al. (Journal of the Acoustical Society of America, 132:2, EL74-EL80, 2012) found that a measure of contextual diversity that takes into account the semantic variability of a word's contexts provided a better fit to both visual and spoken word recognition data than traditional measures, such as word frequency or raw context counts. This measure was empirically validated with an artificial language experiment (Jones et al.). The present study extends the empirical results with a unique natural language learning paradigm, which allows for an examination of the semantic representations that are acquired as semantic diversity is varied. Subjects were incidentally exposed to novel words as they rated short selections from articles, books, and newspapers. When novel words were encountered across distinct discourse contexts, subjects were both faster and more accurate at recognizing them than when they were seen in redundant contexts. However, learning across redundant contexts promoted the development of more stable semantic representations. These findings are predicted by a distributional learning model trained on the same materials as our subjects.
20
Estimating the average need of semantic knowledge from distributional semantic models. Mem Cognit 2017; 45:1350-1370. [DOI: 10.3758/s13421-017-0732-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
21
Grimm R, Cassani G, Gillis S, Daelemans W. Facilitatory Effects of Multi-Word Units in Lexical Processing and Word Learning: A Computational Investigation. Front Psychol 2017; 8:555. [PMID: 28450842 PMCID: PMC5390038 DOI: 10.3389/fpsyg.2017.00555] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/11/2017] [Accepted: 03/27/2017] [Indexed: 11/13/2022] Open
Abstract
Previous studies have suggested that children and adults form cognitive representations of co-occurring word sequences. We propose (1) that the formation of such multi-word unit (MWU) representations precedes and facilitates the formation of single-word representations in children and thus benefits word learning, and (2) that MWU representations facilitate adult word recognition and thus benefit lexical processing. Using a modified version of an existing computational model (McCauley and Christiansen, 2014), we extract MWUs from a corpus of child-directed speech (CDS) and a corpus of conversations among adults. We then correlate the number of MWUs within which each word appears with (1) age of first production and (2) adult reaction times on a word recognition task. In doing so, we take care to control for the effect of word frequency, as frequent words will naturally tend to occur in many MWUs. We also compare results to a baseline model which randomly groups words into sequences, and find that MWUs have a unique facilitatory effect on both response variables, suggesting that they benefit word learning in children and word recognition in adults. The effect is strongest on age of first production, implying that MWUs are comparatively more important for word learning than for adult lexical processing. We discuss possible underlying mechanisms and formulate testable predictions.
Affiliation(s)
- Robert Grimm
- Department of Linguistics, Computational Linguistics and Psycholinguistics Research Center, University of Antwerp, Antwerp, Belgium
|
22
|
|
23
|
Abstract
Recent studies have demonstrated that, when contextual diversity is controlled, token word frequency has minimal effects on visual word recognition. With the exception of a single experiment by Plummer, Perea, & Rayner (2014, Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 275-283), those studies have examined words in isolation. The current studies address two potential limitations of the Plummer et al. experiment. First, because Plummer et al. used different sentence frames for words in different conditions, the effects might be due to uncontrolled differences between the sentences. Second, the absence of a frequency effect might be attributed to comparing higher and lower frequency words within a limited range. Three eye-tracking experiments examined effects of contextual diversity and frequency in Mandarin Chinese, a logographic language, for words embedded in normal sentences. In Experiment 1, yoked words were rotated through the same sentence frame. Experiments 2a and 2b used a design similar to that of Plummer et al., which allows use of a larger sample of words to compare results between experiments with a smaller and larger difference in log frequency (0.41 and 1.06, respectively). In all three experiments, first-pass and later eye movement measures were significantly shorter for targets with higher contextual diversity than for targets with lower contextual diversity, with no effects of frequency.
|
24
|
Johns BT, Sheppard CL, Jones MN, Taler V. The Role of Semantic Diversity in Word Recognition across Aging and Bilingualism. Front Psychol 2016; 7:703. [PMID: 27458392 PMCID: PMC4937810 DOI: 10.3389/fpsyg.2016.00703] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 02/27/2016] [Accepted: 04/26/2016] [Indexed: 11/16/2022] Open
Abstract
Frequency effects are pervasive in studies of language, with higher frequency words being recognized faster than lower frequency words. However, the exact nature of frequency effects has recently been questioned, with some studies finding that contextual information provides a better fit to lexical decision and naming data than word frequency (Adelman et al., 2006). Recent work has cemented the importance of these results by demonstrating that a measure of the semantic diversity of the contexts that a word occurs in provides a powerful measure to account for variability in word recognition latency (Johns et al., 2012, 2015; Jones et al., 2012). The goal of the current study is to extend this measure to examine bilingualism and aging, where multiple theories use frequency of occurrence of linguistic constructs as central to accounting for empirical results (Gollan et al., 2008; Ramscar et al., 2014). A lexical decision experiment was conducted with four groups of subjects: younger and older monolinguals and bilinguals. Consistent with past results, a semantic diversity variable accounted for the greatest amount of variance in the latency data. In addition, the pattern of fits of semantic diversity across multiple corpora suggests that bilinguals and older adults are more sensitive to semantic diversity information than younger monolinguals.
Affiliation(s)
- Brendan T. Johns
- Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, USA
- Michael N. Jones
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA
- Vanessa Taler
- Bruyère Research Institute, Ottawa, ON, Canada
- School of Psychology, University of Ottawa, Ottawa, ON, Canada
|
25
|
Soares AP, Machado J, Costa A, Iriarte Á, Simões A, de Almeida JJ, Comesaña M, Perea M. On the advantages of word frequency and contextual diversity measures extracted from subtitles: The case of Portuguese. Q J Exp Psychol (Hove) 2014; 68:680-96. [PMID: 25263599 DOI: 10.1080/17470218.2014.964271] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 10/24/2022]
Abstract
We examined the potential advantage of lexical databases based on subtitles and present SUBTLEX-PT, a new lexical database for 132,710 Portuguese words obtained from a 78-million-word corpus based on film and television series subtitles, offering word frequency and contextual diversity measures. Additionally, we validated SUBTLEX-PT with a lexical decision study involving 1920 Portuguese words (and 1920 nonwords) with different lengths in letters (M = 6.89, SD = 2.10) and syllables (M = 2.99, SD = 0.94). Multiple regression analyses on latency and accuracy data were conducted to compare the proportion of variance explained by the Portuguese subtitle word frequency measures with that accounted for by the recent written-word frequency database (Procura-PALavras; P-PAL; Soares, Iriarte, et al., 2014). Like its international counterparts, SUBTLEX-PT explains approximately 15% more of the variance in the lexical decision performance of young adults than the P-PAL database. Moreover, in line with recent studies, contextual diversity accounted for approximately 2% more of the variance in participants' reading performance than the raw frequency counts obtained from subtitles. SUBTLEX-PT is freely available for research purposes (at http://p-pal.di.uminho.pt/about/databases).
Affiliation(s)
- Ana Paula Soares
- Human Cognition Lab, CIPsi, School of Psychology, University of Minho, Minho, Portugal
|
26
|
Plummer P, Perea M, Rayner K. The influence of contextual diversity on eye movements in reading. J Exp Psychol Learn Mem Cogn 2014; 40:275-83. [PMID: 23937235 PMCID: PMC4040263 DOI: 10.1037/a0034058] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022]
Abstract
Recent research has shown contextual diversity (i.e., the number of passages in which a given word appears) to be a reliable predictor of word processing difficulty. It has also been demonstrated that word-frequency has little or no effect on word recognition speed when accounting for contextual diversity in isolated word processing tasks. An eye-movement experiment was conducted wherein the effects of word-frequency and contextual diversity were directly contrasted in a normal sentence reading scenario. Subjects read sentences with embedded target words that varied in word-frequency and contextual diversity. All 1st-pass and later reading times were significantly longer for words with lower contextual diversity compared to words with higher contextual diversity when controlling for word-frequency and other important lexical properties. Furthermore, there was no difference in reading times for higher frequency and lower frequency words when controlling for contextual diversity. The results confirm prior findings regarding contextual diversity and word-frequency effects and demonstrate that contextual diversity is a more accurate predictor of word processing speed than word-frequency within a normal reading task. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Affiliation(s)
- Patrick Plummer
- Department of Psychology, University of California, San Diego
- Manuel Perea
- Departamento de Metodología, Universitat de València
- Keith Rayner
- Department of Psychology, University of California, San Diego
|