1. childes-db: A flexible and reproducible interface to the child language data exchange system. Behav Res Methods 2019; 51:1928-1941. [PMID: 30623390 DOI: 10.3758/s13428-018-1176-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 11/08/2022]
Abstract
The Child Language Data Exchange System (CHILDES) has played a critical role in research on child language development, particularly in characterizing the early language learning environment. Access to these data can be both complex for novices and difficult to automate for advanced users, however. To address these issues, we introduce childes-db, a database-formatted mirror of CHILDES that improves data accessibility and usability by offering novel interfaces, including browsable web applications and an R application programming interface (API). Along with versioned infrastructure that facilitates reproducibility of past analyses, these interfaces lower barriers to analyzing naturalistic parent-child language, allowing for a wider range of researchers in language and cognitive development to easily leverage CHILDES in their work.
2. Baker P, Brookes G, Atanasova D, Flint SW. Changing frames of obesity in the UK press 2008-2017. Soc Sci Med 2020; 264:113403. [PMID: 33017735 DOI: 10.1016/j.socscimed.2020.113403] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Revised: 09/09/2020] [Accepted: 09/24/2020] [Indexed: 11/25/2022]
Abstract
Obesity is a persistently newsworthy topic for the UK press and in recent years levels of coverage have increased. In this study, we examine the ways in which obesity has been framed by the press over a ten-year period (2008-2017), focussing both on areas of stability and change. The analysis is based on a ~36 million-word database of all UK newspaper articles mentioning the words 'obese' or 'obesity' published within this time frame and draws upon techniques from corpus linguistics - a collection of computational methods for examining recurrent linguistic patterns in large bodies of language data. Our analysis shows that, over time, obesity is represented increasingly as a biomedical problem that is both caused and should be prevented by individual action. Meanwhile, focus on wider environmental determinants of health, including the role of Government and the food industry, decreases over time. In the paper, we situate these trends within the wider context of UK society and argue that they both represent the increasing dominance of neoliberal models of health but also have the potential to contribute to weight stigma and the blaming of individuals. Accordingly, it is argued that the press should seek greater balance in its reporting of the potential causes of and solutions to obesity, as well as closer alignment with scientific evidence. By doing so, the press could begin to report on obesity in a way that raises useful public awareness around the topic and which challenges some of the stigma that currently attends to this social justice issue.
Publication type: Research Support, Non-U.S. Gov't
3. Yadav H, Vaidya A, Shukla V, Husain S. Word Order Typology Interacts With Linguistic Complexity: A Cross-Linguistic Corpus Study. Cogn Sci 2020; 44:e12822. [PMID: 32223024 DOI: 10.1111/cogs.12822] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/15/2018] [Revised: 12/26/2019] [Accepted: 01/17/2020] [Indexed: 11/28/2022]
Abstract
Much previous work has suggested that word order preferences across languages can be explained by the dependency distance minimization constraint (Ferrer-i Cancho, 2008, 2015; Hawkins, 1994). Consistent with this claim, corpus studies have shown that the average distance between a head (e.g., verb) and its dependent (e.g., noun) tends to be short cross-linguistically (Ferrer-i Cancho, 2014; Futrell, Mahowald, & Gibson, 2015; Liu, Xu, & Liang, 2017). This implies that on average languages avoid inefficient or complex structures for simpler structures. But a number of studies in psycholinguistics (Konieczny, 2000; Levy & Keller, 2013; Vasishth, Suckow, Lewis, & Kern, 2010) show that the comprehension system can adapt to the typological properties of a language, for example, verb-final order, leading to more complex structures, for example, having longer linear distance between a head and its dependent. In this paper, we conduct a corpus study for a group of 38 languages, which were either Subject-Verb-Object (SVO) or Subject-Object-Verb (SOV), in order to investigate the role of word order typology in determining syntactic complexity. We present results aggregated across all dependency types, as well as for specific verbal (objects, indirect objects, and adjuncts) and nonverbal (nominal, adjectival, and adverbial) dependencies. The results suggest that dependency distance in a language is determined by the default word order of a language, and crucially, the direction of a dependency (whether the head precedes the dependent or follows it; e.g., whether the noun precedes the verb or follows it). Particularly we show that in SOV languages (e.g., Hindi, Korean) as well as SVO languages (e.g., English, Spanish), longer linear distance (measured as number of words) between head and dependent arises in structures when they mirror the default word order of the language. 
In addition to showing results on linear distance, we also investigate the influence of word order typology on hierarchical distance (HD; measured as number of heads between head and dependent). The results for HD are similar to that of linear distance. At the same time, in comparison to linear distance, the influence of adaptability on HD seems less strong. In particular, the results show that most languages tend to avoid greater structural depth. Together, these results show evidence for "limited adaptability" to the default word order preferences in a language. Our results support a large body of work in the processing literature that highlights the importance of linguistic exposure and its interaction with working memory constraints in determining sentence complexity. Our results also point to the possible role of other factors such as the morphological richness of a language and a multifactor account of sentence complexity remains a promising area for future investigation.
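This abstract relies on two quantities: linear dependency distance, and the direction of a dependency (head before or after its dependent). Both can be computed from any dependency-parsed sentence. The sketch below is a minimal illustration, not the authors' pipeline. It assumes CoNLL-U-style tuples with 1-based indices, where head 0 marks the root. It also assumes that adjacent words are at distance 1. The sentence is a hypothetical toy example, and hierarchical distance is not modeled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy parse of "The dog chased a cat" as (index, form, head, relation);
# indices are 1-based and head 0 marks the root, as in CoNLL-U.
SENTENCE = [
    (1, "The", 2, "det"),
    (2, "dog", 3, "nsubj"),
    (3, "chased", 0, "root"),
    (4, "a", 5, "det"),
    (5, "cat", 3, "obj"),
]


def dependency_stats(tokens):
    """Mean linear dependency distance, grouped by dependency direction."""
    by_direction = defaultdict(list)
    for index, _form, head, _relation in tokens:
        if head == 0:
            continue  # the root has no incoming arc
        # Assumption: distance = difference in word positions (adjacent words = 1).
        distance = abs(head - index)
        # Head before dependent = head-initial; head after dependent = head-final.
        direction = "head-initial" if head < index else "head-final"
        by_direction[direction].append(distance)
    return {direction: mean(dists) for direction, dists in by_direction.items()}


print(dependency_stats(SENTENCE))
```

In the toy sentence, the two determiner arcs and the subject arc point rightward to a following head, so they are head-final with mean distance 1. The object arc runs leftward from "chased" to "cat", so it is head-initial with distance 2. Over a treebank, the same grouping could be broken down further by relation type.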
4. Crossley S, Heintz A, Choi JS, Batchelor J, Karimi M, Malatinszky A. A large-scaled corpus for assessing text readability. Behav Res Methods 2023; 55:491-507. [PMID: 35297016 PMCID: PMC10027808 DOI: 10.3758/s13428-022-01802-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Accepted: 01/19/2022] [Indexed: 11/08/2022]
Abstract
This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~ 5000 text excerpts along with information about the excerpt's year of publishing, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus includes a number of improvements in comparison to previous readability corpora including size, breadth of the excerpts available, which cover over 250 years of writing in two different genres, and unique readability criterion provided for each text based on teachers' ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability.
Publication type: research-article
5. Ou J, Wong IA, Huang GI. The coevolutionary process of restaurant CSR in the time of mega disruption. Int J Hosp Manag 2021; 92:102684. [PMID: 33052164 PMCID: PMC7543789 DOI: 10.1016/j.ijhm.2020.102684] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/28/2020] [Revised: 08/20/2020] [Accepted: 09/07/2020] [Indexed: 05/05/2023]
Abstract
This study investigates how US foodservice conglomerates have embarked on corporate social responsibility (CSR) measures to circumvent dire situations during the COVID-19 pandemic. It explores the evolution of CSR practices from restaurant enterprises to rescue and salvage their stakeholders. By analyzing press releases from ten restaurant chains in three different crisis phases (incubation, acceleration, and climax) through corpus linguistics, we identify a CSR progression mechanism that coevolves with the aftermath of the crisis among their stakeholders. This study improvises the CSR-as-process view to highlight the time-variant dynamic nature of CSR development over the course of major disruption.
Publication type: research-article
6. Kostromitina M, Keller D, Cavusoglu M, Beloin K. "His lack of a mask ruined everything." Restaurant customer satisfaction during the COVID-19 outbreak: An analysis of Yelp review texts and star-ratings. Int J Hosp Manag 2021; 98:103048. [PMID: 34493888 PMCID: PMC8412463 DOI: 10.1016/j.ijhm.2021.103048] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/26/2021] [Revised: 06/15/2021] [Accepted: 08/04/2021] [Indexed: 05/28/2023]
Abstract
The aim of the study was to provide practical advice to restaurant managers for improving star ratings as well as information for researchers on how the pandemic has impacted established determinants of satisfaction. The study examined criteria used by restaurant customers in assigning star-ratings on Yelp during the COVID-19 pandemic using keyword analysis and Multiple Correspondence Analysis. In evaluating restaurants, the reviewers focused on service, overall experience, and food quality. Service was discussed in relation to the pandemic and included safety of the dine-in experience, contrasted with take-out options and compliance with COVID-19 guidelines. These criteria applied differently with lower-star reviews focusing on safety, social distancing, and mask policies. Higher-star reviews focused on take-out/delivery services, high-quality food, and an overall positive experience. The study provides valuable contributions to our understanding of how the COVID-19 pandemic will impact the restaurant sector in a post-pandemic world.
Publication type: research-article
7. Huntley SJ, Mahlberg M, Wiegand V, van Gennip Y, Yang H, Dean RS, Brennan ML. Analysing the opinions of UK veterinarians on practice-based research using corpus linguistic and mathematical methods. Prev Vet Med 2017; 150:60-69. [PMID: 29406085 DOI: 10.1016/j.prevetmed.2017.11.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/06/2017] [Revised: 11/20/2017] [Accepted: 11/22/2017] [Indexed: 11/18/2022]
Abstract
The use of corpus linguistic techniques and other related mathematical analyses have rarely, if ever, been applied to qualitative data collected from the veterinary field. The aim of this study was to explore the use of a combination of corpus linguistic analyses and mathematical methods to investigate a free-text questionnaire dataset collected from 3796 UK veterinarians on evidence-based veterinary medicine, specifically, attitudes towards practice-based research (PBR) and improving the veterinary knowledge base. The corpus methods of key word, concordance and collocate analyses were used to identify patterns of meanings within the free text responses. Key words were determined by comparing the questionnaire data with a wordlist from the British National Corpus (representing general English text) using cross-tabs and log-likelihood comparisons to identify words that occur significantly more frequently in the questionnaire data. Concordance and collocation analyses were used to account for the contextual patterns in which such key words occurred, involving qualitative analysis and Mutual Information Analysis (MI3). Additionally, a mathematical topic modelling approach was used as a comparative analysis; words within the free text responses were grouped into topics based on their weight or importance within each response to find starting points for analysis of textual patterns. Results generated from using both qualitative and quantitative techniques identified that the perceived advantages of taking part in PBR centred on the themes of improving knowledge of both individuals and of the veterinary profession as a whole (illustrated by patterns around the words learning, improving, contributing). Time constraints (lack of time, time issues, time commitments) were the main concern of respondents in relation to taking part in PBR. 
Opinions of what vets could do to improve the veterinary knowledge base focussed on the collecting and sharing of information (record, report), particularly recording and discussing clinical cases (interesting cases), and undertaking relevant continuing professional development activities. The approach employed here demonstrated how corpus linguistics and mathematical methods can help to both identify and contextualise relevant linguistic patterns in the questionnaire responses. The results of the study inform those seeking to coordinate PBR initiatives about the motivators of veterinarians to participate in such initiatives and what concerns need to be addressed. The approach used in this study demonstrates a novel way of analysing textual data in veterinary research.
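The key word step compares each word's frequency in the questionnaire data with a reference word list (here, the British National Corpus) using log-likelihood. Collocations are then scored with MI3. The sketch below uses the commonly cited two-corpus log-likelihood (G2) formulation and a simplified MI3 that ignores collocation window size. The function names are illustrative and all counts are hypothetical, so this is not the study's exact procedure. A word counts as key only if it is both overused relative to the reference and significant.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one word's frequency in corpus A versus corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2


def mi3(observed, freq_node, freq_collocate, corpus_size):
    """Simplified MI3: log2(O^3 / E), with E = f(node) * f(collocate) / N and no span scaling."""
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(observed ** 3 / expected)


# Hypothetical counts: "time" in a 100,000-word questionnaire corpus vs a 1,000,000-word reference list.
freq_q, size_q, freq_ref, size_ref = 400, 100_000, 1_500, 1_000_000
g2 = log_likelihood(freq_q, size_q, freq_ref, size_ref)
overused = freq_q / size_q > freq_ref / size_ref
# 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
print(f"G2 = {g2:.2f}, key word: {overused and g2 > 3.84}")
print(f"MI3 = {mi3(observed=30, freq_node=400, freq_collocate=120, corpus_size=100_000):.2f}")
```

Because MI3 cubes the observed co-occurrence count, it favours frequent collocates more than plain mutual information does. This is why it is often paired with concordance reading, as in the study.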
Publication type: Journal Article
8. Cleland J, Fahey Palma T. "Aspirations of people who come from state education are different": how language reflects social exclusion in medical education. Adv Health Sci Educ Theory Pract 2018; 23:513-531. [PMID: 29368073 DOI: 10.1007/s10459-018-9809-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/20/2017] [Accepted: 01/03/2018] [Indexed: 06/07/2023]
Abstract
Despite repeated calls for change, the problem of widening access (WA) to medicine persists globally. One factor which may be operating to maintain social exclusion is the language used in representing WA applicants and students by the gatekeepers and representatives of medical schools, Admissions Deans. We therefore examined the institutional discourse of UK Medical Admissions Deans in order to determine how values regarding WA are communicated and presented in this context. We conducted a linguistic analysis of qualitative interviews with Admissions Deans and/or Staff from 24 of 32 UK medical schools. Corpus Linguistics data analysis determined broad patterns of frequency and word lists. This informed a critical discourse analysis of the data using an "othering" lens to explore and understand the judgements made of WA students by Admissions Deans, and the practices to which these judgments give rise. Representations of WA students highlighted existing divides and preconceptions in relation to WA programmes and students. Through using discourse that can be considered othering and divisive, issues of social divide and lack of integration in medicine were highlighted. Language served to reinforce pre-existing stereotypes and a significant 'us' and 'them' rhetoric exists in medical education. Even with drivers to achieve diversity and equality in medical education, existing social structures and preconceptions still influence the representations of applicants and students from outside the 'traditional' medical education model in the UK. Acknowledging this is a crucial step for medical schools wishing to address barriers to the perceived challenges to diversity.
9. Menadue CB. Pandemics, epidemics, viruses, plagues, and disease: Comparative frequency analysis of a cultural pathology reflected in science fiction magazines from 1926 to 2015. Soc Sci Humanit Open 2020; 2:100048. [PMID: 34173491 PMCID: PMC7480741 DOI: 10.1016/j.ssaho.2020.100048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/30/2020] [Revised: 07/13/2020] [Accepted: 07/13/2020] [Indexed: 12/03/2022]
Abstract
Science fiction includes many dystopian narratives, often featuring epidemics, pandemics, plagues, viruses, and disease. As science fiction has grown in popularity and prevalence it appeals to an increasingly broad demographic, is employed in research communication and education, and as a genre it is frequently argued that it reflects contemporary cultural interests and concerns. To identify the relevance of science fiction as an indicator of popular trends relating to the pathologies of disease, a word frequency comparison of selected key words found in the Google Books 2012 English Corpus has been made to a representative corpus of science fiction magazines dating between 1926 and 2015. Selected issues were reviewed to identify concepts, situations, and outcomes that could readily be measured against real-world examples from current and recent pandemics. The findings indicate that science fiction does appear to mirror and magnify contemporary literary trends, and provides potentially revealing correlations to real-world historical events. In this regard, science fiction might be regarded as a form of ‘cultural pathology’ of popular interests related to the spread and impact of disease that may be valuable in gauging the degree to which society is engaged with these topics at any specific time.
Highlights: Science fiction topics tend to reflect real-world historical events. Comparison of English corpus Google Books word frequencies to science fiction. Science fiction investigates social, cultural and psychological concerns. Science fiction content indicates a ‘cultural pathology’ of popular interests.
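The Google Books corpus and a magazine corpus differ greatly in size, so raw keyword counts are not directly comparable. A common remedy is to normalize each count to a rate per million words before comparing. The abstract does not state the exact measure, so the sketch below assumes per-million normalization plus a simple rate ratio. All counts and corpus sizes are hypothetical.

```python
KEYWORDS = ["pandemic", "epidemic", "virus", "plague", "disease"]

# Hypothetical counts and corpus sizes, for illustration only.
REFERENCE = {"size": 50_000_000,
             "counts": {"pandemic": 150, "epidemic": 900, "virus": 1_200, "plague": 1_600, "disease": 6_000}}
SCIFI = {"size": 2_000_000,
         "counts": {"pandemic": 20, "epidemic": 45, "virus": 160, "plague": 90, "disease": 260}}


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def compare(keywords, reference, target):
    """Return (reference rate, target rate, target/reference ratio) for each keyword."""
    rows = {}
    for word in keywords:
        ref_rate = per_million(reference["counts"].get(word, 0), reference["size"])
        target_rate = per_million(target["counts"].get(word, 0), target["size"])
        # The ratio is undefined when the word never occurs in the reference corpus.
        ratio = target_rate / ref_rate if ref_rate else None
        rows[word] = (ref_rate, target_rate, ratio)
    return rows


for word, (ref_rate, target_rate, ratio) in compare(KEYWORDS, REFERENCE, SCIFI).items():
    shown = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"{word:10s} ref={ref_rate:8.2f} scifi={target_rate:8.2f} ratio={shown}")
```

A ratio above 1 means the keyword is proportionally more frequent in the magazines than in the reference corpus. To track change over time, as across 1926 to 2015, the same comparison would be repeated per period.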
Publication type: Journal Article
10. Webster L. "Erase/rewind": How transgender Twitter discourses challenge and (re)politicize lesbian identities. J Lesbian Stud 2021; 26:174-191. [PMID: 34617504 DOI: 10.1080/10894160.2021.1978369] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/13/2023]
Abstract
Competing views on the in/compatibility of transgender status and lesbian identity is a source of conflict in the ongoing antagonism over transgender recognition. Many individuals with different transgender identities might lay claim to lesbian identity or lesbian discourse(s) more generally. However, this inclusion has been disputed in some circles insofar as it is seen to challenge or contradict characteristics of lesbianism. This paper explores how transgender discourses might challenge and (re)politicize lesbianism and lesbian identities. Given that social media platforms concentrate minority communities in one space and can serve to exacerbate antagonism over identities, I focus in this paper specifically on the Twitter context. This paper uses corpus-informed critical discourse studies to explore how cognitive models of lesbianism are articulated in transgender Twitter discourse/s. Findings indicate that transgender Twitter users (re)articulate sociohistorical narratives in lesbian discourse/s. At the same time, however, they also challenge and (re)politicize the essentialism of sex and gender in relation to lesbian identity and social practice. Hence, transgender Twitter discourse/s reflect potential explanations for contesting transinclusion in lesbianism, which may serve to reinforce transexclusionary claims for retaining lesbianism's uniqueness as a female space and experience.
11. Hörberg T, Larsson M, Olofsson JK. The Semantic Organization of the English Odor Vocabulary. Cogn Sci 2022; 46:e13205. [PMID: 36334010 DOI: 10.1111/cogs.13205] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/27/2022] [Revised: 09/06/2022] [Accepted: 09/16/2022] [Indexed: 11/11/2022]
Abstract
The vocabulary for describing odors in English natural language is not well understood, as prior studies of odor descriptions have often relied on preselected descriptors and odor ratings. Here, we present a data-driven approach that automatically identifies English odor descriptors based on their degree of olfactory association, and derive their semantic organization from their distributions in natural texts, using a distributional-semantic language model. We identify 243 descriptors that are much more strongly associated with olfaction than English words in general. We then derive the semantic organization of these olfactory descriptors, and find that it is captured by four clusters that we name Offensive, Malodorous, Fragrant, and Edible. The semantic space derived from our model primarily differentiates descriptors in terms of pleasantness and edibility along which our four clusters are positioned, and is similar to a space derived from perceptual data. The semantic organization of odor vocabulary can thus be mapped using natural language data (e.g., online text), without the limitations of odor-perceptual data and preselected descriptors. Our method may thus facilitate research on olfaction, a sensory system known to often elude verbal description.
12. Meylan SC, Griffiths TL. The Challenges of Large-Scale, Web-Based Language Datasets: Word Length and Predictability Revisited. Cogn Sci 2021; 45:e12983. [PMID: 34170030 DOI: 10.1111/cogs.12983] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/02/2019] [Revised: 03/16/2021] [Accepted: 04/07/2021] [Indexed: 11/28/2022]
Abstract
Language research has come to rely heavily on large-scale, web-based datasets. These datasets can present significant methodological challenges, requiring researchers to make a number of decisions about how they are collected, represented, and analyzed. These decisions often concern long-standing challenges in corpus-based language research, including determining what counts as a word, deciding which words should be analyzed, and matching sets of words across languages. We illustrate these challenges by revisiting "Word lengths are optimized for efficient communication" (Piantadosi, Tily, & Gibson, 2011), which found that word lengths in 11 languages are more strongly correlated with their average predictability (or average information content) than their frequency. Using what we argue to be best practices for large-scale corpus analyses, we find significantly attenuated support for this result and demonstrate that a stronger relationship obtains between word frequency and length for a majority of the languages in the sample. We consider the implications of the results for language research more broadly and provide several recommendations to researchers regarding best practices.
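The re-analysis compares two predictors of word length: frequency, and average information content (mean surprisal of a word given its context). The sketch below shows the shape of that comparison on a hypothetical toy corpus, with several simplifying assumptions. A bigram model stands in for the larger n-gram models used in this line of work. Unigram surprisal (negative log frequency) stands in for frequency. Spearman rank correlation is one reasonable choice of statistic. This is not the authors' procedure, and a corpus this small supports no conclusion.

```python
import math
from collections import Counter, defaultdict


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def length_correlations(tokens):
    """Correlate word length with unigram surprisal and with mean bigram surprisal."""
    freq = Counter(tokens)
    context_counts = Counter(tokens[:-1])  # how often each word precedes another word
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    surprisals = defaultdict(list)
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_counts[(prev, word)] / context_counts[prev]  # MLE of P(word | prev)
        surprisals[word].append(-math.log2(p))
    words = sorted(surprisals)  # words observed at least once after some context
    lengths = [len(w) for w in words]
    unigram_surprisal = [-math.log2(freq[w] / len(tokens)) for w in words]
    mean_information = [sum(surprisals[w]) / len(surprisals[w]) for w in words]
    return {
        "frequency": spearman(lengths, unigram_surprisal),
        "information": spearman(lengths, mean_information),
    }


corpus = "the cat saw the dog and the dog saw the elephant near the cat".split()
print(length_correlations(corpus))
```

On real data, many of the paper's methodological decisions enter before this step. These include what counts as a word, which words are analyzed, and how the n-gram model is trained.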
Publication type: Journal Article
13. Boholm M. Risk and Quantification: A Linguistic Study. Risk Anal 2019; 39:1243-1261. [PMID: 30586167 DOI: 10.1111/risa.13258] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/27/2018] [Revised: 10/10/2018] [Accepted: 11/25/2018] [Indexed: 06/09/2023]
Abstract
In risk analysis and research, the concept of risk is often understood quantitatively. For example, risk is commonly defined as the probability of an unwanted event or as its probability multiplied by its consequences. This article addresses (1) to what extent and (2) how the noun risk is actually used quantitatively. Uses of the noun risk are analyzed in four linguistic corpora, both Swedish and English (mostly American English). In total, over 16,000 uses of the noun risk are studied in 14 random (n = 500) or complete samples (where n ranges from 173 to 5,144) of, for example, news and magazine articles, fiction, and websites of government agencies. In contrast to the widespread definition of risk as a quantity, a main finding is that the noun risk is mostly used nonquantitatively. Furthermore, when used quantitatively, the quantification is seldom numerical, instead relying on less precise expressions of quantification, such as high risk and increased risk. The relatively low frequency of quantification in a wide range of language material suggests a quantification bias in many areas of risk theory, that is, overestimation of the importance of quantification in defining the concept of risk. The findings are also discussed in relation to fuzzy-trace theory. Findings of this study confirm, as suggested by fuzzy-trace theory, that vague representations are prominent in quantification of risk. The application of the terminology of fuzzy-trace theory for explaining the patterns of language use are discussed.
14. Altamimi S. Disseminating knowledge: A discourse analysis of terrorism in TED talks. Heliyon 2021; 7:e06312. [PMID: 33665458 PMCID: PMC7907810 DOI: 10.1016/j.heliyon.2021.e06312] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/07/2020] [Revised: 12/22/2020] [Accepted: 02/15/2021] [Indexed: 11/22/2022]
Abstract
This study aims to investigate the linguistic mechanism of disseminating knowledge about terrorism by professionals to laypersons in TED Talks. The study examines the interface between knowledge, meaning and social practices in terms of text and context when speakers cognitively reconceptualize terrorism discourse as a professional practice and maintain their stance over social issues. Drawing on a multidisciplinary approach of discourse analysis and corpus linguistics, the study sets out to analyse the discursive representation of terrorism in TED talks delivered between 2002 and 2019, focusing on explanation strategies of definition, description, denomination and metaphor. The results revealed that TED talks' discourse was a less popularised genre regarding terrorism, marked by specialised terms of traditional right discourse of military actions, and impersonal reference for private intentions of building up expert identity.
Publication type: research-article
15. Luoto S, van Cranenburgh A. Psycholinguistic dataset on language use in 1145 novels published in English and Dutch. Data Brief 2021; 34:106655. [PMID: 33385024 PMCID: PMC7772540 DOI: 10.1016/j.dib.2020.106655] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2020] [Revised: 12/06/2020] [Accepted: 12/10/2020] [Indexed: 11/04/2022]
Abstract
This dataset includes psycholinguistic data on 694 English-language and 451 Dutch-language novels, acquired with computerised analysis of digitised novels published mainly between 1800 and 2018. The English-language novels have a total word count of 66.9 million words, while the Dutch-language novels comprise 49.6 million words, therefore offering large, representative samples for both languages. The data provided in this article include 93 linguistic and psycholinguistic outcome variables for the English-language novels, acquired using Linguistic Inquiry and Word Count (LIWC) version 2015, and 68 linguistic and psycholinguistic outcome variables for the Dutch-language novels, acquired using Linguistic Inquiry and Word Count (LIWC) version 2001. The dataset also includes word frequencies (unigram and bigram) for each novel. The metadata for each novel include year of publication, authors’ nationality, sex, age at publication, and sexual orientation (the latter only in the English-language dataset), making it possible for researchers to study the data along these parameters. The use of these data can help researchers illuminate how word use reflects psychological processes in more than two centuries of literary art in English and in contemporary Dutch novels.
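At its core, a LIWC-style variable is the percentage of a text's words that match a category in a dictionary. An entry ending in `*` matches any word with that prefix. The sketch below reproduces this counting scheme with a tiny hypothetical dictionary, alongside the unigram and bigram counts the dataset also provides. The real LIWC dictionaries are licensed and far larger, and LIWC's own tokenization and rules differ. The category names and word lists here are illustrative only.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; LIWC's real categories and word lists are not reproduced here.
CATEGORIES = {
    "i": ["i", "me", "my"],
    "posemo": ["happ*", "laugh*", "love"],
}


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, pattern):
    # A trailing '*' marks a prefix (stem) entry; otherwise the match must be exact.
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def category_percentages(text, categories):
    """Percentage of tokens matching at least one pattern in each category."""
    words = tokenize(text)
    if not words:
        return {name: 0.0 for name in categories}
    return {
        name: 100 * sum(any(matches(w, p) for p in patterns) for w in words) / len(words)
        for name, patterns in categories.items()
    }


def ngram_counts(text):
    """Unigram and bigram frequency counts for one text."""
    words = tokenize(text)
    return Counter(words), Counter(zip(words, words[1:]))


sample = "I feel happy and I laughed"
print(category_percentages(sample, CATEGORIES))
unigrams, bigrams = ngram_counts(sample)
print(unigrams.most_common(2), bigrams.most_common(1))
```

Expressing category counts as percentages of total words is what makes novels of very different lengths comparable within one dataset.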
16. Maatz A, Ilg Y. The Ins and Outs of 'Schizophrenia': Considering Diagnostic Terms as Ordinary Linguistic Expressions. J Med Humanit 2021; 42:387-404. [PMID: 32002725 DOI: 10.1007/s10912-019-09587-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Diagnostic terms in psychiatry like 'schizophrenia' and 'bipolar disorder' are deeply contested in the professional community, by mental health activists and the public. In this paper, we provide a theoretical framework for considering diagnostic terms as ordinary linguistic expressions and illustrate this approach by a corpus linguistic analysis of 'schizophrenia.' Our aim is to show how a focus on language itself can inform current and future debates about psychiatric terminology and provide new insights on relevant processes concerning their actual usage and change over time. We hope that this contributes to enhancing mutual understanding between different discourse spheres and stakeholders.
17. McClaughlin E, Elliott S, Jewitt S, Smallman-Raynor M, Dunham S, Parnell T, Clark M, Tarlinton R. UK flockdown: A survey of smallscale poultry keepers and their understanding of governmental guidance on highly pathogenic avian influenza (HPAI). Prev Vet Med 2024; 224:106117. [PMID: 38277819 DOI: 10.1016/j.prevetmed.2024.106117] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/09/2023] [Revised: 11/16/2023] [Accepted: 01/08/2024] [Indexed: 01/28/2024]
Abstract
The scale of the current outbreak of highly pathogenic avian influenza (HPAI) due to the A/H5N1 virus in the United Kingdom is unprecedented. In addition to its economic impact on the commercial poultry sector, the disease has devastated wild bird colonies and represents a potential public health concern because of its zoonotic nature. Although the implementation of biosecurity measures is paramount to reducing the spread of HPAI in domestic and commercial settings, little is known about the attitudes and perspectives of backyard poultry keepers, who often keep their flocks in close proximity to the public. A large nationwide survey of backyard poultry keepers was undertaken in December 2021-March 2022, contemporaneous with the enforcement of an Avian Influenza Prevention Zone (AIPZ) and additional housing measures in England, Scotland and Wales. The survey explored keepers' understanding of the clinical manifestations of HPAI, compliance with housing and biosecurity measures, attitudes towards obligatory culling on confirmation of HPAI in their flocks, and the potential use of vaccination to control HPAI. Summary statistical analysis of the closed-question responses was supplemented with qualitative data analysis and corpus linguistic approaches to draw out key themes and salient patterns in responses to open-text questions. Survey responses were received from 1559 small-scale poultry keepers across the United Kingdom. Awareness of the HPAI outbreak was very high (99.0%). The majority of respondents learned of it via social media (53%), with Defra (49.7%), the British Hen Welfare Trust (33.8%) and the APHA (22.0%) identified as the principal sources of information. Analysis revealed that backyard keepers lacked knowledge of the clinical signs of avian influenza and of the legal requirements for compliance with biosecurity measures. Some respondents dismissed the seriousness of HPAI and were unwilling to comply with the measures in force. The issue of obligatory culling proved highly emotive, and some respondents expressed a lack of trust in authorities. Most respondents (93.1%) indicated a willingness to pay for vaccination if the option were available. Communications on biosecurity measures that are relevant to large-scale industrial setups are inappropriate for backyard contexts. Understanding the barriers that backyard keepers face is essential if official agencies are to communicate biosecurity information effectively to such groups. Lack of trust in authorities is likely to make elimination of the virus in the UK difficult. We make recommendations for tailoring HPAI-related information to backyard contexts, to aid future HPAI control measures in the UK.
18
Heaton D, Nichele E, Clos J, Fischer JE. "ChatGPT says no": agency, trust, and blame in Twitter discourses after the launch of ChatGPT. AI and Ethics 2024; 5:653-675. [PMID: 39959574 PMCID: PMC11828844 DOI: 10.1007/s43681-023-00414-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/13/2023] [Accepted: 12/18/2023] [Indexed: 02/18/2025]
Abstract
ChatGPT, a chatbot using the GPT-n series large language model, has surged in popularity by providing conversation, assistance, and entertainment. This has raised questions about its agency and the resulting implications for trust and blame, particularly concerning its portrayal on social media platforms like Twitter. Understanding trust and blame is crucial for gauging public perception of, reliance on, and adoption of AI-driven tools like ChatGPT. To explore ChatGPT's perceived status as an algorithmic social actor and uncover implications for trust and blame through agency and transitivity, we examined 88,058 tweets about ChatGPT, published in a 'hype period' between November 2022 and March 2023, using Corpus Linguistics and Critical Discourse Analysis, underpinned by Social Actor Representation. Notably, ChatGPT was presented in tweets as a social actor on 87% of occasions, with personalisation and agency metaphor used to emphasise its role in content creation, information dissemination, and influence. However, a dynamic presentation, oscillating between a creative social actor and an information source, reflected users' uncertainty regarding its capabilities, which in turn gave rise to blame attribution. On 13% of occasions, ChatGPT was presented passively through backgrounding and exclusion. Here, the emphasis on ChatGPT's role in informing and influencing underscores interactors' reliance on it for information, bearing implications for information dissemination and trust in AI-generated content. Therefore, this study contributes to understanding the perceived social agency of decision-making algorithms and their implications for trust and blame, which is valuable to AI developers and policymakers and relevant to understanding and managing power dynamics in today's age of AI.
19
Dialogic Priming and Dynamic Resonance in Autism: Creativity Competing with Engagement in Chinese Children with ASD. J Autism Dev Disord 2022; 53:2458-2474. [PMID: 35355175 DOI: 10.1007/s10803-022-05505-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 02/24/2022] [Indexed: 10/18/2022]
Abstract
A growing body of research has focused on the relationship between priming and engagement through dialogue (e.g. Tantucci and Wang in Appl Linguist 43(1):115-146, 2022; Mikulincer et al. in Cognit Emotion 25:519-531, 2011). The present study also addresses this issue in relation to creativity and provides a new applied model for measuring intersubjective engagement in the speech of ASD vs neurotypical populations. We compared two balanced corpora of naturalistic Mandarin interaction of typically developing children and children diagnosed with ASD (cf. Zhou and Zhang in Xueqian jiaoyu yanjiu [Stud Preschool Educ] 6:72-84, 2020). We fitted a mixed-effects linear regression showing that, in both neurotypical and ASD populations, dialogic priming significantly correlates with engagement and with whether the child could creatively reuse the original input to produce a new construction. We found that creativity and intersubjective engagement are in competition in children with ASD, in contrast with the neurotypical population. This finding points to a relatively impeded ability in ASD to creatively recombine a priming input during the here-and-now of a dialogic event.
20
Lenart I, Markovina I. Differences of kindergarten children's linguistic picture of the world: focus on Hungary, Russia, and Laos. Heliyon 2021; 7:e05940. [PMID: 33644430 PMCID: PMC7895709 DOI: 10.1016/j.heliyon.2021.e05940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2020] [Revised: 08/19/2020] [Accepted: 01/07/2021] [Indexed: 11/25/2022]
Abstract
Three-to-five-year-old Laotian kindergarten children, native speakers of the Lao language, were investigated in order to map the peculiarities of their picture of the world through their word associations. Results were contrasted with those of a previous comparative study of Hungarian and Russian kindergarteners of the same age, with the aim of revealing linguistic and cultural differences and similarities in this age group across the three countries. Theories and methods of the Moscow School of Psycholinguistics were used for the cross-cultural comparison, drawing on a Vygotskian cultural-historical approach, Leontiev's speech activity theory, the concept of verbal consciousness (linguistic picture of the world) and the association experiment. A pedagogical perspective was incorporated through the application of the Conception of Childhood theory and the shoulder-to-shoulder method. Linguistic data gained during the association experiment were analysed with Sketch Engine, an online corpus linguistics research tool. The outcome of the investigation is a unique set of associations that, on the one hand, shows that Lao children's picture of the world overlaps with that of Russian and Hungarian kindergarteners and, on the other hand, sheds light on distinctive, culture- and language-specific characteristics of Laotian kindergarten children's verbal consciousness.
21
Karlińska A. Textual strategies of forensic psychiatrists. A corpus-based analysis of how the language of psychiatry is reconciled with the language of law in Polish forensic psychiatric opinions. International Journal of Law and Psychiatry 2021; 74:101652. [PMID: 33302060 DOI: 10.1016/j.ijlp.2020.101652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2019] [Revised: 06/02/2020] [Accepted: 11/11/2020] [Indexed: 06/12/2023]
Abstract
The role of an expert forensic psychiatrist is likened to that of a translator: their task is to translate the language of medicine into the language of law. The aim of the article was to reconstruct the textual strategies adopted by forensic psychiatrists to reconcile the discourses of law and medicine. The analysis covered 65 opinions/reports issued at a psychiatric reference centre in Poland. By applying an innovative corpus linguistics methodology, the study captured the singularities of forensic psychiatric opinions as a genre and assessed the genre's degree of conventionalisation. The findings indicate that psychiatric opinions have not yet achieved the status of a homogeneous genre, and that standardisation and formalisation have so far reached only the structural level. The expert psychiatrists constrained the presence of the author's voice and did not use the narrative form in their opinions. The analysis also captured the ethical challenges related to the dual role of forensic psychiatrists as medical doctors and representatives of the judicial system.
22
Zaini MF, Sarudin A, Muhammad MM, Osman Z, Mohamed Redzwan HF, Al-Muhsin MA. House building tips (HBT) corpus dataset as a resource to discover Malay architectural ingenuity and identity. Data Brief 2021; 36:107013. [PMID: 33898671 PMCID: PMC8054094 DOI: 10.1016/j.dib.2021.107013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2020] [Revised: 03/12/2021] [Accepted: 03/23/2021] [Indexed: 11/28/2022]
Abstract
House Building Tips is the title of a classic text containing historical information on early house construction in Malay communities. These tips were written by a scholar who gained knowledge of house construction by observing the surrounding environment. In Malaysia, written sources or records of house construction are scarce and underexposed. This research was therefore conducted to safeguard the written legacy of Malay house construction. The purpose of this paper is to introduce a statistical data source of house building tips that is laden with Malay ingenuity and identity. The wordlists generated in this study can serve as a reference source for the field of Malay architecture. The study uses a quantitative method, the Linguistic Corpus Statistical Approach; the data were built through specific corpus development procedures, beginning with text collection, scanning and cleaning, followed by text annotation and storage as plain text. The data analysis procedure then uses corpus software, LancsBox, to generate specialised wordlists. Bubble graphs built from these wordlists in Tableau illustrate the most frequently used lexical items with their raw and relative frequency values. This facilitates searching for, and reading, architectural words and architectural word references. These data represent written sources that need to be preserved and can serve as points of reference for Malay architectural ingenuity and identity.
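The abstract reports wordlists with raw and relative frequency values. As a minimal sketch of what those two figures mean (not the authors' LancsBox workflow), the Python below counts word tokens in plain text and normalises each count by corpus size. The function name `wordlist`, the letters-only tokenisation rule, and the default base of 10,000 tokens are assumptions for illustration; corpus tools differ in how they tokenise and which normalisation base they report.

```python
import re
from collections import Counter


def wordlist(text, per=10_000):
    """Return (word, raw, relative) tuples, most frequent first.

    Hypothetical helper: relative = raw / total_tokens * per.
    The normalisation base `per` is an assumption (tools use e.g. 10,000 or 1,000,000).
    """
    # Assumed tokenisation: lowercase runs of Unicode letters (digits and underscores excluded)
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    # most_common() orders by count; ties keep first-encountered order
    return [(word, n, n / total * per) for word, n in counts.most_common()]


if __name__ == "__main__":
    # Toy sentence, not taken from the HBT corpus
    sample = "The house post stands high. The post is timber."
    for word, raw, rel in wordlist(sample, per=100):
        print(f"{word:<8}{raw:>3}{rel:>8.1f}")
```

Normalising by corpus size is what makes counts comparable across texts of different lengths; the raw count alone only ranks words within a single text.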
23
Xu Q, Chodorow M, Valian V. How infants' utterances grow: A probabilistic account of early language development. Cognition 2023; 230:105275. [PMID: 36215764 DOI: 10.1016/j.cognition.2022.105275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 11/27/2022]
Abstract
Why are children's first utterances short and ungrammatical, with some obvious constructions missing? What determines the lengthening of children's early utterances over time? The literature is replete with references to a one-word, a two-word, and a later multiword stage in language development, but with little empirical evidence and little account of how and why utterances grow. To address these questions, we analyze speech samples from 25 children between the ages of 14 and 43 months; we construct distributions of their utterances of lengths one to five by age. Our novel findings are that multiword utterances of different lengths appear early in acquisition and increase together until they reach relatively stable proportions similar to those found in parents' input. To explain such patterns, we develop a probabilistic computational model, VIRTUAL, that posits an interaction between a) varying, increasing resources from various developmental domains and b) target utterance lengths mirroring the input. VIRTUAL successfully accounts for most of the empirical patterns, suggesting a probabilistic and dynamic process that is nonetheless compatible with apparently distinct milestones in development. We provide a new, systematic way of showing how developmental cascade theories could work in language development. Our findings and model also suggest insights into syntactic, semantic, and cognitive development.
24
Taboada M. Reported speech and gender in the news: Who is quoted, how are they quoted, and why it matters. Discourse & Communication 2025; 19:93-113. [PMID: 40013237 PMCID: PMC11863382 DOI: 10.1177/17504813241281713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2025]
Abstract
News stories have a well-defined generic structure, consisting of components such as headline, lede, and body, with reported speech a prominent feature, especially in hard news stories. Reported speech serves multiple purposes, from providing evidentiality and intertextuality to contributing to the construction of newsworthiness and to the creation of context in news. It is also a site of potential bias in who is cited and how, including with respect to the gender of sources. Using a large corpus of English-language news stories for all of 2023 from Canada's five main mainstream news outlets (over 370,000 articles from news websites), I examine the gender distribution of those quoted, the syntactic variation in the structure of quotes, and the types of reporting verbs. The study provides a comprehensive overview of the extent of gender bias in contemporary Canadian news, while also offering insights into the nature of reported speech in modern news and how it endures and evolves, including in news meant for digital-only publication.
25
Woodin G, Winter B. Numbers in Context: Cardinals, Ordinals, and Nominals in American English. Cogn Sci 2024; 48:e13471. [PMID: 38895756 PMCID: PMC11475258 DOI: 10.1111/cogs.13471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 04/18/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
There are three main types of number used in modern, industrialized societies. Cardinals count sets (e.g., people, objects) and quantify elements of conventional scales (e.g., money, distance), ordinals index positions in ordered sequences (e.g., years, pages), and nominals serve as unique identifiers (e.g., telephone numbers, player numbers). Many studies that have cited number frequencies in support of claims about numerical cognition and mathematical cognition hinge on the assumption that most numbers analyzed are cardinal. This paper is the first to investigate the relative frequencies of different number types, presenting a corpus analysis of morphologically unmarked numbers (not, e.g., "eighth" or "21st") in which we manually annotated 3,600 concordances in the Corpus of Contemporary American English. Overall, cardinals (both pure cardinals for sets and measurements for scales) are dominant, except in the range 1,000-10,000, which is dominated by ordinal years, like 1996 and 2004. Ordinals occur less often overall, and nominals even less so. Only for cardinals do round numbers, associated with approximation, dominate overall and increase with magnitude. In comparison with other registers, academic writing contains a lower proportion of measurements as well as a higher proportion of ordinals and, to some extent, nominals. In writing, pure cardinals and measurements are usually represented as number words, but measurements (especially larger, unround ones) are more likely to be numerals. Ordinals and nominals are mostly represented as numerals. Altogether, this paper reveals how numbers are used in American English, establishing an initial baseline for any analyses of number frequencies and shedding new light on the cognitive and psychological study of number.