1
Van Huynh A. Effect of IUCN Red List category on public attention to mammals. CONSERVATION BIOLOGY : THE JOURNAL OF THE SOCIETY FOR CONSERVATION BIOLOGY 2023; 37:e14050. [PMID: 36661058 DOI: 10.1111/cobi.14050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 05/30/2023]
Abstract
Cultural data is a powerful tool to analyze public awareness of key societal issues, including the conservation of nature. I used two publicly available repositories of cultural data, Google Trends and Google Ngram, to quantify the effect of the International Union for Conservation of Nature (IUCN) Red List conservation status on public attention toward 4539 mammal species. With Google Trends, I calculated whether Google searches for their common and scientific names have been increasing or decreasing over time. I also ran an anomaly detection analysis to investigate whether a change in red-list status directly results in an increase in Google searches. Additionally, I quantified the mentions of species' common and scientific names in English texts with Google Ngram. Overall, Google searches for most mammal species remained at similar levels or increased since 2008. The severity of species' IUCN Red List status was a significant predictor of increasing Google searches, although the effect size was relatively small. Red-list status seemed strongly confounded with mammal body size. Species that moved to a higher-risk category spiked significantly in Google searches directly after the new designation. The mention of species' common names in the Google Ngram's English 2019 corpus significantly increased as the red-list category increased. These results provide valuable insight into the importance of the IUCN Red List for increasing public awareness and the usefulness of publicly available cultural data for examining the effectiveness of specific conservation efforts and thus evaluating targets for support and funding.
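The abstract does not say how the direction of each species' search trend was determined. A minimal sketch, assuming a plain least-squares slope over equally spaced periods (for example, monthly Google Trends values, which Google reports as a relative 0-100 index), could label a series as increasing, decreasing, or stable. The function names, the `tolerance` parameter, and the toy values are hypothetical and not taken from the paper.

```python
def ols_slope(values):
    """Least-squares slope of values against their index 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_direction(values, tolerance=0.0):
    """Label a series by the sign of its slope (hypothetical helper)."""
    slope = ols_slope(values)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "stable"


# Toy monthly interest values for one species name (invented for illustration).
searches = [12, 15, 14, 18, 21, 20, 25]
print(trend_direction(searches))
```

A real analysis would also test whether the slope differs significantly from zero; the abstract does not report which test, if any, was used.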
Affiliation(s)
- Alex Van Huynh
- Department of Biology, DeSales University, Center Valley, Pennsylvania, USA
2
Lawson DJ, Solanki V, Yanovich I, Dellert J, Ruck D, Endicott P. CLARITY: comparing heterogeneous data using dissimilarity. ROYAL SOCIETY OPEN SCIENCE 2021; 8:202182. [PMID: 34909208 PMCID: PMC8652278 DOI: 10.1098/rsos.202182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2020] [Accepted: 10/29/2021] [Indexed: 06/14/2023]
Abstract
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation versus expression, evolution of language sounds versus word use, and country-level economic metrics versus cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a 'structural' component analogous to a clustering, and an underlying 'relationship' between those structures. This allows a 'structural comparison' between two similarity matrices using their predictability from 'structure'. Significance is assessed with the help of re-sampling appropriate for each dataset. The software, CLARITY, is available as an R package from github.com/danjlawson/CLARITY.
Affiliation(s)
- Daniel J. Lawson
- Institute of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK
- Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, UK
- Igor Yanovich
- Department of English and American Studies, Vienna University, Vienna, Austria
- Johannes Dellert
- Seminar für Sprachwissenschaft; DFG Center ‘Words, Bones, Genes, Tools’, University of Tübingen, Tübingen, Germany
- Damian Ruck
- Department of Anthropology, University of Tennessee, Knoxville, TN, USA
- Phillip Endicott
- Unité Eco-Anthropologie (EA), Muséum National d’Histoire Naturelle, 17 place du Trocadero, Paris 75016, France
3
He G, Chen Y, Wang S, Dong Y, Ju G, Chen B. The Association Between PM2.5 and Depression in China. Dose Response 2020; 18:1559325820942699. [PMID: 32733175 PMCID: PMC7370340 DOI: 10.1177/1559325820942699] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/01/2019] [Revised: 03/07/2020] [Accepted: 03/11/2020] [Indexed: 01/08/2023]
Abstract
While China has been experiencing unprecedented economic growth, depression has become one of the most striking social and mental health problems in recent years. Such a paradox to progress may partially be due to the notoriously poor air quality of the country. To verify this argument, we constructed an index of the prevalence of depression (IPD) using internet search query volumes in Baidu as a proxy for potential depression and examined how IPD is associated with PM2.5, the major air pollutant in China. Our results from 2-way fixed effects models reveal that a 100 μg·m−3 increase in the previous week’s PM2.5 in a city is significantly associated with a 0.279 increase in its IPD, comparable to a 7.34-hour decrease in weekly daylight, and that this relationship is particularly pronounced in spring and summer and in East and South areas. Our large-scale findings suggest that PM2.5 at current levels in China poses serious mental health risks.
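A two-way fixed effects model removes everything constant within a city and everything common to all cities in a given week. For a balanced panel, the slope from such a model equals an ordinary least-squares slope on data demeaned by unit and by period. The sketch below shows only that core step under the balanced-panel assumption; it omits the controls, lag construction, and standard errors a study like this one would need, and all names and toy values are hypothetical.

```python
def two_way_demean(panel):
    """Remove unit and period means from a balanced panel.

    panel maps (unit, period) -> value and must contain every combination.
    """
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    if len(panel) != len(units) * len(periods):
        raise ValueError("panel must be balanced")
    grand = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    return {
        (u, t): panel[(u, t)] - unit_mean[u] - period_mean[t] + grand
        for u in units
        for t in periods
    }


def twfe_slope(y, x):
    """Slope of y on x after two-way (unit and period) demeaning."""
    yd, xd = two_way_demean(y), two_way_demean(x)
    num = sum(xd[k] * yd[k] for k in xd)
    den = sum(xd[k] ** 2 for k in xd)
    return num / den


# Toy panel: 3 cities x 4 weeks. x plays the role of last week's PM2.5;
# y is built as 0.5 * x plus a city effect and a week effect.
x = {(c, w): float(c * w + w) for c in range(3) for w in range(4)}
y = {(c, w): 0.5 * x[(c, w)] + 10 * c + 2 * w for c in range(3) for w in range(4)}
print(round(twfe_slope(y, x), 6))
```

Because the city and week effects are wiped out by the demeaning, the recovered slope matches the 0.5 used to build the toy data.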
Affiliation(s)
- Guangye He
- School of Social and Behavioral Sciences, Nanjing University, Nanjing, China
- Yunsong Chen
- The Johns Hopkins University-Nanjing University Center for Chinese and American Studies, Nanjing, China
- Senhu Wang
- University of Cambridge, Cambridge, United Kingdom
- Yiqun Dong
- School of Social and Behavioral Sciences, Nanjing University, Nanjing, China
- Guodong Ju
- School of Social and Behavioral Sciences, Nanjing University, Nanjing, China
- Buwei Chen
- The First Affiliated Hospital with Nanjing Medical University, Nanjing, China
4
Chen Y, He G, Chen B, Wang S, Ju G, Ge T. The association between PM2.5 exposure and suicidal ideation: a prefectural panel study. BMC Public Health 2020; 20:293. [PMID: 32138702 PMCID: PMC7059660 DOI: 10.1186/s12889-020-8409-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/22/2019] [Accepted: 02/25/2020] [Indexed: 11/15/2022]
Abstract
Background: Suicidal ideation is subject to serious underestimation among existing public health studies. While numerous factors have been recognized as affecting suicidal thoughts and behaviors (STB), the associated environmental risks have been poorly understood. Foremost among the various environmental risks is air pollution, in particular PM2.5. The present study attempted to examine the relationship between PM2.5 level and a local weekly index of suicidal ideation (ISI). Methods: Using Internet search query volumes in Baidu (2017), the largest internet search engine in China, we constructed a prefectural panel dataset (278 prefectures, 52 weeks) and employed dynamic panel GMM system estimation to analyze the relationship between weekly concentration of PM2.5 (Mean = 87 μg·m−3) and the index of suicidal ideation (Mean = 49.9). Results: The results indicate that in the spring and winter, a 10 μg·m−3 increase in the prior week’s PM2.5 in a Chinese city is significantly associated with a 0.020 increase in ISI in spring and a 0.007 increase in ISI in winter, after taking account of other co-pollutants and meteorological conditions. Conclusion: We innovatively proposed the measure of suicidal ideation and provided suggestive evidence of a positive association between suicidal ideation and PM2.5 level.
Affiliation(s)
- Yunsong Chen
- Johns Hopkins University-Nanjing University Center for Chinese and American Studies, Gulou District, Nanjing, 210093, China.
- Guangye He
- School of Social and Behavioral Sciences, Nanjing University, 163 Xianlin Road, Qixia District, Nanjing, 210023, China.
- Buwei Chen
- The First Affiliated Hospital with Nanjing Medical University, Nanjing, 210029, China.
- Senhu Wang
- The University of Cambridge, 16 Mill Lane, Cambridge, CB2 1SB, UK
- Guodong Ju
- School of Social and Behavioral Sciences, Nanjing University, 163 Xianlin Road, Qixia District, Nanjing, 210023, China
- Ting Ge
- School of Social and Behavioral Sciences, Nanjing University, 163 Xianlin Road, Qixia District, Nanjing, 210023, China
5
Ruck DJ, Bentley RA, Lawson DJ. Cultural prerequisites of socioeconomic development. ROYAL SOCIETY OPEN SCIENCE 2020; 7:190725. [PMID: 32257300 PMCID: PMC7062048 DOI: 10.1098/rsos.190725] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/26/2019] [Accepted: 12/16/2019] [Indexed: 05/05/2023]
Abstract
In the centuries since the Enlightenment, the world has seen an increase in socioeconomic development, measured as increased life expectancy, education, economic development and democracy. While the co-occurrence of these features among nations is well documented, little is known about their origins or co-evolution. Here, we compare this growth of prosperity in nations to the historical record of cultural values in the twentieth century, derived from global survey data. We find that two cultural factors, secular-rationality and cosmopolitanism, predict future increases in GDP per capita, democratization and secondary education enrollment. The converse is not true, however, which indicates that secular-rationality and cosmopolitanism are among the preconditions for socioeconomic development to emerge.
Affiliation(s)
- Damian J. Ruck
- Department of Anthropology, University of Tennessee, 1621 Cumberland Avenue, Knoxville, TN 37996, USA
- Center for the Dynamics of Social Complexity, University of Tennessee, 403B Austin Peay, Knoxville, TN 37996, USA
- College of Communication and Information, University of Tennessee, 1345 Circle Park Drive, Knoxville, TN 37996, USA
- R. Alexander Bentley
- Department of Anthropology, University of Tennessee, 1621 Cumberland Avenue, Knoxville, TN 37996, USA
- Center for the Dynamics of Social Complexity, University of Tennessee, 403B Austin Peay, Knoxville, TN 37996, USA
- Daniel J. Lawson
- Population Health Science Institute, University of Bristol, Oakfield Grove, Bristol BS8 2BN, UK
- Institute of Statistical Sciences, University of Bristol, Fry Building, Woodland Road, Bristol BS8 1TH, UK
6
Schulz D, Bahník Š. Gender associations in the twentieth-century English-language literature. JOURNAL OF RESEARCH IN PERSONALITY 2019. [DOI: 10.1016/j.jrp.2019.05.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
7
Younes N, Reips UD. Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms. PLoS One 2019; 14:e0213554. [PMID: 30901329 PMCID: PMC6430395 DOI: 10.1371/journal.pone.0213554] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/30/2018] [Accepted: 02/24/2019] [Indexed: 11/19/2022]
Abstract
The Google Books Ngram Viewer (Google Ngram) is a search engine that charts word frequencies from a large corpus of books and thereby allows for the examination of cultural change as it is reflected in books. While the tool's massive corpus of data (about 8 million books or 6% of all books ever published) has been used in various scientific studies, concerns about the accuracy of results have simultaneously emerged. This paper reviews the literature and serves as a guideline for improving Google Ngram studies by suggesting five methodological procedures suited to increase the reliability of results. In particular, we recommend the use of (I) different language corpora, (II) cross-checks on different corpora from the same language, (III) word inflections, (IV) synonyms, and (V) a standardization procedure that accounts for both the influx of data and unequal weights of word frequencies. Further, we outline how to combine these procedures and address the risk of potential biases arising from censorship and propaganda. As an example of the proposed procedures, we examine the cross-cultural expression of religion via religious terms for the years 1900 to 2000. Special emphasis is placed on the situation during World War II. In line with the strand of literature that emphasizes the decline of collectivistic values, our results suggest an overall decrease of religion's importance. However, religion regains importance during times of crisis such as World War II. By comparing the results obtained through the different methods, we illustrate that applying and particularly combining our suggested procedures increase the reliability of results and prevent authors from deriving wrong assumptions.
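Procedures (III) to (V) can be illustrated on raw yearly counts. The sketch below sums counts over inflected forms or synonyms, divides by each year's corpus size to obtain relative frequencies (addressing the influx of data), and z-standardizes the resulting series so that terms with very different base frequencies can be compared. This is a simplification under stated assumptions, not the authors' exact standardization procedure; the function names and counts are hypothetical.

```python
from statistics import mean, pstdev


def combine_forms(*series):
    """Sum yearly counts across inflections or synonyms of one concept."""
    combined = {}
    for counts in series:
        for year, n in counts.items():
            combined[year] = combined.get(year, 0) + n
    return combined


def relative_frequencies(counts, totals):
    """Divide each year's count by the total number of words that year."""
    return {year: counts[year] / totals[year] for year in counts}


def z_standardize(series):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("series is constant; z-scores are undefined")
    return {year: (v - mu) / sigma for year, v in series.items()}


# Invented counts for two forms of one term and invented corpus sizes.
church = {1900: 6, 1910: 12, 1920: 20}
churches = {1900: 4, 1910: 8, 1920: 10}
totals = {1900: 1000, 1910: 2000, 1920: 1000}

rel = relative_frequencies(combine_forms(church, churches), totals)
print(z_standardize(rel))
```

Note how the raw count doubles from 1900 to 1910 while the relative frequency stays flat, because the toy corpus also doubled in size.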
Affiliation(s)
- Nadja Younes
- Department of Psychology, University of Konstanz, Konstanz, Germany
8
Wheeler MA, McGrath MJ, Haslam N. Twentieth century morality: The rise and fall of moral concepts from 1900 to 2007. PLoS One 2019; 14:e0212267. [PMID: 30811461 PMCID: PMC6392263 DOI: 10.1371/journal.pone.0212267] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/23/2018] [Accepted: 01/30/2019] [Indexed: 12/05/2022]
Abstract
Trends in the cultural salience of morality across the 20th century in the Anglophone world, as reflected in changing use of moral language, were explored using the Google Books (English language) database. Relative frequencies of 304 moral terms, organized into six validated sets corresponding to general morality and the five moral domains proposed by moral foundations theory, were charted for the years 1900 to 2007. Each moral language set displayed unique, often nonlinear historical trajectories. Words conveying general morality (e.g., good, bad, moral, evil), and those representing Purity-based morality, implicating sanctity and contagion, declined steeply in frequency from 1900 to around 1980, when they rebounded sharply. Ingroup-based morality, emphasizing group loyalty, rose steadily over the 20th century. Harm-based morality, focused on suffering and care, rose sharply after 1980. Authority-based morality, which emphasizes respect for hierarchy and tradition, rose to a peak around the social convulsions of the late 1960s. There were no consistent tendencies for moral language to become more individualist or less grounded in concern for social order and cohesion. These differing time series suggest that the changing moral landscape of the 20th century can be divided into five distinct periods and illuminate the re-moralization and moral polarization of the last three decades.
Affiliation(s)
- Melissa A. Wheeler
- Department of Management and Marketing, Faculty of Business and Economics, University of Melbourne, Victoria, Australia
- Melbourne School of Psychological Sciences, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Victoria, Australia
- Melanie J. McGrath
- Melbourne School of Psychological Sciences, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Victoria, Australia
- Nick Haslam
- Melbourne School of Psychological Sciences, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Victoria, Australia
9
Chen Y, Yan F. International visibility as determinants of foreign direct investment: An empirical study of Chinese Provinces. SOCIAL SCIENCE RESEARCH 2018; 76:23-39. [PMID: 30268281 DOI: 10.1016/j.ssresearch.2018.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/28/2018] [Revised: 07/08/2018] [Accepted: 08/11/2018] [Indexed: 06/08/2023]
Abstract
While previous studies use economic and institutional variables to explain transnational investment operations, we argue that regionally-specific international visibility can significantly influence the investment decisions of foreign firms with spatial and temporal dynamics. Empirically, we extract the usage frequency of the names of all of the Chinese provinces in millions of English-language books from Google Books N-gram corpus to construct the index of international visibility as a proxy measurement of international prominence. Results from dynamic panel data analysis (1994-2004) using the Generalized Method of Moments demonstrate that the level of international visibility of a province has a positive effect on the inflows of foreign direct investments, controlling for a set of economic and institutional factors. Further analyses show that this visibility effect varies with different state images of China formed in various historical periods and is stronger with regard to inland provinces compared to coastal provinces. Our results are robust across alternative corpora and different model specifications.
Affiliation(s)
- Yunsong Chen
- Department of Sociology, Nanjing University, China; Hopkins-Nanjing Center, Nanjing University, China.
- Fei Yan
- Department of Sociology, Tsinghua University, China.
10
Rheault L, Beelen K, Cochrane C, Hirst G. Measuring Emotion in Parliamentary Debates with Automated Textual Analysis. PLoS One 2016; 11:e0168843. [PMID: 28006016 PMCID: PMC5179059 DOI: 10.1371/journal.pone.0168843] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/02/2016] [Accepted: 12/07/2016] [Indexed: 11/23/2022]
Abstract
An impressive breadth of interdisciplinary research suggests that emotions have an influence on human behavior. Nonetheless, we still know very little about the emotional states of those actors whose daily decisions have a lasting impact on our societies: politicians in parliament. We address this question by making use of methods of natural language processing and a digitized corpus of text data spanning a century of parliamentary debates in the United Kingdom. We use this approach to examine changes in aggregate levels of emotional polarity in the British parliament, and to test a hypothesis about the emotional response of politicians to economic recessions. Our findings suggest that, contrary to popular belief, the mood of politicians has become more positive during the past decades, and that variations in emotional polarity can be predicted by the state of the national economy.
Affiliation(s)
- Ludovic Rheault
- Department of Political Science, University of Toronto, Toronto, Canada
- Kaspar Beelen
- Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
- Graeme Hirst
- Department of Computer Science, University of Toronto, Toronto, Canada
11
Morin O, Acerbi A. Birth of the cool: a two-centuries decline in emotional expression in Anglophone fiction. Cogn Emot 2016; 31:1663-1675. [PMID: 27910735 DOI: 10.1080/02699931.2016.1260528] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 10/20/2022]
Abstract
The presence of emotional words and content in stories has been shown to enhance a story's memorability, and its cultural success. Yet, recent cultural trends run in the opposite direction. Using the Google Books corpus, coupled with two metadata-rich corpora of Anglophone fiction books, we show a decrease in emotionality in English-speaking literature starting plausibly in the nineteenth century. We show that this decrease cannot be explained by changes unrelated to emotionality (such as demographic dynamics concerning age or gender balance, changes in vocabulary richness, or changes in the prevalence of literary genres), and that, in our three corpora, the decrease is driven almost entirely by a decline in the proportion of positive emotion-related words, while the frequency of negative emotion-related words shows little if any decline. Consistently with previous studies, we also find a link between ageing and negative emotionality at the individual level.
Affiliation(s)
- Olivier Morin
- Max Planck Institute for the Science of Human History, Jena, Germany
- Alberto Acerbi
- School of Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
12
Chen Y, Yan F. Economic performance and public concerns about social class in twentieth-century books. SOCIAL SCIENCE RESEARCH 2016; 59:37-51. [PMID: 27480370 DOI: 10.1016/j.ssresearch.2016.04.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/24/2015] [Revised: 01/28/2016] [Accepted: 04/04/2016] [Indexed: 06/06/2023]
Abstract
What is the association between macroeconomic conditions and public perceptions of social class? Applying a novel approach based on the Google Books N-gram corpus, this study addresses the relationship between public concerns about social class and economic conditions throughout the twentieth century. The usage of class-related words/phrases, or "literary references to class," in American English-language books is related to US economic performance and income inequality. The findings of this study demonstrate that economic conditions play a significant role in literary references to class throughout the century, whereas income inequality does not. Similar results are obtained from further analyses using alternative measures of class concerns as well as different corpora of English Fiction and the New York Times. We add to the social class literature by showing that the long-term temporal dynamics of an economy can be exhibited by aggregate class concerns. The application of massive culture-wide content analysis using data of unprecedented size also represents a contribution to the literature.
Affiliation(s)
- Yunsong Chen
- Department of Sociology, Nanjing University, China.
- Fei Yan
- Department of Sociology, Tsinghua University, China; The Walter H. Shorenstein Asia-Pacific Research Center, Stanford University, USA.
13
Skrebyte A, Garnett P, Kendal JR. Temporal Relationships Between Individualism–Collectivism and the Economy in Soviet Russia. JOURNAL OF CROSS-CULTURAL PSYCHOLOGY 2016. [DOI: 10.1177/0022022116659540] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/15/2022]
Abstract
Collectivism and individualism are commonly used to delineate societies that differ in their cultural values and patterns of social behavior, prioritizing the relative importance of the group and the individual, respectively. Collectivist and individualist expression is likely to be intricately linked with the political and economic history of a society. Scholars have proposed mechanisms for both positive and negative correlations between economic growth and a culture of either individualism or collectivism. Here, we consider these relationships across the dramatic history of 20th- and early 21st-century Russia (1901-2009), spanning the late Russian Empire, the communist state, and the growth of capitalism. We sample Russian speakers to identify common Russian words expressing individualism or collectivism, and examine the changing frequencies of these terms in Russian publications collected in Google’s Ngram corpus. We correlate normalized individualism and collectivism expression against published estimates of economic growth (GDP and net material product [NMP]) available between 1961 and 1995, finding high collectivist expression and economic growth rate followed by the correlated decline of both prior to the end of Soviet system. Temporal trends in the published expression of individualism and collectivism, in addition to their correlations with estimated economic growth rates, are examined in relation to the change in economic and political structures, ideology and public discourse. We also compare our sampled Russian-language terms for individualism and collectivism with Twenge et al.’s equivalent collection from American English speakers.
14
Koplenig A, Müller-Spitzer C. Population Size Predicts Lexical Diversity, but so Does the Mean Sea Level --Why It Is Important to Correctly Account for the Structure of Temporal Data. PLoS One 2016; 11:e0150771. [PMID: 26938719 PMCID: PMC4777502 DOI: 10.1371/journal.pone.0150771] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/19/2022]
Abstract
In order to demonstrate why it is important to correctly account for the (serial dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between population size and lexical diversity of the primary language in this country. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Given the fact that in the recent past, several studies were published that present surprising links between different economic, cultural, political and (socio-)demographical variables on the one hand and cultural or linguistic characteristics on the other hand, but seem to suffer from exactly this problem, we explain the cause of the misspecification and show that it has profound consequences. We demonstrate how simple transformation of the time series can often solve problems of this type and argue that the evaluation of the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
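The core problem can be reproduced with two toy series that share nothing but an upward time trend. Their levels are almost perfectly correlated, while their first differences (one simple transformation of the kind the abstract mentions) are uncorrelated. This sketch uses Pearson correlation rather than the regression models in the paper, and the series are invented for illustration.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def first_difference(series):
    """Change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


n = 41
# Both series trend upward over time, but their fluctuations are unrelated:
# x wiggles with period 2, y with period 4.
x = [t + 0.5 * (-1) ** t for t in range(n)]
y = [2 * t + [0, 1, 0, -1][t % 4] for t in range(n)]

level_r = pearson(x, y)
diff_r = pearson(first_difference(x), first_difference(y))
print(f"levels: r = {level_r:.3f}; first differences: r = {diff_r:.3f}")
```

The high correlation in levels comes entirely from the shared trend; once the trend is removed by differencing, nothing is left to correlate.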
Affiliation(s)
- Alexander Koplenig
- Department of Lexical Studies, Institute for the German Language (IDS), Mannheim, Germany
- Carolin Müller-Spitzer
- Department of Lexical Studies, Institute for the German Language (IDS), Mannheim, Germany
15
Samothrakis S, Fasli M. Emotional Sentence Annotation Helps Predict Fiction Genre. PLoS One 2015; 10:e0141922. [PMID: 26524352 PMCID: PMC4629906 DOI: 10.1371/journal.pone.0141922] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/28/2015] [Accepted: 10/14/2015] [Indexed: 11/19/2022]
Abstract
Fiction, a prime form of entertainment, has evolved into multiple genres which one can broadly attribute to different forms of stories. In this paper, we examine the hypothesis that works of fiction can be characterised by the emotions they portray. To investigate this hypothesis, we use the works of fiction in Project Gutenberg and attribute basic emotional content to each individual sentence using Ekman's model. A time-smoothed version of the emotional content for each basic emotion is used to train extremely randomized trees. We show through 10-fold cross-validation that the emotional content of each work of fiction can help identify each genre with significantly higher probability than random. We also show that the most important differentiator between genre novels is fear.
Affiliation(s)
- Spyridon Samothrakis
- Institute for Analytics and Data Science, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom
- Maria Fasli
- Institute for Analytics and Data Science, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom
16
Pechenick EA, Danforth CM, Dodds PS. Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution. PLoS One 2015; 10:e0137041. [PMID: 26445406 PMCID: PMC4596490 DOI: 10.1371/journal.pone.0137041] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 01/06/2015] [Accepted: 07/02/2015] [Indexed: 12/03/2022]
Abstract
It is tempting to treat frequency trends from the Google Books data sets as indicators of the “true” popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We use information theoretic methods to highlight these dynamics by examining and comparing major contributions via a divergence measure of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts. Overall, our findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
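The abstract does not name its divergence measure. The sketch below assumes the Jensen-Shannon divergence, a common choice for comparing word-frequency distributions, and splits it into per-word contributions so that the words driving a difference between two decades can be ranked. Counts and names are hypothetical; the toy token `fig` stands in for vocabulary typical of scientific texts.

```python
from math import log2


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jsd_contributions(counts_a, counts_b):
    """Per-word terms of the Jensen-Shannon divergence (in bits).

    Each term is non-negative, and the terms sum to the divergence.
    """
    p, q = normalize(counts_a), normalize(counts_b)
    contributions = {}
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * log2(pw / mw)
        if qw > 0:
            term += 0.5 * qw * log2(qw / mw)
        contributions[w] = term
    return contributions


def jsd(counts_a, counts_b):
    """Jensen-Shannon divergence in bits, between 0 and 1."""
    return sum(jsd_contributions(counts_a, counts_b).values())


decade_1900s = {"time": 50, "war": 10, "fig": 1}
decade_1990s = {"time": 50, "war": 10, "fig": 20}
contrib = jsd_contributions(decade_1900s, decade_1990s)
print(max(contrib, key=contrib.get), round(jsd(decade_1900s, decade_1990s), 4))
```

Ranking the per-word terms, rather than reporting only the total, is what lets an analysis attribute a shift between decades to a particular class of vocabulary.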
Affiliation(s)
- Eitan Adam Pechenick
- Department of Mathematics and Statistics, University of Vermont, Burlington, Vermont, United States of America
- Center for Complex Systems, University of Vermont, Burlington, Vermont, United States of America
- Computational Story Lab, University of Vermont, Burlington, Vermont, United States of America
- Vermont Advanced Computing Core, University of Vermont, Burlington, Vermont, United States of America
- Christopher M. Danforth
- Department of Mathematics and Statistics, University of Vermont, Burlington, Vermont, United States of America
- Center for Complex Systems, University of Vermont, Burlington, Vermont, United States of America
- Computational Story Lab, University of Vermont, Burlington, Vermont, United States of America
- Vermont Advanced Computing Core, University of Vermont, Burlington, Vermont, United States of America
- Peter Sheridan Dodds
- Department of Mathematics and Statistics, University of Vermont, Burlington, Vermont, United States of America
- Center for Complex Systems, University of Vermont, Burlington, Vermont, United States of America
- Computational Story Lab, University of Vermont, Burlington, Vermont, United States of America
- Vermont Advanced Computing Core, University of Vermont, Burlington, Vermont, United States of America