1
Webb F, Stimpson D, Purcell M, López E. Organizational Labor Flow Networks and Career Forecasting. Entropy (Basel, Switzerland) 2023; 25:e25050784. [PMID: 37238540 DOI: 10.3390/e25050784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/21/2023] [Accepted: 05/01/2023] [Indexed: 05/28/2023]
Abstract
The movement of employees within an organization is a research area of great relevance in a variety of fields such as economics, management science, and operations research, among others. In econophysics, however, only a few initial incursions have been made into this problem. In this paper, based on an approach inspired by the concept of labor flow networks which capture the movement of workers among firms of entire national economies, we construct empirically calibrated high-resolution networks of internal labor markets with nodes and links defined on the basis of different descriptions of job positions, such as operating units or occupational codes. The model is constructed and tested for a dataset from a large U.S. government organization. Using two versions of Markov processes, one without and another with limited memory, we show that our network descriptions of internal labor markets have strong predictive power. Among the most relevant findings, we observe that the organizational labor flow networks created by our method based on operational units possess a power law feature consistent with the distribution of firm sizes in an economy. This signals the surprising and important result that this regularity is pervasive across the landscape of economic entities. We expect our work to provide a novel approach to study careers and help connect the different disciplines that currently study them.
Affiliation(s)
- Frank Webb
  - Department of Computational and Data Sciences, George Mason University, Fairfax, VA 22030, USA
- Daniel Stimpson
  - United States Army Acquisition Support Center (USAASC), 9900 Belvoir Road, Fort Belvoir, VA 22060, USA
- Miesha Purcell
  - United States Army Acquisition Support Center (USAASC), 9900 Belvoir Road, Fort Belvoir, VA 22060, USA
- Eduardo López
  - Department of Computational and Data Sciences, George Mason University, Fairfax, VA 22030, USA
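The memoryless Markov description mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each career is recorded as an ordered list of position labels (for example operating units); the function name `estimate_transitions` and the toy careers are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

def estimate_transitions(careers):
    """Estimate first-order (memoryless) transition probabilities between positions."""
    counts = defaultdict(Counter)
    for path in careers:
        # Count every observed move from one position to the next.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    # Normalize the outgoing counts of each position into probabilities.
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

# Hypothetical toy careers over three positions A, B, C.
careers = [
    ["A", "B", "C"],
    ["A", "B", "B"],
    ["A", "C"],
    ["B", "C"],
]
P = estimate_transitions(careers)
```

Each row of `P` is an empirical next-position distribution and can serve as a simple forecast of a worker's next move. A limited-memory variant would condition on the last two positions instead of one.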
2
Barron ATJ, Bollen J. Quantifying collective identity online from self-defining hashtags. Sci Rep 2022; 12:15044. [PMID: 36057691 PMCID: PMC9440909 DOI: 10.1038/s41598-022-19181-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 08/25/2022] [Indexed: 11/09/2022] Open
Abstract
Mass communication over social media can drive rapid changes in our sense of collective identity. Hashtags in particular have acted as powerful social coordinators, playing a key role in organizing social movements like the Gezi park protests, Occupy Wall Street, #metoo, and #blacklivesmatter. Here we quantify collective identity from the use of hashtags as self-labels in over 85,000 actively-maintained Twitter user profiles spanning 2017-2019. Collective identities emerge from a graph model of individuals' overlapping self-labels, producing a hierarchy of graph clusters. Each cluster is bound together and characterized semantically by specific hashtags key to its formation. We define and apply two information-theoretic measures to quantify the strength of identities in the hierarchy. First we measure collective identity coherence to determine how integrated any identity is from local to global scales. Second, we consider the conspicuousness of any identity given its vocabulary versus the global identity map. Our work reveals a rich landscape of online identity emerging from the hierarchical alignment of uncoordinated self-labeling actions.
Affiliation(s)
- Alexander T J Barron
  - Luddy School of Informatics, Computing, & Engineering, Indiana University-Bloomington, Bloomington, USA
- Johan Bollen
  - Luddy School of Informatics, Computing, & Engineering, Indiana University-Bloomington, Bloomington, USA
  - Cognitive Science Program, Indiana University-Bloomington, Bloomington, USA
3
Seal S, Vu T, Ghosh T, Wrobel J, Ghosh D. DenVar: density-based variation analysis of multiplex imaging data. Bioinformatics Advances 2022; 2:vbac039. [PMID: 36699398 PMCID: PMC9710661 DOI: 10.1093/bioadv/vbac039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/04/2022] [Revised: 04/17/2022] [Accepted: 05/18/2022] [Indexed: 02/01/2023]
Abstract
Summary: Multiplex imaging platforms have become popular for studying complex single-cell biology in the tumor microenvironment (TME) of cancer subjects. Studying the intensity of the proteins that regulate important cell functions is crucial for subject-specific assessment of risks. The conventional approach requires selection of two thresholds, one to define the cells of the TME as positive or negative for a particular protein, and the other to classify the subjects based on the proportion of the positive cells. We present a threshold-free approach in which the distance between a pair of subjects is computed based on the probability density of the protein in their TMEs. The distance matrix can either be used to classify the subjects into meaningful groups or be used directly in a kernel machine regression framework for testing association with clinical outcomes. The method removes the subjectivity bias of the thresholding-based approach, enabling easier but interpretable analysis. We analyze a lung cancer dataset, finding the difference in the density of protein HLA-DR to be significantly associated with overall survival, and a triple-negative breast cancer dataset, analyzing the effects of multiple proteins on survival and recurrence. The reliability of our method is demonstrated through extensive simulation studies.
Availability and implementation: The associated R package is available at https://github.com/sealx017/DenVar.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Souvik Seal
  - Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA (to whom correspondence should be addressed)
- Thao Vu
  - Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
- Tusharkanti Ghosh
  - Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
- Julia Wrobel
  - Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
- Debashis Ghosh
  - Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
4
Holur P, Shahsavari S, Ebrahimzadeh E, Tangherlini TR, Roychowdhury V. Modelling social readers: novel tools for addressing reception from online book reviews. Royal Society Open Science 2021; 8:210797. [PMID: 34950484 PMCID: PMC8692958 DOI: 10.1098/rsos.210797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Accepted: 11/29/2021] [Indexed: 06/14/2023]
Abstract
Social reading sites offer an opportunity to capture a segment of readers' responses to literature, while data-driven analysis of these responses can provide new critical insight into how people 'read'. Posts discussing an individual book on the social reading site, Goodreads, are referred to as 'reviews', and consist of summaries, opinions, quotes or some mixture of these. Computationally modelling these reviews allows one to discover the non-professional discussion space about a work, including an aggregated summary of the work's plot, an implicit sequencing of various subplots and readers' impressions of main characters. We develop a pipeline of interlocking computational tools to extract a representation of this reader-generated shared narrative model. Using a corpus of reviews of five popular novels, we discover readers' distillation of the novels' main storylines and their sequencing, as well as the readers' varying impressions of characters in the novel. In so doing, we make three important contributions to the study of infinite-vocabulary networks: (i) an automatically derived narrative network that includes meta-actants; (ii) a sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from reviews, and (iii) an 'impressions' algorithm, SENT2IMP, that provides multi-modal insight into readers' opinions of characters.
Affiliation(s)
- Pavan Holur
  - Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
- Shadi Shahsavari
  - Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
- Ehsan Ebrahimzadeh
  - Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
- Vwani Roychowdhury
  - Department of Electrical and Computer Engineering, University of California Los Angeles, Los Angeles, CA, USA
5
Jing Y, Widmer P, Bickel B. Word Order Variation is Partially Constrained by Syntactic Complexity. Cogn Sci 2021; 45:e13056. [PMID: 34758151 PMCID: PMC9287024 DOI: 10.1111/cogs.13056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2020] [Revised: 07/21/2021] [Accepted: 09/14/2021] [Indexed: 12/02/2022]
Abstract
Previous work suggests that when speakers linearize syntactic structures, they place longer and more complex dependents further away from the head word to which they belong than shorter and simpler dependents, and that they do so with increasing rigidity the longer expressions get; for example, longer objects tend to be placed further away from their verb, and with less variation. Current theories of sentence processing furthermore make competing predictions on whether longer expressions are preferentially placed as early or as late as possible. Here we test these predictions using hierarchical distributional regression models that allow estimates of word order and word order variation at the level of individual dependencies in corpora from 71 languages, while controlling for confounding effects from the type of dependency (e.g., subject vs. object) and the type of clause (main vs. subordinate) involved, as well as from trends that are characteristic of individual languages, language families, and language contact areas. Our results show the expected correlations of length with position and variation only for two out of six dependency types (obliques and nominal modifiers) and no difference between clause types. These findings challenge received theories of across‐the‐board effects of complexity on word order and word order variation and call for theoretical models that relativize effects to specific kinds of syntactic structures and dependencies.
Affiliation(s)
- Yingqi Jing
  - Department of Comparative Language Science, University of Zurich
  - Center for the Interdisciplinary Study of Language Evolution, University of Zurich
  - Department of Linguistics and Philology, Uppsala University
- Paul Widmer
  - Department of Comparative Language Science, University of Zurich
  - Center for the Interdisciplinary Study of Language Evolution, University of Zurich
- Balthasar Bickel
  - Department of Comparative Language Science, University of Zurich
  - Center for the Interdisciplinary Study of Language Evolution, University of Zurich
6
Camargo CQ, John P, Margetts HZ, Hale SA. Measuring the Volatility of the Political Agenda in Public Opinion and News Media. Public Opinion Quarterly 2021; 85:493-516. [PMID: 34690575 PMCID: PMC8530552 DOI: 10.1093/poq/nfab032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Recent election surprises, regime changes, and political shocks indicate that political agendas have become more fast-moving and volatile. The ability to measure the complex dynamics of agenda change and capture the nature and extent of volatility in political systems is therefore more crucial than ever before. This study proposes a definition and operationalization of volatility that combines insights from political science, communications, information theory, and computational techniques. The proposed measures of fractionalization and agenda change encompass the shifting salience of issues in the agenda as a whole and allow the study of agendas across different domains. We evaluate these metrics and compare them to other measures such as issue-level survival rates and the Pedersen Index, which uses public-opinion poll data to measure public agendas, as well as traditional media content to measure media agendas in the UK and Germany. We show how these measures complement existing approaches and could be employed in future agenda-setting research.
Affiliation(s)
- Chico Q Camargo
  - Address correspondence to Chico Q. Camargo, College of Engineering, Mathematics and Physical Sciences, Harrison Building, Streatham Campus, University of Exeter, Exeter EX4 4QF, UK
7
Hobson EA, Mønster D, DeDeo S. Aggression heuristics underlie animal dominance hierarchies and provide evidence of group-level social information. Proc Natl Acad Sci U S A 2021; 118:e2022912118. [PMID: 33658380 PMCID: PMC7958391 DOI: 10.1073/pnas.2022912118] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 11/18/2022] Open
Abstract
Members of a social species need to make appropriate decisions about who, how, and when to interact with others in their group. However, it has been difficult for researchers to detect the inputs to these decisions and, in particular, how much information individuals actually have about their social context. We present a method that can serve as a social assay to quantify how patterns of aggression depend upon information about the ranks of individuals within social dominance hierarchies. Applied to existing data on aggression in 172 social groups across 85 species in 23 orders, it reveals three main patterns of rank-dependent social dominance: the downward heuristic (aggress uniformly against lower-ranked opponents), close competitors (aggress against opponents ranked slightly below self), and bullying (aggress against opponents ranked much lower than self). The majority of the groups (133 groups, 77%) follow a downward heuristic, but a significant minority (38 groups, 22%) show more complex social dominance patterns (close competitors or bullying) consistent with higher levels of social information use. These patterns are not phylogenetically constrained and different groups within the same species can use different patterns, suggesting that heuristic use may depend on context and the structuring of aggression by social information should not be considered a fixed characteristic of a species. Our approach provides opportunities to study the use of social information within and across species and the evolution of social complexity and cognition.
Affiliation(s)
- Elizabeth A Hobson
  - Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221
  - Santa Fe Institute, Santa Fe, NM 87501
- Dan Mønster
  - Interacting Minds Centre, Aarhus University, 8000 Aarhus C, Denmark
  - School of Business and Social Sciences, Aarhus University, 8210 Aarhus V, Denmark
  - Cognition and Behavior Lab, Aarhus University, 8210 Aarhus V, Denmark
- Simon DeDeo
  - Santa Fe Institute, Santa Fe, NM 87501
  - Department of Social and Decision Sciences, Dietrich College of Humanities and Social Sciences, Carnegie Mellon University, Pittsburgh, PA 15213
8
Pele DT, Lazar E, Mazurencu-Marinescu-Pele M. Modeling Expected Shortfall Using Tail Entropy. Entropy 2019; 21:1204. [PMCID: PMC7514549 DOI: 10.3390/e21121204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/10/2019] [Accepted: 12/05/2019] [Indexed: 06/16/2023]
Abstract
Given the recent replacement of value-at-risk as the regulatory standard measure of risk with expected shortfall (ES) undertaken by the Basel Committee on Banking Supervision, it is imperative that ES gives correct estimates for the value of expected levels of losses in crisis situations. However, the measurement of ES is affected by a lack of observations in the tail of the distribution. While kernel-based smoothing techniques can be used to partially circumvent this problem, in this paper we propose a simple nonparametric tail measure of risk based on information entropy and compare its backtesting performance with that of other standard ES models.
Affiliation(s)
- Daniel Traian Pele
  - Department of Statistics and Econometrics, Faculty of Cybernetics, Statistics and Economic Informatics, The Bucharest University of Economic Studies, Piata Romana, nr.6, Sector 1, 010371 Bucharest, Romania
- Emese Lazar
  - Henley Business School, University of Reading, ICMA Centre, Whiteknights, Reading RG6 6BA, UK
- Miruna Mazurencu-Marinescu-Pele
  - Department of Statistics and Econometrics, Faculty of Cybernetics, Statistics and Economic Informatics, The Bucharest University of Economic Studies, Piata Romana, nr.6, Sector 1, 010371 Bucharest, Romania
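The expected shortfall (ES) discussed in the abstract above can be illustrated with a plain historical (empirical) estimator. This is a minimal sketch and not the tail-entropy measure the paper proposes, whose definition the abstract does not give; the function name `historical_var_es`, the default tail level, and the toy returns are assumptions made for illustration.

```python
import numpy as np

def historical_var_es(returns, alpha=0.025):
    """Empirical VaR and ES, reported as positive losses, at tail probability alpha."""
    r = np.sort(np.asarray(returns, dtype=float))   # worst returns first
    k = max(1, int(np.floor(alpha * r.size)))       # number of tail observations
    tail = r[:k]
    var = -tail[-1]     # loss at the edge of the tail
    es = -tail.mean()   # average loss inside the tail
    return var, es

# Hypothetical toy returns: -0.20, -0.19, ..., 0.19 (40 observations).
toy_returns = np.arange(-20, 20) / 100.0
var_25, es_25 = historical_var_es(toy_returns, alpha=0.25)
```

For small `alpha` the tail contains only `k` observations, which is the scarcity of tail data that the abstract identifies as the main difficulty in measuring ES.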
9
Abstract
Modelling causal relationships has become popular across various disciplines. The most common frameworks for causality are the Pearlian causal directed acyclic graphs (DAGs) and the Neyman-Rubin potential outcome framework. In this paper, we propose an information-theoretic framework for causal effect quantification. To this end, we formulate a two-step causal deduction procedure in the Pearl and Rubin frameworks and introduce its equivalent, which uses information-theoretic terms only. The first step of the procedure consists of ensuring no confounding or finding an adjustment set with directed information. In the second step, the causal effect is quantified. We subsequently unify previous definitions of directed information present in the literature and clarify the confusion surrounding them. We also motivate using chain graphs for directed information in time series and extend our approach to chain graphs. The proposed approach serves as a translation between causality modelling and information theory.
10
Nielsen F. On the Jensen-Shannon Symmetrization of Distances Relying on Abstract Means. Entropy 2019; 21:e21050485. [PMID: 33267199 PMCID: PMC7514974 DOI: 10.3390/e21050485] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/10/2019] [Revised: 05/08/2019] [Accepted: 05/09/2019] [Indexed: 11/16/2022]
Abstract
The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using parameter mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen-Shannon divergence between probability densities of the same exponential family; and (ii) the geometric JS-symmetrization of the reverse Kullback-Leibler divergence between probability densities of the same exponential family. As a second illustrating example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen-Shannon divergence between scale Cauchy distributions. Applications to clustering with respect to these novel Jensen-Shannon divergences are touched upon.
Affiliation(s)
- Frank Nielsen
  - Sony Computer Science Laboratories, Takanawa Muse Bldg., 3-14-13, Higashigotanda, Shinagawa-ku, Tokyo 141-0022, Japan
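The ordinary Jensen-Shannon divergence that the abstract above starts from can be sketched for discrete distributions as follows. This shows only the standard arithmetic-mixture version with equal weights, not the abstract-mean generalizations the paper introduces; the function names `kl` and `jsd` and the example vectors are assumptions made for illustration.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture m = (p + q) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical examples: disjoint supports reach the upper bound log(2).
js_disjoint = jsd([1.0, 0.0], [0.0, 1.0])
js_same = jsd([0.3, 0.7], [0.3, 0.7])
```

Because the mixture `m` is positive wherever `p` or `q` is, both KL terms stay finite, which is why the divergence is bounded even when the KL divergence between `p` and `q` themselves is infinite.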
11
Studying Lexical Dynamics and Language Change via Generalized Entropies: The Problem of Sample Size. Entropy 2019; 21:e21050464. [PMID: 33267178 PMCID: PMC7514953 DOI: 10.3390/e21050464] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/09/2019] [Revised: 04/24/2019] [Accepted: 04/30/2019] [Indexed: 12/03/2022]
Abstract
Recently, it was demonstrated that generalized entropies of order α offer novel and important opportunities to quantify the similarity of symbol sequences where α is a free parameter. Varying this parameter makes it possible to magnify differences between different texts at specific scales of the corresponding word frequency spectrum. For the analysis of the statistical properties of natural languages, this is especially interesting, because textual data are characterized by Zipf’s law, i.e., there are very few word types that occur very often (e.g., function words expressing grammatical relationships) and many word types with a very low frequency (e.g., content words carrying most of the meaning of a sentence). Here, this approach is systematically and empirically studied by analyzing the lexical dynamics of the German weekly news magazine Der Spiegel (consisting of approximately 365,000 articles and 237,000,000 words that were published between 1947 and 2017). We show that, analogous to most other measures in quantitative linguistics, similarity measures based on generalized entropies depend heavily on the sample size (i.e., text length). We argue that this makes it difficult to quantify lexical dynamics and language change and show that standard sampling approaches do not solve this problem. We discuss the consequences of the results for the statistical analysis of languages.
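The role of the free parameter α in the abstract above can be sketched in code. The abstract does not state which generalized entropy is used, so this sketch assumes the Tsallis-type form H_α(p) = (1 − Σ p_i^α) / (α − 1) and a Jensen-Shannon-style divergence built from it; both the formulas and the names `generalized_entropy` and `generalized_jsd` are assumptions, not the paper's confirmed definitions.

```python
import numpy as np

def generalized_entropy(p, alpha):
    """Assumed Tsallis-type entropy of order alpha; Shannon entropy (nats) at alpha = 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** alpha)) / (alpha - 1.0))

def generalized_jsd(p, q, alpha):
    """Assumed divergence of order alpha: H(m) - (H(p) + H(q)) / 2 with m = (p + q) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return (generalized_entropy(m, alpha)
            - 0.5 * generalized_entropy(p, alpha)
            - 0.5 * generalized_entropy(q, alpha))

# Hypothetical word-frequency vector: uniform over four word types.
uniform4 = [0.25, 0.25, 0.25, 0.25]
h2 = generalized_entropy(uniform4, 2.0)
h1 = generalized_entropy(uniform4, 1.0)
```

Under this assumed form, larger α gives more weight to high-frequency word types and smaller α to rare ones, which matches the abstract's description of magnifying differences at specific scales of the frequency spectrum.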
12
Tasnim H, Fricke GM, Byrum JR, Sotiris JO, Cannon JL, Moses ME. Quantitative Measurement of Naïve T Cell Association With Dendritic Cells, FRCs, and Blood Vessels in Lymph Nodes. Front Immunol 2018; 9:1571. [PMID: 30093900 PMCID: PMC6070610 DOI: 10.3389/fimmu.2018.01571] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 02/07/2018] [Accepted: 06/25/2018] [Indexed: 12/25/2022] Open
Abstract
T cells play a vital role in eliminating pathogenic infections. To activate, naïve T cells search lymph nodes (LNs) for dendritic cells (DCs). Positioning and movement of T cells in LNs is influenced by chemokines including CCL21 as well as multiple cell types and structures in the LNs. Previous studies have suggested that T cell positioning facilitates DC colocalization leading to T:DC interaction. Despite the influence chemical signals, cells, and structures can have on naïve T cell positioning, relatively few studies have used quantitative measures to directly compare T cell interactions with key cell types. Here, we use Pearson correlation coefficient (PCC) and normalized mutual information (NMI) to quantify the extent to which naïve T cells spatially associate with DCs, fibroblastic reticular cells (FRCs), and blood vessels in LNs. We measure spatial associations in physiologically relevant regions. We find that T cells are more spatially associated with FRCs than with their ultimate targets, DCs. We also investigated the role of a key motility chemokine receptor, CCR7, on T cell colocalization with DCs. We find that CCR7 deficiency does not decrease naïve T cell association with DCs; in fact, CCR7-/- T cells show slightly higher DC association compared with wild type T cells. By revealing these associations, we gain insights into factors that drive T cell localization, potentially affecting the timing of productive T:DC interactions and T cell activation.
Affiliation(s)
- Humayra Tasnim
  - Moses Biological Computation Laboratory, Department of Computer Science, The University of New Mexico, Albuquerque, NM, United States
- G. Matthew Fricke
  - Moses Biological Computation Laboratory, Department of Computer Science, The University of New Mexico, Albuquerque, NM, United States
  - UNM Center for Advanced Research Computing (CARC), The University of New Mexico, Albuquerque, NM, United States
- Janie R. Byrum
  - The Cannon Laboratory, Molecular Genetics & Microbiology, The University of New Mexico, Albuquerque, NM, United States
- Justyna O. Sotiris
  - Moses Biological Computation Laboratory, Department of Computer Science, The University of New Mexico, Albuquerque, NM, United States
- Judy L. Cannon
  - The Cannon Laboratory, Molecular Genetics & Microbiology, The University of New Mexico, Albuquerque, NM, United States
  - Department of Pathology, The University of New Mexico, Albuquerque, NM, United States
  - Autophagy, Inflammation, and Metabolism Center of Biomedical Research Excellence, The University of New Mexico, Albuquerque, NM, United States
- Melanie E. Moses
  - Moses Biological Computation Laboratory, Department of Computer Science, The University of New Mexico, Albuquerque, NM, United States
  - Biology Department, The University of New Mexico, Albuquerque, NM, United States
  - Santa Fe Institute, Santa Fe, NM, United States
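The two association measures named in the abstract above, PCC and NMI, can be sketched for a pair of intensity images. This is a minimal illustration on synthetic arrays; the histogram binning, the normalization 2·I(X;Y) / (H(X) + H(Y)), and the function names are assumptions, since the abstract does not say how the paper normalizes mutual information.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equally shaped intensity arrays."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])

def normalized_mutual_information(x, y, bins=8):
    """NMI from a joint intensity histogram, normalized as 2 * I / (H(X) + H(Y))."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
    mi = hx + hy - hxy  # mutual information I(X; Y)
    return float(2.0 * mi / (hx + hy)) if hx + hy > 0 else 0.0

# Synthetic stand-ins for two fluorescence channels (hypothetical data).
rng = np.random.default_rng(0)
channel_a = rng.random((64, 64))
channel_b = rng.random((64, 64))  # generated independently of channel_a
```

PCC captures only linear co-variation of intensities, whereas NMI also responds to nonlinear dependence, which is one reason to report both.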
13
Liu H, Zhang X, Zhang X, Cui Y. Self-adapted mixture distance measure for clustering uncertain data. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.04.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/01/2022]
14
Conflict and Computation on Wikipedia: A Finite-State Machine Analysis of Editor Interactions. Future Internet 2016. [DOI: 10.3390/fi8030031] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022] Open
15
16
Gao S, Ver Steeg G, Galstyan A. Understanding Confounding Effects in Linguistic Coordination: An Information-Theoretic Approach. PLoS One 2015; 10:e0130167. [PMID: 26115446 PMCID: PMC4483141 DOI: 10.1371/journal.pone.0130167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2014] [Accepted: 05/18/2015] [Indexed: 11/18/2022] Open
Abstract
We suggest an information-theoretic approach for measuring stylistic coordination in dialogues. The proposed measure has a simple predictive interpretation and can account for various confounding factors through proper conditioning. We revisit some of the previous studies that reported strong signatures of stylistic accommodation, and find that a significant part of the observed coordination can be attributed to a simple confounding effect--length coordination. Specifically, longer utterances tend to be followed by longer responses, which gives rise to spurious correlations in the other stylistic features. We propose a test to distinguish correlations in length due to contextual factors (topic of conversation, user verbosity, etc.) and turn-by-turn coordination. We also suggest a test to identify whether stylistic coordination persists even after accounting for length coordination and contextual factors.
Affiliation(s)
- Shuyang Gao
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States of America
- Greg Ver Steeg
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States of America
- Aram Galstyan
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States of America
17
Abstract
The jury trial is a critical point where the state and its citizens come together to define the limits of acceptable behavior. Here we present a large-scale quantitative analysis of trial transcripts from the Old Bailey that reveal a major transition in the nature of this defining moment. By coarse-graining the spoken word testimony into synonym sets and dividing the trials based on indictment, we demonstrate the emergence of semantically distinct violent and nonviolent trial genres. We show that although in the late 18th century the semantic content of trials for violent offenses is functionally indistinguishable from that for nonviolent ones, a long-term, secular trend drives the system toward increasingly clear distinctions between violent and nonviolent acts. We separate this process into the shifting patterns that drive it, determine the relative effects of bureaucratic change and broader cultural shifts, and identify the synonym sets most responsible for the eventual genre distinguishability. This work provides a new window onto the cultural and institutional changes that accompany the monopolization of violence by the state, described in qualitative historical analysis as the civilizing process.