1
Pi H, Burghardt K, Percus AG, Lerman K. Clique densification in networks. Phys Rev E 2023; 107:L042301. [PMID: 37198821 DOI: 10.1103/physreve.107.l042301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 04/05/2023] [Indexed: 05/19/2023]
Abstract
Real-world networks are rarely static. Recently, there has been increasing interest in both network growth and network densification, in which the number of edges scales superlinearly with the number of nodes. Less studied but equally important, however, are scaling laws of higher-order cliques, which can drive clustering and network redundancy. In this paper, we study how cliques grow with network size, by analyzing several empirical networks from emails to Wikipedia interactions. Our results show superlinear scaling laws whose exponents increase with clique size, in contrast to predictions from a previous model. We then show that these results are in qualitative agreement with a model that we propose, the local preferential attachment model, where an incoming node links not only to a target node, but also to its higher-degree neighbors. Our results provide insights into how networks grow and where network redundancy occurs.
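The attachment rule described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniformly random choice of target, the single-edge starting graph, and linking to every strictly higher-degree neighbour of the target are choices made here for simplicity, and the names `local_pa_graph`, `count_edges`, and `count_triangles` are hypothetical.

```python
import random
from itertools import combinations


def local_pa_graph(n, seed=0):
    """Grow an undirected graph with n nodes (n >= 2).

    Assumed rule: each new node picks a target uniformly at random, then
    links to the target and to every neighbour of the target whose degree
    is strictly higher than the target's.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: one edge
    for new in range(2, n):
        target = rng.randrange(new)
        target_degree = len(adj[target])
        # Collect higher-degree neighbours before modifying the graph.
        higher = [v for v in adj[target] if len(adj[v]) > target_degree]
        adj[new] = set()
        for v in [target] + higher:
            adj[new].add(v)
            adj[v].add(new)
    return adj


def count_edges(adj):
    """Number of undirected edges (2-cliques)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def count_triangles(adj):
    """Number of triangles (3-cliques); each is seen once per corner."""
    total = 0
    for nbrs in adj.values():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                total += 1
    return total // 3
```

Counting edges and triangles at several values of `n` and fitting a line to the log-log counts would give rough scaling exponents for this toy version; whether they match the paper's reported exponents is not something this sketch establishes.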
Affiliation(s)
- Haochen Pi
  - Department of Computer Science, University of Southern California, Los Angeles, California 90007, USA
- Keith Burghardt
  - Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
- Allon G Percus
  - Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
  - Institute of Mathematical Sciences, Claremont Graduate University, Claremont, California 91711, USA
- Kristina Lerman
  - Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
2
Shemirani R, Belbin GM, Burghardt K, Lerman K, Avery CL, Kenny EE, Gignoux CR, Ambite JL. Selecting Clustering Algorithms for Identity-By-Descent Mapping. Pac Symp Biocomput 2023; 28:121-132. [PMID: 36540970 PMCID: PMC9782725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
Abstract
Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms via simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime three orders of magnitude lower. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios. Supplementary Information: The code, along with supplementary methods and figures, is available at https://github.com/roohy/localIBDClustering.
Affiliation(s)
- Ruhollah Shemirani
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
3
He Y, Burghardt KA, Lerman K. Leveraging change point detection to discover natural experiments in data. EPJ Data Sci 2022; 11:49. [PMID: 36090462 PMCID: PMC9440658 DOI: 10.1140/epjds/s13688-022-00361-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 08/09/2022] [Indexed: 06/15/2023]
Abstract
Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies for different candidate change points. By modeling the accuracy change we can infer the true change point and fraction of data affected by the change (a proxy for detection confidence). We demonstrate how our framework can achieve low bias over a wide range of conditions and detect changes in high dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy and robustness in identifying changes in data.
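The two-step idea in this abstract (label points as before or after a candidate change point, train a classifier on those labels, and compare accuracy across candidates) can be sketched in a few lines. This is a deliberately simplified illustration, not the paper's framework: it uses one-dimensional data, a nearest-class-mean rule as a stand-in classifier, in-sample accuracy, and simply picks the candidate with the highest accuracy rather than modelling the accuracy curve. The function names `accuracy_at` and `detect_change_point` are hypothetical.

```python
def accuracy_at(data, c):
    """Accuracy of a nearest-class-mean classifier for candidate split c.

    Points with index < c are labelled "before", the rest "after".
    """
    before, after = data[:c], data[c:]
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    correct = 0
    for i, x in enumerate(data):
        predicted_before = abs(x - mean_before) < abs(x - mean_after)
        correct += predicted_before == (i < c)
    return correct / len(data)


def detect_change_point(data):
    """Return the candidate split with the highest classifier accuracy."""
    candidates = range(1, len(data))  # keep both classes non-empty
    return max(candidates, key=lambda c: accuracy_at(data, c))


# Toy series: values near 0 for 60 steps, then values near 5 for 40 steps.
series = [[-1, 0, 1][i % 3] for i in range(60)] + [[4, 5, 6][i % 3] for i in range(40)]
print(detect_change_point(series))
```

The intuition is that a misplaced candidate mixes pre-change and post-change points under one label, so no classifier can separate the labels perfectly; accuracy is highest when the candidate coincides with the true change.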
Affiliation(s)
- Yuzi He
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, USA
- Keith A. Burghardt
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA
- Kristina Lerman
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA
4
Yau JC, Girault B, Feng T, Mundnich K, Nadarajan A, Booth BM, Ferrara E, Lerman K, Hsieh E, Narayanan S. TILES-2019: A longitudinal physiologic and behavioral data set of medical residents in an intensive care unit. Sci Data 2022; 9:536. [PMID: 36050329 PMCID: PMC9436730 DOI: 10.1038/s41597-022-01636-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Accepted: 08/16/2022] [Indexed: 11/09/2022]
Abstract
The TILES-2019 data set consists of behavioral and physiological data gathered from 57 medical residents (i.e., trainees) working in an intensive care unit (ICU) in the United States. The data set allows for the exploration of longitudinal changes in well-being, teamwork, and job performance in a demanding environment, as residents worked in the ICU for three weeks. Residents wore a Fitbit, a Bluetooth-based proximity sensor, and an audio-feature recorder. They completed daily surveys and interviews at the beginning and end of their rotation. In addition, we collected data from environmental sensors (i.e., Internet-of-Things Bluetooth data hubs) and obtained hospital records (e.g., patient census) and residents’ job evaluations. This data set may be of interest to researchers interested in workplace stress, group dynamics, social support, the physical and psychological effects of witnessing patient deaths, predicting survey data from sensors, and privacy-aware and privacy-preserving machine learning. Notably, a small subset of the data was collected during the first wave of the COVID-19 pandemic.
Measurement(s): Stress • Burnout • Affect • Depression • Sleep • Physical Activity Measurement • Alcohol Use History • Frequency Any Tobacco Use • Personality • Social Support • Intragroup Conflict • Challenge and Hindrance Stressors • Demographics • Context and Atypical Events • Daily Stressors • Most Stressful Event • Work Context • Job Performance • Job Satisfaction • Stressors at Work • Charting at Home • Coworker Trust • Social Networks at Work • Socialization Outside of Work • Use of Wellness Resources • Heart Rate • Step Count • Acoustic Features • Team Interactions • Proximity to Key Objects • Cell Phone Use • Hospital Contextual Data • Coping with Stress • Productivity at Work • Pride at Work • Teamwork • Support System
Technology Type(s): Perceived Stress Scale - 14 Questionnaire • Survey • Patient Health Questionnaire - 9 Item • Pittsburgh Sleep Quality Index • FitBit • International Physical Activity Questionnaire (August 2002) Short Last 7 Days Self-Administered Format • Unihertz Atom Phone • Minew E8- TILES Interaction Sensors • Minew E8- Eddystone Beach • Rescuetime • Evaluations • Patient Census • Interview
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Location: Los Angeles County and University of Southern California Medical Center
Affiliation(s)
- Joanna C Yau
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Benjamin Girault
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Tiantian Feng
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Karel Mundnich
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Amrutha Nadarajan
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Brandon M Booth
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Emilio Ferrara
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
- Kristina Lerman
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
- Eric Hsieh
  - Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Shrikanth Narayanan
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
5
Abstract
The COVID-19 pandemic has posed unprecedented challenges to public health world-wide. To make decisions about mitigation strategies and to understand the disease dynamics, policy makers and epidemiologists must know how the disease is spreading in their communities. Here we analyse confirmed infections and deaths over multiple geographic scales to show that COVID-19's impact is highly unequal: many regions have nearly zero infections, while others are hot spots. We attribute the effect to a Reed-Hughes-like mechanism in which the disease arrives to regions at different times and grows exponentially at different rates. Faster growing regions correspond to hot spots that dominate spatially aggregated statistics, thereby skewing growth rates at larger spatial scales. Finally, we use these analyses to show that, across multiple spatial scales, the growth rate of COVID-19 has slowed down with each surge. These results demonstrate a trade-off when estimating growth rates: while spatial aggregation lowers noise, it can increase bias. Public policy and epidemic modelling should be aware of, and aim to address, this distortion. This article is part of the theme issue 'Data science approaches to infectious disease surveillance'.
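The aggregation effect described in this abstract can be illustrated with a small numerical example. It is a sketch under simplifying assumptions, not the authors' analysis: each region is assumed to grow deterministically as exp(rate × (t − arrival)) after its arrival time, the three (arrival, rate) pairs are invented for illustration, and the function names are hypothetical.

```python
import math


def regional_cases(t, arrival, rate):
    """Cases in one region: zero before arrival, exponential growth after."""
    return math.exp(rate * (t - arrival)) if t >= arrival else 0.0


def aggregate_growth_rate(regions, t, dt=1.0):
    """Log growth rate of the summed case count between t and t + dt.

    Assumes at least one region has been reached by time t.
    """
    now = sum(regional_cases(t, a, r) for a, r in regions)
    later = sum(regional_cases(t + dt, a, r) for a, r in regions)
    return math.log(later / now) / dt


# Hypothetical (arrival time, growth rate) pairs for three regions.
regions = [(0, 0.05), (10, 0.10), (20, 0.30)]
mean_rate = sum(r for _, r in regions) / len(regions)

print(mean_rate)                            # average of the regional rates
print(aggregate_growth_rate(regions, 100))  # rate seen in the aggregate
```

In this toy setting the aggregate rate at late times sits close to the fastest regional rate rather than the average, even though the fastest region was reached last, which is the sense in which spatial aggregation can bias growth-rate estimates.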
Affiliation(s)
- Keith Burghardt
  - Information Sciences Institute, 4676 Admiralty Road, Marina del Rey, CA 90292, USA
- Siyi Guo
  - Information Sciences Institute, 4676 Admiralty Road, Marina del Rey, CA 90292, USA
  - Department of Computer Science, University of Southern California, 941 Bloom Walk, Los Angeles, CA 90089, USA
- Kristina Lerman
  - Information Sciences Institute, 4676 Admiralty Road, Marina del Rey, CA 90292, USA
  - Department of Computer Science, University of Southern California, 941 Bloom Walk, Los Angeles, CA 90089, USA
6
Rao A, Morstatter F, Hu M, Chen E, Burghardt K, Ferrara E, Lerman K. Political Partisanship and Antiscience Attitudes in Online Discussions About COVID-19: Twitter Content Analysis. J Med Internet Res 2021; 23:e26692. [PMID: 34014831 PMCID: PMC8204937 DOI: 10.2196/26692] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 12/22/2020] [Revised: 03/01/2021] [Accepted: 04/14/2021] [Indexed: 12/02/2022]
Abstract
Background: The novel coronavirus pandemic continues to ravage communities across the United States. Opinion surveys identified the importance of political ideology in shaping perceptions of the pandemic and compliance with preventive measures.
Objective: The aim of this study was to measure political partisanship and antiscience attitudes in the discussions about the pandemic on social media, as well as their geographic and temporal distributions.
Methods: We analyzed a large set of tweets from Twitter related to the pandemic, collected between January and May 2020, and developed methods to classify the ideological alignment of users along the moderacy (hardline vs moderate), political (liberal vs conservative), and science (antiscience vs proscience) dimensions.
Results: We found a significant correlation in polarized views along the science and political dimensions. Moreover, politically moderate users were more aligned with proscience views, while hardline users were more aligned with antiscience views. Contrary to expectations, we did not find that polarization grew over time; instead, we saw increasing activity by moderate proscience users. We also show that antiscience conservatives in the United States tended to tweet from the southern and northwestern states, while antiscience moderates tended to tweet from the western states. The proportion of antiscience conservatives was found to correlate with COVID-19 cases.
Conclusions: Our findings shed light on the multidimensional nature of polarization and the feasibility of tracking polarized opinions about the pandemic across time and space through social media data.
Affiliation(s)
- Ashwin Rao
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Fred Morstatter
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Minda Hu
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Emily Chen
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Keith Burghardt
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Emilio Ferrara
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Kristina Lerman
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
7
Muric G, Lerman K, Ferrara E. Gender Disparity in the Authorship of Biomedical Research Publications During the COVID-19 Pandemic: Retrospective Observational Study. J Med Internet Res 2021; 23:e25379. [PMID: 33735097 PMCID: PMC8043146 DOI: 10.2196/25379] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/30/2020] [Revised: 12/09/2020] [Accepted: 03/14/2021] [Indexed: 11/13/2022]
Abstract
Background: Gender imbalances in academia have been evident historically and persist today. For the past 60 years, we have witnessed the increase of participation of women in biomedical disciplines, showing that the gender gap is shrinking. However, preliminary evidence suggests that women, including female researchers, are disproportionately affected by the COVID-19 pandemic in terms of unequal distribution of childcare, elderly care, and other kinds of domestic and emotional labor. Sudden lockdowns and abrupt shifts in daily routines have had disproportionate consequences on their productivity, which is reflected by a sudden drop in research output in biomedical research, consequently affecting the number of female authors of scientific publications.
Objective: The objective of this study is to test the hypothesis that the COVID-19 pandemic has had a disproportionate adverse effect on the productivity of female researchers in the biomedical field in terms of authorship of scientific publications.
Methods: This is a retrospective observational bibliometric study. We investigated the proportion of male and female researchers who published scientific papers during the COVID-19 pandemic, using bibliometric data from biomedical preprint servers and selected Springer-Nature journals. We used the ordinary least squares regression model to estimate the expected proportions over time by correcting for temporal trends. We also used a set of statistical methods, such as the Kolmogorov-Smirnov test and regression discontinuity design, to test the validity of the results.
Results: A total of 78,950 papers from the bioRxiv and medRxiv repositories and from 62 selected Springer-Nature journals by 346,354 unique authors were analyzed. The acquired data set consisted of papers that were published between January 1, 2019, and August 2, 2020. The proportion of female first authors publishing in the biomedical field during the pandemic dropped by 9.1%, on average, across disciplines (expected arithmetic mean y_est=0.39; observed arithmetic mean y=0.35; standard error of the estimate, S_est=0.007; standard error of the observation, σ_x=0.004). The impact was particularly pronounced for papers related to COVID-19 research, where the proportion of female scientists in the first author position dropped by 28% (y_est=0.39; y=0.28; S_est=0.007; σ_x=0.007). When looking at the last authors, the proportion of women dropped by 7.9%, on average (y_est=0.25; y=0.23; S_est=0.005; σ_x=0.003), while the proportion of women writing about COVID-19 as the last author decreased by 18.8% (y_est=0.25; y=0.21; S_est=0.005; σ_x=0.007). Further, by geocoding authors’ affiliations, we showed that the gender disparities became even more apparent when disaggregated by country, up to 35% in some cases.
Conclusions: Our findings document a decrease in the number of publications by female authors in the biomedical field during the global pandemic. This effect was particularly pronounced for papers related to COVID-19, indicating that women are producing fewer publications related to COVID-19 research. This sudden increase in the gender gap was persistent across the 10 countries with the highest number of researchers. These results should be used to inform the scientific community of this worrying trend in COVID-19 research and the disproportionate effect that the pandemic has had on female academics.
Affiliation(s)
- Goran Muric
  - Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Kristina Lerman
  - Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
  - Department of Computer Science, University of Southern California, Los Angeles, CA, United States
- Emilio Ferrara
  - Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
  - Department of Computer Science, University of Southern California, Los Angeles, CA, United States
  - Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, CA, United States
8
Ambite JL, Fierro L, Gordon J, Burns GA, Geigl F, Lerman K, Van Horn JD. BD2K Training Coordinating Center's ERuDIte: the Educational Resource Discovery Index for Data Science. IEEE Trans Emerg Top Comput 2021; 9:316-328. [PMID: 35548703 PMCID: PMC9089329 DOI: 10.1109/tetc.2019.2903466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Data science is a field that has developed to enable efficient integration and analysis of increasingly large data sets in many domains. In particular, big data in genetics, neuroimaging, mobile health, and other subfields of biomedical science, promises new insights, but also poses challenges. To address these challenges, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, including a Training Coordinating Center (TCC) tasked with developing a resource for personalized data science training for biomedical researchers. The BD2K TCC web portal is powered by ERuDIte, the Educational Resource Discovery Index, which collects training resources for data science, including online courses, videos of tutorials and research talks, textbooks, and other web-based materials. While the availability of so many potential learning resources is exciting, they are highly heterogeneous in quality, difficulty, format, and topic, making the field intimidating to enter and difficult to navigate. Moreover, data science is rapidly evolving, so there is a constant influx of new materials and concepts. We leverage data science techniques to build ERuDIte itself, using data extraction, data integration, machine learning, information retrieval, and natural language processing to automatically collect, integrate, describe, and organize existing online resources for learning data science.
Affiliation(s)
- José Luis Ambite
  - University of Southern California's Information Sciences Institute (ISI), Marina del Rey, CA 90292
- Lily Fierro
  - University of Southern California's Information Sciences Institute (ISI), Marina del Rey, CA 90292
- Jonathan Gordon
  - University of Southern California's Information Sciences Institute (ISI), Marina del Rey, CA 90292
- Gully A Burns
  - University of Southern California's Information Sciences Institute (ISI), Marina del Rey, CA 90292
- Kristina Lerman
  - University of Southern California's Information Sciences Institute (ISI), Marina del Rey, CA 90292
- John D Van Horn
  - University of Southern California's Stevens Neuroimaging and Informatics Institute, Los Angeles, CA 90033
9
Abstract
Applications from finance to epidemiology and cyber-security require accurate forecasts of dynamic phenomena, which are often only partially observed. We demonstrate that a system's predictability degrades as a function of temporal sampling, regardless of the adopted forecasting model. We quantify the loss of predictability due to sampling, and show that it cannot be recovered by using external signals. We validate the generality of our theoretical findings in real-world partially observed systems representing infectious disease outbreaks, online discussions, and software development projects. On a variety of prediction tasks (forecasting new infections, the popularity of topics in online discussions, or interest in cryptocurrency projects), predictability irrecoverably decays as a function of sampling, unveiling predictability limits in partially observed systems.
Affiliation(s)
- Andrés Abeliuk
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, 90292, USA
  - Department of Computer Science, University of Chile, Santiago, Chile
- Zhishen Huang
  - University of Colorado Boulder, Boulder, CO, 80302, USA
- Emilio Ferrara
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, 90292, USA
- Kristina Lerman
  - Information Sciences Institute, University of Southern California, Marina del Rey, CA, 90292, USA
10
Affiliation(s)
- Sandeep Soni
  - School of Interactive Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
- Kristina Lerman
  - Information Sciences Institute, University of Southern California, Los Angeles, California, USA
11
Mundnich K, Booth BM, L'Hommedieu M, Feng T, Girault B, L'Hommedieu J, Wildman M, Skaaden S, Nadarajan A, Villatte JL, Falk TH, Lerman K, Ferrara E, Narayanan S. TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers. Sci Data 2020; 7:354. [PMID: 33067468 PMCID: PMC7567859 DOI: 10.1038/s41597-020-00655-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/09/2020] [Accepted: 08/27/2020] [Indexed: 01/07/2023]
Abstract
We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors to understand individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their natural day-to-day job settings. We collected behavioral and physiological data from n = 212 participants through Internet-of-Things Bluetooth data hubs, wearable sensors (including a wristband, a biometrics-tracking garment, a smartphone, and an audio-feature recorder), together with a battery of surveys to assess personality traits, behavioral states, job performance, and well-being over time. Besides the default use of the data set, we envision several novel research opportunities and potential applications, including multi-modal and multi-task behavioral modeling, authentication through biometrics, and privacy-aware and privacy-preserving machine learning. 
Measurement(s): Overall Sleep Quality Rating • Step Unit of Distance • Speech • Mean Heart Rate • Proximity • Electrocardiogram Sequence • heart rate variability measurement • Respiratory Rate • physical activity measurement • light • door motion • Changes in Ambient Temperature in Medical Device Environment • humidity • Overall Emotional Well-Being • Stress • psychological flexibility • work-related acceptance • work engagement • psychological capital • intelligence • job performance • organizational citizenship behavior • counter-productive work behavior • personality trait measurement • Negative affectivity • positive affectivity • anxiety-related behavior trait • Alcohol Use History • Overall Health Rating During Past Week
Technology Type(s): photoplethysmography • Accelerometer • Microphone Device • Bluetooth-enabled Activity Monitor • electrocardiogram • Sensor Device • Photodetector Device • Temperature Sensor Device • questionnaire • Multidimensional Psychological Flexibility Inventory (MPFI) • Utrecht work engagement scale • survey method • individual task proficiency • Organizational Citizenship Behavior Checklist • big five inventory • Positive and Negative Affect Schedule (PANAS-X) • State-Trait Anxiety Inventory
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: hospital
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12465101
Affiliation(s)
- Karel Mundnich
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Brandon M Booth
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Michelle L'Hommedieu
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Tiantian Feng
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Benjamin Girault
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Justin L'Hommedieu
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Sophia Skaaden
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
- Amrutha Nadarajan
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
- Jennifer L Villatte
  - Department of Psychiatry and Behavioral Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Tiago H Falk
  - INRS-EMT, University of Quebec, Montreal, QC, Canada
- Kristina Lerman
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
- Emilio Ferrara
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
- Shrikanth Narayanan
  - Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA
  - Information Sciences Institute (USC), Marina del Rey, CA, USA
12
Jiang J, Chen E, Lerman K, Ferrara E. Political Polarization Drives Online Conversations About COVID-19 in the United States. Hum Behav Emerg Technol 2020; 2:200-211. [PMID: 32838229 PMCID: PMC7323338 DOI: 10.1002/hbe2.202] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 05/15/2020] [Revised: 05/28/2020] [Accepted: 05/28/2020] [Indexed: 01/25/2023]
Abstract
Since the outbreak in China in late 2019, the novel coronavirus (COVID-19) has spread around the world and has come to dominate online conversations. By linking 2.3 million Twitter users to locations within the United States, we study in aggregate how political characteristics of the locations affect the evolution of online discussions about COVID-19. We show that COVID-19 chatter in the US is largely shaped by political polarization. Partisanship correlates with sentiment toward government measures and the tendency to share health and prevention messaging. Cross-ideological interactions are modulated by user segregation and polarized network structure. We also observe a correlation between user engagement with topics related to public health and the varying impact of the disease outbreak in different US states. These findings may help inform policies both online and offline. Decision-makers may calibrate their use of online platforms to measure the effectiveness of public health campaigns, and to monitor the reception of national and state-level policies, by tracking in real-time discussions in a highly polarized social media ecosystem.
Affiliation(s)
- Julie Jiang
  - USC Information Sciences Institute, University of Southern California, CA, United States
  - Department of Computer Science, University of Southern California, Los Angeles, CA, United States
- Emily Chen
  - USC Information Sciences Institute, University of Southern California, CA, United States
  - Department of Computer Science, University of Southern California, Los Angeles, CA, United States
- Kristina Lerman
  - USC Information Sciences Institute, University of Southern California, CA, United States
  - Department of Computer Science, University of Southern California, Los Angeles, CA, United States
- Emilio Ferrara
  - USC Information Sciences Institute, University of Southern California, CA, United States
  - Department of Computer Science, University of Southern California, Los Angeles, CA, United States
  - Annenberg School of Communication, University of Southern California, Los Angeles, CA, United States
13
|
Ngo SC, Percus AG, Burghardt K, Lerman K. The transsortative structure of networks. Proc Math Phys Eng Sci 2020; 476:20190772. [PMID: 32523411 DOI: 10.1098/rspa.2019.0772] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/07/2019] [Accepted: 04/02/2020] [Indexed: 11/12/2022] Open
Abstract
Network topologies can be highly non-trivial, due to the complex underlying behaviours that form them. While past research has shown that some processes on networks may be characterized by local statistics describing nodes and their neighbours, such as degree assortativity, these quantities fail to capture important sources of variation in network structure. We define a property called transsortativity that describes correlations among a node's neighbours. Transsortativity can be systematically varied, independently of the network's degree distribution and assortativity. Moreover, it can significantly impact the spread of contagions as well as the perceptions of neighbours, known as the majority illusion. Our work improves our ability to create and analyse more realistic models of complex networks.
Affiliation(s)
- Shin-Chieng Ngo
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
- Allon G Percus
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
- Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711, USA
- Keith Burghardt
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
14
Chen E, Lerman K, Ferrara E. Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set. JMIR Public Health Surveill 2020; 6:e19273. [PMID: 32427106 PMCID: PMC7265654 DOI: 10.2196/19273] [Citation(s) in RCA: 204] [Impact Index Per Article: 51.0] [Received: 04/10/2020] [Revised: 05/15/2020] [Accepted: 05/15/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND At the time of this writing, the coronavirus disease (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources, and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much of the conversation about these phenomena now occurs online on social media platforms like Twitter. OBJECTIVE In this paper, we describe a multilingual COVID-19 Twitter data set that we are making available to the research community via our COVID-19-TweetIDs GitHub repository. METHODS We started this ongoing data collection on January 28, 2020, leveraging Twitter's streaming application programming interface (API) and Tweepy to follow certain keywords and accounts that were trending at the time data collection began. We used Twitter's search API to query for past tweets, resulting in the earliest tweets in our collection dating back to January 21, 2020. RESULTS Since the inception of our collection, we have actively maintained and updated our GitHub repository on a weekly basis. We have published over 123 million tweets, with over 60% of the tweets in English. This paper also presents basic statistics that show that Twitter activity responds and reacts to COVID-19-related events. CONCLUSIONS It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This data set could also help track COVID-19-related misinformation and unverified rumors or enable the understanding of fear and panic, and undoubtedly more.
Affiliation(s)
- Emily Chen
- Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
- Emilio Ferrara
- Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States
15
Alipourfard N, Nettasinghe B, Abeliuk A, Krishnamurthy V, Lerman K. Friendship paradox biases perceptions in directed networks. Nat Commun 2020; 11:707. [PMID: 32024843 PMCID: PMC7002371 DOI: 10.1038/s41467-020-14394-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/17/2019] [Accepted: 12/19/2019] [Indexed: 11/09/2022] Open
Abstract
Social networks shape perceptions by exposing people to the actions and opinions of their peers. However, the perceived popularity of a trait or an opinion may be very different from its actual popularity. We attribute this perception bias to the friendship paradox and identify conditions under which it appears. We validate the findings empirically using Twitter data. Within posts made by users in our sample, we identify topics that appear more often within users’ social feeds than they do globally among all posts. We also present a polling algorithm that leverages the friendship paradox to obtain a statistically efficient estimate of a topic’s global prevalence from biased individual perceptions. We characterize the polling estimate and validate it through synthetic polling experiments on Twitter data. Our paper elucidates the non-intuitive ways in which the structure of directed networks can distort perceptions and presents approaches to mitigate this bias. Individuals within social networks rarely observe the network as a whole; rather, their observations are limited to their social circles. Here we show that network structure can distort observations, making a trait appear far more common within many social circles than it is in the network as a whole.
Affiliation(s)
- Nazanin Alipourfard
- Information Sciences Institute, 4676 Admiralty Way, Marina Del Rey, Los Angeles, CA, 90292, USA
- Andrés Abeliuk
- Information Sciences Institute, 4676 Admiralty Way, Marina Del Rey, Los Angeles, CA, 90292, USA
- Kristina Lerman
- Information Sciences Institute, 4676 Admiralty Way, Marina Del Rey, Los Angeles, CA, 90292, USA
16
L'Hommedieu M, L'Hommedieu J, Begay C, Schenone A, Dimitropoulou L, Margolin G, Falk T, Ferrara E, Lerman K, Narayanan S. Lessons Learned: Recommendations For Implementing a Longitudinal Study Using Wearable and Environmental Sensors in a Health Care Organization. JMIR Mhealth Uhealth 2019; 7:e13305. [PMID: 31821155 PMCID: PMC6930504 DOI: 10.2196/13305] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/10/2019] [Revised: 08/12/2019] [Accepted: 10/01/2019] [Indexed: 11/13/2022] Open
Abstract
Although traditional methods of data collection in naturalistic settings can shed light on constructs of interest to researchers, advances in sensor-based technology allow researchers to capture continuous physiological and behavioral data to provide a more comprehensive understanding of the constructs that are examined in a dynamic health care setting. This study gives examples for implementing technology-facilitated approaches and provides the following recommendations for conducting such longitudinal, sensor-based research, with both environmental and wearable sensors in a health care setting: pilot test sensors and software early and often; build trust with key stakeholders and with potential participants who may be wary of sensor-based data collection and concerned about privacy; generate excitement for novel, new technology during recruitment; monitor incoming sensor data to troubleshoot sensor issues; and consider the logistical constraints of sensor-based research. The study describes how these recommendations were successfully implemented by providing examples from a large-scale, longitudinal, sensor-based study of hospital employees at a large hospital in California. The knowledge gained from this study may be helpful to researchers interested in obtaining dynamic, longitudinal sensor data from both wearable and environmental sensors in a health care setting (eg, a hospital) to obtain a more comprehensive understanding of constructs of interest in an ecologically valid, secure, and efficient way.
Affiliation(s)
- Michelle L'Hommedieu
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Justin L'Hommedieu
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Cynthia Begay
- Department of Human Resources, Keck Medicine of University of Southern California, Los Angeles, CA, United States
- Alison Schenone
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Lida Dimitropoulou
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Gayla Margolin
- Department of Psychology, University of Southern California, Los Angeles, CA, United States
- Tiago Falk
- Institut national de la recherche scientifique, University of Québec, Montreal, QC, Canada
- Emilio Ferrara
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
- Shrikanth Narayanan
- Information Sciences Institute, University of Southern California, Los Angeles, CA, United States
17
18
Abstract
Networks facilitate the spread of cascades, allowing a local perturbation to percolate via interactions between nodes and their neighbors. We investigate how network structure affects the dynamics of a spreading cascade. By accounting for the joint degree distribution of a network within a generating function framework, we can quantify how degree correlations affect both the onset of global cascades and the propensity of nodes of a specific degree class to trigger large cascades. However, not all degree correlations are equally important in a spreading process. We introduce a new measure of degree assortativity that accounts for correlations among nodes relevant to a spreading cascade. We show that the critical point defining the onset of global cascades has a monotone relationship to this new assortativity measure. In addition, we show that the choice of nodes to seed the largest cascades is strongly affected by degree correlations. Contrary to traditional wisdom, when degree assortativity is positive, low-degree nodes are more likely to generate the largest cascades. Our work suggests that it may be possible to tailor spreading processes by manipulating the higher-order structure of networks.
Affiliation(s)
- Xin-Zeng Wu
- Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, USA
- Peter G Fennell
- Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
- Allon G Percus
- Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
- Institute of Mathematical Sciences, Claremont Graduate University, Claremont, California 91711, USA
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
19
Sapienza A, Zeng Y, Bessi A, Lerman K, Ferrara E. Individual performance in team-based online games. R Soc Open Sci 2018; 5:180329. [PMID: 30110428 PMCID: PMC6030337 DOI: 10.1098/rsos.180329] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/02/2018] [Accepted: 05/22/2018] [Indexed: 06/08/2023]
Abstract
Complex real-world challenges are often solved through teamwork. Of special interest are ad hoc teams assembled to complete some task. Many popular multiplayer online battle arena (MOBA) video-games adopt this team formation strategy and thus provide a natural environment to study ad hoc teams. Our work examines data from a popular MOBA game, League of Legends, to understand the evolution of individual performance within ad hoc teams. Our analysis of player performance in successive matches of a gaming session demonstrates that a player's success deteriorates over the course of the session, but this effect is mitigated by the player's experience. We also find no significant long-term improvement in the individual performance of most players. Modelling the short-term performance dynamics allows us to accurately predict when players choose to continue to play or end the session. Our findings suggest possible directions for individualized incentives aimed at steering the player's behaviour and improving team performance.
Affiliation(s)
- Anna Sapienza
- USC Information Sciences Institute, Marina del Rey, CA 90292, USA
- Yilei Zeng
- USC Information Sciences Institute, Marina del Rey, CA 90292, USA
- USC Department of Computer Science, Los Angeles, CA 90089, USA
- Alessandro Bessi
- USC Information Sciences Institute, Marina del Rey, CA 90292, USA
- Kristina Lerman
- USC Information Sciences Institute, Marina del Rey, CA 90292, USA
- USC Department of Computer Science, Los Angeles, CA 90089, USA
- Emilio Ferrara
- USC Information Sciences Institute, Marina del Rey, CA 90292, USA
- USC Department of Computer Science, Los Angeles, CA 90089, USA
20
Van Horn JD, Fierro L, Kamdar J, Gordon J, Stewart C, Bhattrai A, Abe S, Lei X, O'Driscoll C, Sinha A, Jain P, Burns G, Lerman K, Ambite JL. Democratizing data science through data science training. Pac Symp Biocomput 2018; 23:292-303. [PMID: 29218890 PMCID: PMC5731238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The biomedical sciences have experienced an explosion of data which promises to overwhelm many current practitioners. Without easy access to data science training resources, biomedical researchers may find themselves unable to wrangle their own datasets. In 2014, to address the challenges posed by such a data onslaught, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative. To this end, the BD2K Training Coordinating Center (TCC; bigdatau.org) was funded to facilitate both in-person and online learning, and open up the concepts of data science to the widest possible audience. Here, we describe the activities of the BD2K TCC and its focus on the construction of the Educational Resource Discovery Index (ERuDIte), which identifies, collects, describes, and organizes online data science materials from BD2K awardees, open online courses, and videos from scientific lectures and tutorials. ERuDIte now indexes over 9,500 resources. Given the richness of online training materials and the constant evolution of biomedical data science, computational methods applying information retrieval, natural language processing, and machine learning techniques are required - in effect, using data science to inform training in data science. In so doing, the TCC seeks to democratize novel insights and discoveries brought forth via large-scale data science training.
Affiliation(s)
- John Darrell Van Horn
- USC Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, 2025 Zonal Avenue, SHN, Los Angeles, CA 90033, USA
21
Allem JP, Ramanujam J, Lerman K, Chu KH, Boley Cruz T, Unger JB. Identifying Sentiment of Hookah-Related Posts on Twitter. JMIR Public Health Surveill 2017; 3:e74. [PMID: 29046267 PMCID: PMC5667930 DOI: 10.2196/publichealth.8133] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 06/05/2017] [Revised: 09/02/2017] [Accepted: 09/20/2017] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The increasing popularity of hookah (or waterpipe) use in the United States and elsewhere has consequences for public health because it has health risks similar to those of combustible cigarettes. While hookah use rapidly increases in popularity, social media data (Twitter, Instagram) can be used to capture and describe the social and environmental contexts in which individuals use, perceive, discuss, and are marketed this tobacco product. These data may allow people to organically report on their sentiment toward tobacco products like hookah unprimed by a researcher, without instrument bias, and at low costs. OBJECTIVE This study describes the sentiment of hookah-related posts on Twitter and describes the importance of debiasing Twitter data when attempting to understand attitudes. METHODS Hookah-related posts on Twitter (N=986,320) were collected from March 24, 2015, to December 2, 2016. Machine learning models were used to describe sentiment on 20 different emotions and to debias the data so that Twitter posts reflected sentiment of legitimate human users and not of social bots or marketing-oriented accounts that would possibly provide overly positive or overly negative sentiment of hookah. RESULTS From the analytical sample, 352,116 tweets (59.50%) were classified as positive while 177,537 (30.00%) were classified as negative, and 62,139 (10.50%) neutral. Among all positive tweets, 218,312 (62.00%) were classified as highly positive emotions (eg, active, alert, excited, elated, happy, and pleasant), while 133,804 (38.00%) positive tweets were classified as passive positive emotions (eg, contented, serene, calm, relaxed, and subdued). Among all negative tweets, 95,870 (54.00%) were classified as subdued negative emotions (eg, sad, unhappy, depressed, and bored) while the remaining 81,667 (46.00%) negative tweets were classified as highly negative emotions (eg, tense, nervous, stressed, upset, and unpleasant). Sentiment changed drastically when comparing a corpus of tweets with social bots to one without. For example, the probability of any one tweet reflecting joy was 61.30% from the debiased (or bot free) corpus of tweets. In contrast, the probability of any one tweet reflecting joy was 16.40% from the biased corpus. CONCLUSIONS Social media data provide researchers the ability to understand public sentiment and attitudes by listening to what people are saying in their own words. Tobacco control programmers in charge of risk communication may consider targeting individuals posting positive messages about hookah on Twitter or designing messages that amplify the negative sentiments. Posts on Twitter communicating positive sentiment toward hookah could add to the normalization of hookah use and are an area for future research. Findings from this study demonstrated the importance of debiasing data when attempting to understand attitudes from Twitter data.
Affiliation(s)
- Kristina Lerman
- University of Southern California, Los Angeles, CA, United States
- Kar-Hai Chu
- University of Pittsburgh, Pittsburgh, PA, United States
- Tess Boley Cruz
- Keck School of Medicine of USC, Los Angeles, CA, United States
22
Abstract
In numerous physical models on networks, dynamics are based on interactions that exclusively involve properties of a node’s nearest neighbors. However, a node’s local view of its neighbors may systematically bias perceptions of network connectivity or the prevalence of certain traits. We investigate the strong friendship paradox, which occurs when the majority of a node’s neighbors have more neighbors than does the node itself. We develop a model to predict the magnitude of the paradox, showing that it is enhanced by negative correlations between degrees of neighboring nodes. We then show that by including neighbor-neighbor correlations, which are degree correlations one step beyond those of neighboring nodes, we accurately predict the impact of the strong friendship paradox in real-world networks. Understanding how the paradox biases local observations can inform better measurements of network structure and our understanding of collective phenomena.
Affiliation(s)
- Xin-Zeng Wu
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Allon G Percus
- Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711, USA
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
23
Burghardt K, Alsina EF, Girvan M, Rand W, Lerman K. The myopia of crowds: Cognitive load and collective evaluation of answers on Stack Exchange. PLoS One 2017; 12:e0173610. [PMID: 28301531 PMCID: PMC5354439 DOI: 10.1371/journal.pone.0173610] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/19/2016] [Accepted: 02/22/2017] [Indexed: 11/18/2022] Open
Abstract
Crowds can often make better decisions than individuals or small groups of experts by leveraging their ability to aggregate diverse information. Question answering sites, such as Stack Exchange, rely on the “wisdom of crowds” effect to identify the best answers to questions asked by users. We analyze data from 250 communities on the Stack Exchange network to pinpoint factors affecting which answers are chosen as the best answers. Our results suggest that, rather than evaluate all available answers to a question, users rely on simple cognitive heuristics to choose an answer to vote for or accept. These cognitive heuristics are linked to an answer’s salience, such as the order in which it is listed and how much screen space it occupies. While askers appear to depend on heuristics to a greater extent than voters when choosing an answer to accept as the most helpful one, voters use acceptance itself as a heuristic, and they are more likely to choose the answer after it has been accepted than before that answer was accepted. These heuristics become more important in explaining and predicting behavior as the number of available answers to a question increases. Our findings suggest that crowd judgments may become less reliable as the number of answers grows.
Affiliation(s)
- Keith Burghardt
- Dept of Computer Science, University of California at Davis, Davis, CA, United States of America
- Dept of Political Science, University of California at Davis, Davis, CA, United States of America
- Michelle Girvan
- Dept of Physics, University of Maryland, College Park, MD, United States of America
- Santa Fe Institute, Santa Fe, NM, United States of America
- William Rand
- Department of Business Management, North Carolina State University, Raleigh, NC, United States of America
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States of America
24
Singer P, Ferrara E, Kooti F, Strohmaier M, Lerman K. Evidence of Online Performance Deterioration in User Sessions on Reddit. PLoS One 2016; 11:e0161636. [PMID: 27560185 PMCID: PMC4999233 DOI: 10.1371/journal.pone.0161636] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/23/2016] [Accepted: 08/09/2016] [Indexed: 11/19/2022] Open
Abstract
This article presents evidence of performance deterioration in online user sessions quantified by studying a massive dataset containing over 55 million comments posted on Reddit in April 2015. After segmenting the sessions (i.e., periods of activity without a prolonged break) depending on their intensity (i.e., how many posts users produced during sessions), we observe a general decrease in the quality of comments produced by users over the course of sessions. We propose mixed-effects models that capture the impact of session intensity on comments, including their length, quality, and the responses they generate from the community. Our findings suggest performance deterioration: Sessions of increasing intensity are associated with the production of shorter, progressively less complex comments, which receive declining quality scores (as rated by other users), and are less and less engaging (i.e., they attract fewer responses). Our contribution evokes a connection between cognitive and attention dynamics and the usage of online social peer production platforms, specifically the effects of deterioration of user performance.
Affiliation(s)
- Philipp Singer
- GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
- University of Koblenz-Landau, Koblenz, Germany
- Emilio Ferrara
- Information Sciences Institute, University of Southern California, Los Angeles, United States of America
- Farshad Kooti
- Information Sciences Institute, University of Southern California, Los Angeles, United States of America
- Markus Strohmaier
- GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
- University of Koblenz-Landau, Koblenz, Germany
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Los Angeles, United States of America
25
Abstract
Dynamic task allocation is an essential requirement for multi-robot systems operating in unknown dynamic environments. It allows robots to change their behavior in response to environmental changes or actions of other robots in order to improve overall system performance. Emergent coordination algorithms for task allocation that use only local sensing and no direct communication between robots are attractive because they are robust and scalable. However, a lack of formal analysis tools makes emergent coordination algorithms difficult to design. In this paper we present a mathematical model of a general dynamic task allocation mechanism. Robots using this mechanism have to choose between two types of tasks, and the goal is to achieve a desired task division in the absence of explicit communication and global knowledge. Robots estimate the state of the environment from repeated local observations and decide which task to choose based on these observations. We model the robots and observations as stochastic processes and study the dynamics of the collective behavior. Specifically, we analyze the effect that the number of observations and the choice of the decision function have on the performance of the system. The mathematical models are validated in a multi-robot multi-foraging scenario. The model's predictions agree very closely with results of embodied simulations.
Affiliation(s)
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Los Angeles, CA 90089-0781, USA
- Chris Jones
- iRobot Corporation, 63 South Ave, Burlington, MA 01803
- Aram Galstyan
- Information Sciences Institute, University of Southern California, Los Angeles, CA 90089-0781, USA
- Maja J Matarić
- Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, USA
26
Lamprecht D, Lerman K, Helic D, Strohmaier M. How the structure of Wikipedia articles influences user navigation. NEW REV HYPERMEDIA M 2016; 23:29-50. [PMID: 28670171 PMCID: PMC5468769 DOI: 10.1080/13614568.2016.1179798] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/13/2015] [Accepted: 04/14/2016] [Indexed: 11/03/2022]
Abstract
In this work we study how people navigate the information network of Wikipedia and investigate (i) free-form navigation by studying all clicks within the English Wikipedia over an entire month and (ii) goal-directed Wikipedia navigation by analyzing wikigames, where users are challenged to retrieve articles by following links. To study how the organization of Wikipedia articles in terms of layout and links affects navigation behavior, we first investigate the characteristics of the structural organization and of hyperlinks in Wikipedia and then evaluate link selection models based on article structure and other potential influences in navigation, such as the generality of an article's topic. In free-form Wikipedia navigation, covering all Wikipedia usage scenarios, we find that click choices can be best modeled by a bias towards article structure, such as a tendency to click links located in the lead section. For the goal-directed navigation of wikigames, our findings confirm the zoom-out and the homing-in phases identified by previous work, where users are guided by generality at first and textual similarity to the target later. However, our interpretation of the link selection models accentuates that article structure is the best explanation for the navigation paths in all except these initial and final stages. Overall, we find evidence that users more frequently click on links that are located close to the top of an article. The structure of Wikipedia articles, which places links to more general concepts near the top, supports navigation by allowing users to quickly find the better-connected articles that facilitate navigation. Our results highlight the importance of article structure and link position in Wikipedia navigation and suggest that better organization of information can help make information networks more navigable.
Affiliation(s)
- Daniel Lamprecht
- Knowledge Technologies Institute, Graz University of Technology, Graz, Austria
| | - Kristina Lerman
- Information Sciences Institute, University of Southern California, Los Angeles, CA, USA
| | - Denis Helic
- Knowledge Technologies Institute, Graz University of Technology, Graz, Austria
| | - Markus Strohmaier
- Department of Computer Science, University of Koblenz-Landau, Mainz, Germany
- GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
|
27
|
Abstract
Individuals' decisions, from what product to buy to whether to engage in risky behavior, often depend on the choices, behaviors, or states of other people. People, however, rarely have global knowledge of the states of others, but must estimate them from the local observations of their social contacts. Network structure can significantly distort individuals' local observations. Under some conditions, a state that is globally rare in a network may be dramatically over-represented in the local neighborhoods of many individuals. This effect, which we call the “majority illusion,” leads individuals to systematically overestimate the prevalence of that state, which may accelerate the spread of social contagions. We develop a statistical model that quantifies this effect and validate it with measurements in synthetic and real-world networks. We show that the illusion is exacerbated in networks with a heterogeneous degree distribution and disassortative structure.
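The effect described in this abstract can be illustrated with a toy example. The sketch below is not the authors' statistical model; it assumes a node experiences the illusion when strictly more than half of its neighbors are in the rare "active" state, and it uses a hypothetical star graph in which only the hub is active. The function name and the graph are illustrative choices, not taken from the paper.

```python
def majority_illusion_fraction(adjacency, active):
    """Fraction of nodes that see more than half of their neighbors as active.

    adjacency: dict mapping node -> list of neighbor nodes (undirected graph)
    active: set of nodes in the globally rare state
    """
    fooled = 0
    for node, neighbors in adjacency.items():
        if not neighbors:
            continue  # isolated nodes observe nothing
        active_neighbors = sum(1 for n in neighbors if n in active)
        # Assumed threshold: strictly more than half of the neighbors.
        if active_neighbors > len(neighbors) / 2:
            fooled += 1
    return fooled / len(adjacency)


# Hypothetical star graph: hub 0 linked to nine leaves, only the hub is active.
star = {0: list(range(1, 10))}
for leaf in range(1, 10):
    star[leaf] = [0]
active = {0}

global_prevalence = len(active) / len(star)           # share of active nodes
illusion = majority_illusion_fraction(star, active)   # share of fooled nodes
print(global_prevalence, illusion)
```

In this toy graph the active state is held by one node in ten, yet every leaf sees it in all of its neighbors, which is the kind of degree heterogeneity the abstract points to.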
Affiliation(s)
- Kristina Lerman
- USC Information Sciences Institute, Marina del Rey, CA, United States of America
| | - Xiaoran Yan
- USC Information Sciences Institute, Marina del Rey, CA, United States of America
| | - Xin-Zeng Wu
- USC Information Sciences Institute, Marina del Rey, CA, United States of America
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, United States of America
|
28
|
Abstract
With the advent of social media and peer production, the amount of new online content has grown dramatically. To identify interesting items in the vast stream of new content, providers must rely on peer recommendation to aggregate opinions of their many users. Due to human cognitive biases, the presentation order strongly affects how people allocate attention to the available content. Moreover, we can manipulate attention through the presentation order of items to change the way peer recommendation works. We experimentally evaluate this effect using Amazon Mechanical Turk. We find that different policies for ordering content can steer user attention so as to improve the outcomes of peer recommendation.
Affiliation(s)
- Kristina Lerman
- USC Information Sciences Institute, Marina Del Rey, California, United States of America
| | - Tad Hogg
- Institute for Molecular Manufacturing, Palo Alto, California, United States of America
|
29
|
Abstract
It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is far more complex. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We provide a framework for unifying information visibility, divided attention, and explicit social feedback to predict the temporal dynamics of user behavior.
Affiliation(s)
- Nathan O Hodas
- USC Information Sciences Institute, Marina del Rey, CA 90292
| | - Kristina Lerman
- USC Information Sciences Institute, Marina del Rey, CA 90292
|
30
|
|
31
|
Smith LM, Lerman K, Garcia-Cardona C, Percus AG, Ghosh R. Spectral clustering with epidemic diffusion. Phys Rev E Stat Nonlin Soft Matter Phys 2013; 88:042813. [PMID: 24229231 DOI: 10.1103/physreve.88.042813] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/21/2013] [Indexed: 06/02/2023]
Abstract
Spectral clustering is widely used to partition graphs into distinct modules or communities. Existing methods for spectral clustering use the eigenvalues and eigenvectors of the graph Laplacian, an operator that is closely associated with random walks on graphs. We propose a spectral partitioning method that exploits the properties of epidemic diffusion. An epidemic is a dynamic process that, unlike the random walk, simultaneously transitions to all the neighbors of a given node. We show that the replicator, an operator describing epidemic diffusion, is equivalent to the symmetric normalized Laplacian of a reweighted graph with edges reweighted by the eigenvector centralities of their incident nodes. Thus, more weight is given to edges connecting more central nodes. We describe a method that partitions the nodes based on the componentwise ratio of the replicator's second eigenvector to the first and compare its performance to traditional spectral clustering techniques on synthetic graphs with known community structure. We demonstrate that the replicator gives preference to dense, clique-like structures, enabling it to more effectively discover communities that may be obscured by dense intercommunity linking.
Affiliation(s)
- Laura M Smith
- California State University, Fullerton, California 92831, USA
|
32
|
Abstract
The popularity of content in social media is unequally distributed, with some items receiving a disproportionate share of attention from users. Predicting which newly-submitted items will become popular is critically important for both the hosts of social media content and its consumers. Accurate and timely prediction would enable hosts to maximize revenue through differential pricing for access to content or ad placement. Prediction would also give consumers an important tool for filtering the content. Predicting the popularity of content in social media is challenging due to the complex interactions between content quality and how the social media site highlights its content. Moreover, most social media sites selectively present content that has been highly rated by similar users, whose similarity is indicated implicitly by their behavior or explicitly by links in a social network. While these factors make it difficult to predict popularity a priori, stochastic models of user behavior on these sites can allow predicting popularity based on early user reactions to new content. By incorporating the various mechanisms through which web sites display content, such models improve on predictions that are based on simply extrapolating from the early votes. Specifically, for one such site, the news aggregator Digg, we show how a stochastic model distinguishes the effect of the increased visibility due to the network from how interested users are in the content. We find a wide range of interest, distinguishing stories primarily of interest to users in the network (“niche interests”) from those of more general interest to the user community. This distinction is useful for predicting a story’s eventual popularity from users’ early reactions to the story.
Affiliation(s)
| | - Tad Hogg
- Institute for Molecular Manufacturing
|
33
|
Lerman K, Ghosh R. Network structure, topology, and dynamics in generalized models of synchronization. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 86:026108. [PMID: 23005826 DOI: 10.1103/physreve.86.026108] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/29/2012] [Revised: 04/25/2012] [Indexed: 06/01/2023]
Abstract
Network structure is a product of both its topology and interactions between its nodes. We explore this claim using the paradigm of distributed synchronization in a network of coupled oscillators. As the network evolves to a global steady state, nodes synchronize in stages, revealing the network's underlying community structure. Traditional models of synchronization assume that interactions between nodes are mediated by a conservative process similar to diffusion. However, social and biological processes are often nonconservative. We propose a model of synchronization in a network of oscillators coupled via nonconservative processes. We study the dynamics of synchronization of synthetic and real-world networks and show that the traditional and nonconservative models of synchronization reveal different structures within the same network.
Affiliation(s)
- Kristina Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA
|
34
|
Ghosh R, Lerman K. Parameterized centrality metric for network analysis. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 83:066118. [PMID: 21797452 DOI: 10.1103/physreve.83.066118] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/18/2010] [Revised: 03/14/2011] [Indexed: 05/31/2023]
Abstract
A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, alpha-centrality [P. Bonacich, Am. J. Sociol. 92, 1170 (1987)], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, for example, to rank nodes and find community structure of the network. Specifically, we extend the modularity-maximization method for community detection to use this metric as the measure of node connectivity. Normalized alpha-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change when this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed metric to several benchmark networks and show that it leads to better insights into network structure than alternative metrics.
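The idea of counting attenuated paths can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's implementation: it assumes that paths of length k are weighted by alpha^(k-1), that a node's score is the total weight of paths ending at it, that the series is truncated at a hypothetical `max_len`, and that normalization simply divides by the sum of all scores. The series only converges when alpha is below the reciprocal of the adjacency matrix's largest eigenvalue.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def normalized_alpha_centrality(adj, alpha, max_len=50):
    """Truncated attenuated-path count, normalized so the scores sum to 1.

    adj: adjacency matrix (list of lists of 0/1)
    alpha: attenuation factor applied to each additional step of a path
    max_len: longest path length included (an assumed truncation point)
    """
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    term = [row[:] for row in adj]  # counts of paths of length 1
    weight = 1.0
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * term[i][j]
        term = matmul(term, adj)    # counts of paths one step longer
        weight *= alpha
    # Column sums: total weight of attenuated paths that end at each node.
    scores = [sum(total[i][j] for i in range(n)) for j in range(n)]
    norm = sum(scores)
    return [s / norm for s in scores]


# Three-node path graph 0 - 1 - 2; the middle node should rank highest.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
scores = normalized_alpha_centrality(path_graph, alpha=0.1)
print(scores)
```

Rerunning with different values of `alpha` shows how the ranking shifts as longer paths gain or lose weight, which is the tunable length scale the abstract refers to.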
Affiliation(s)
- Rumi Ghosh
- USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, California 90292, USA.
|
35
|
Balduccini M, Baral C, Brodaric B, Colton S, Fox P, Gutelius D, Hinkelmann K, Horswill I, Huberman B, Hudlicka E, Lerman K, Lisetti C, McGuinness DL, Maher ML, Musen MA, Sahami M, Sleeman D, Thönssen B, Velasquez JD, Ventura D. AAAI 2008 Spring Symposia Reports. AI MAG 2008. [DOI: 10.1609/aimag.v29i3.2148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
The Association for the Advancement of Artificial Intelligence (AAAI) was pleased to present the AAAI 2008 Spring Symposium Series, held Wednesday through Friday, March 26–28, 2008 at Stanford University, California. The titles of the eight symposia were as follows: (1) AI Meets Business Rules and Process Management, (2) Architectures for Intelligent Theory-Based Agents, (3) Creative Intelligent Systems, (4) Emotion, Personality, and Social Behavior, (5) Semantic Scientific Knowledge Integration, (6) Social Information Processing, (7) Symbiotic Relationships between Semantic Web and Knowledge Engineering, (8) Using AI to Motivate Greater Participation in Computer Science. The goal of the AI Meets Business Rules and Process Management AAAI symposium was to investigate the various approaches and standards to represent business rules, business process management and the semantic web with respect to expressiveness and reasoning capabilities. The focus of the Architectures for Intelligent Theory-Based Agents AAAI symposium was the definition of architectures for intelligent theory-based agents, comprising languages, knowledge representation methodologies, reasoning algorithms, and control loops. The Creative Intelligent Systems Symposium included five major discussion sessions and a general poster session (in which all contributing papers were presented). The purpose of this symposium was to explore the synergies between creative cognition and intelligent systems. The goal of the Emotion, Personality, and Social Behavior symposium was to examine fundamental issues in affect and personality in both biological and artificial agents, focusing on the roles of these factors in mediating social behavior. The Semantic Scientific Knowledge Integration symposium was interested in bringing together the semantic technologies community with the scientific information technology community in an effort to build the general semantic science information community.
The goal of the Social Information Processing symposium was to investigate computational and analytic approaches that will enable users to harness the efforts of large numbers of other users to solve a variety of information processing problems, from discovering high-quality content to managing common resources. The goal of the Symbiotic Relationships between Semantic Web and Knowledge Engineering symposium was to explore how the lessons learned by the knowledge-engineering community over the past three decades could be applied to the bold research agenda of current workers in semantic web technologies. The purpose of the Using AI to Motivate Greater Participation in Computer Science symposium was to identify ways that topics in AI may be used to motivate greater student participation in computer science by highlighting fun, engaging, and intellectually challenging developments in AI-related curriculum at a number of educational levels. Technical reports of the symposia were published by AAAI Press.
|
36
|
|
37
|
Abstract
The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
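The recall figure quoted for the verification algorithm can be checked with the standard definitions. The sketch below uses the usual formulas (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function names are illustrative. The abstract does not say how the 16 mistakes divide between false alarms and missed changes, so only recall is recomputed here and no false-positive count is assumed.

```python
def precision(true_positives, false_positives):
    """Share of reported changes that were real changes."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of real changes that were reported."""
    return true_positives / (true_positives + false_negatives)


# From the abstract: 35 of the 37 wrapper changes were detected.
detected_changes = 35
total_changes = 37
missed_changes = total_changes - detected_changes

verification_recall = recall(detected_changes, missed_changes)
print(round(verification_recall, 2))
```

The same `precision` function could be applied once the number of false alarms is known from the full paper.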
|
38
|
Galstyan A, Lerman K. Adaptive Boolean networks and minority games with time-dependent capacities. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 66:015103. [PMID: 12241409 DOI: 10.1103/physreve.66.015103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/17/2001] [Indexed: 05/23/2023]
Abstract
In this paper we consider a network of Boolean agents that compete for a limited resource. The agents play the so-called generalized minority game, where the capacity level is allowed to vary externally. We study the properties of such a system for different values of the mean connectivity K of the network, and show that the system with K=2 shows a high degree of coordination for relatively large variations of the capacity level.
Affiliation(s)
- Aram Galstyan
- Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, California 90292, USA
|
39
|
|
40
|
Abstract
In this article, we present a macroscopic analytical model of collaboration in a group of reactive robots. The model consists of a series of coupled differential equations that describe the dynamics of group behavior. After presenting the general model, we analyze in detail a case study of collaboration, the stick-pulling experiment, studied experimentally and in simulation by Ijspeert et al. [Autonomous Robots, 11, 149-171]. The robots' task is to pull sticks out of their holes, and it can be successfully achieved only through the collaboration of two robots. There is no explicit communication or coordination between the robots. Unlike microscopic simulations (sensor-based or using a probabilistic numerical model), in which computational time scales with the robot group size, the macroscopic model is computationally efficient, because its solutions are independent of robot group size. Analysis reproduces several qualitative conclusions of Ijspeert et al.: namely, the different dynamical regimes for different values of the ratio of robots to sticks, the existence of optimal control parameters that maximize system performance as a function of group size, and the transition from superlinear to sublinear performance as the number of robots is increased.
Affiliation(s)
- K Lerman
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA.
|
41
|
Lerman K, Ahlers G, Cannell DS. Different convection dynamics in mixtures with the same separation ratio. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 1996; 53:R2041-R2044. [PMID: 9964602 DOI: 10.1103/physreve.53.r2041] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 05/22/2023]
|
42
|
Lerman K, Bodenschatz E, Cannell DS, Ahlers G. Transient localized states in 2D binary liquid convection. Phys Rev Lett 1993; 70:3572-3575. [PMID: 10053908 DOI: 10.1103/physrevlett.70.3572] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.5] [Indexed: 05/23/2023]
|
43
|
Hasson KC, Cates GD, Lerman K, Bogorad P, Happer W. Erratum: Spin relaxation due to magnetic-field inhomogeneities: Quartic dependence and diffusion-constant measurements. Phys Rev A 1990; 42:5766. [PMID: 9904732 DOI: 10.1103/physreva.42.5766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
|
44
|
Hasson KC, Cates GD, Lerman K, Bogorad P, Happer W. Spin relaxation due to magnetic-field inhomogeneities: Quartic dependence and diffusion-constant measurements. Phys Rev A 1990; 41:3672-3688. [PMID: 9903538 DOI: 10.1103/physreva.41.3672] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 05/22/2023]
|