1
Grubb J, Lopez D, Mohan B, Matta J. Network centrality for the identification of biomarkers in respondent-driven sampling datasets. PLoS One 2021; 16:e0256601. [PMID: 34428228 PMCID: PMC8384166 DOI: 10.1371/journal.pone.0256601] [Received: 02/28/2021] [Accepted: 08/10/2021]
Abstract
Network science techniques are frequently used to provide meaningful insights into the populations underlying medical and social data. This paper examines SATHCAP, a dataset related to HIV and drug use in three US cities. In particular, we use network measures such as betweenness centrality, closeness centrality, and eigenvector centrality to find central, important nodes in a network derived from SATHCAP data. We evaluate the attributes of these important nodes and create an exceptionality score based on the number of nodes that share a particular attribute. This score, along with the underlying network itself, is used to reveal insights into the attributes of groups that can be effectively targeted to slow the spread of disease. Our research confirms known connections between homelessness and HIV and between drug abuse and HIV, and supports the theory that individuals without easy access to transportation are more likely to be central to the spread of HIV in urban, high-risk populations.
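As a rough illustration of the pipeline this abstract describes (rank nodes by a centrality measure, then tally attributes among the most central nodes), the following Python sketch computes closeness centrality by breadth-first search on an adjacency dictionary and counts attribute values among the top-k nodes. The helper names, the toy attribute labels, and the use of a plain count as a stand-in for the paper's exceptionality score are assumptions; the paper's exact scoring formula is not given here.

```python
from collections import Counter, deque


def closeness_centrality(adj):
    """Closeness of each node: (reachable nodes - 1) / sum of shortest-path
    distances, computed by BFS within the node's connected component."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def attribute_counts_among_top(adj, attrs, k):
    """Hypothetical stand-in for an exceptionality score: how many of the
    k most central nodes carry each attribute value."""
    scores = closeness_centrality(adj)
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return Counter(attrs[node] for node in top)


# Toy star network: node 0 is the hub.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
labels = {0: "homeless", 1: "housed", 2: "housed", 3: "homeless"}
print(closeness_centrality(star))
print(attribute_counts_among_top(star, labels, 1))
```

Betweenness or eigenvector centrality could be substituted for `closeness_centrality` without changing the tallying step.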
Affiliation(s)
- Jacob Grubb
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
- Derek Lopez
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
- Bhuvaneshwar Mohan
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
- John Matta
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
2
Shang Y. Immunization of networks with limited knowledge and temporary immunity. CHAOS (WOODBURY, N.Y.) 2021; 31:053117. [PMID: 34240934 DOI: 10.1063/5.0045445] [Received: 01/26/2021] [Accepted: 04/29/2021]
Abstract
The modern view of network resilience and epidemic spreading has been shaped by percolation tools from statistical physics, in which nodes and edges are removed or immunized at random in a large-scale network. In this paper, we develop a theoretical framework for studying targeted immunization in networks where only n nodes can be observed at a time, the most connected of them is immunized, and the acquired immunity may be lost with a decay probability ρ. We analytically examine the percolation properties and scaling laws, which reveal distinctive behavior for Erdős-Rényi and power-law networks along the two dimensions n and ρ. We study both a fixed immunity-loss rate and an asymptotic total-loss scenario, paving the way toward a better understanding of temporary immunity in complex percolation processes with limited knowledge.
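The observation-limited targeting rule described here can be sketched as a single simulation round in Python. This is a toy illustration, not the paper's analytical percolation framework: the adjacency-dictionary representation, the order of the two steps (immunity decay first, then new immunization), and sampling the observed nodes from the whole network are all assumptions.

```python
import random


def immunization_step(adj, immune, n, rho, rng):
    """One round of limited-knowledge targeted immunization (sketch).

    1. Each currently immune node independently loses immunity with
       probability rho.
    2. n nodes are observed uniformly at random, and the observed node
       with the highest degree is immunized.
    """
    still_immune = {v for v in immune if rng.random() >= rho}
    observed = rng.sample(list(adj), n)
    target = max(observed, key=lambda v: len(adj[v]))
    still_immune.add(target)
    return still_immune


adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
rng = random.Random(42)
immune = set()
for _ in range(5):
    immune = immunization_step(adj, immune, n=2, rho=0.3, rng=rng)
print(sorted(immune))
```

In this sketch, n = 1 reduces to random immunization, while n equal to the network size always picks a highest-degree node.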
Affiliation(s)
- Y Shang
- Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, United Kingdom
3
Location inference for hidden population with online text analysis. Int J Health Geogr 2020; 19:57. [PMID: 33298074 PMCID: PMC7724834 DOI: 10.1186/s12942-020-00245-x] [Received: 08/13/2020] [Accepted: 11/09/2020]
Abstract
Background Understanding the geographic distribution of hidden populations, such as men who have sex with men (MSM), sex workers, or injecting drug users, is of great importance for the adequate deployment of intervention strategies and for public health decision making. However, because of their hard-to-access properties (e.g., lack of a sampling frame, sensitivity issues, reporting error), traditional survey methods are of limited use when studying such populations. Using data extracted from the very active online MSM community in China, in this study we adopt and develop location inference methods to achieve high-resolution, national-level mapping of users in this community.
Methods We collect a comprehensive dataset from the largest MSM-related sub-community on Baidu Tieba, covering 628,360 MSM-related users. Based on users' publicly available posts, we evaluate and compare the performance of mainstream location inference algorithms on the problem of locating the Chinese MSM population online. To improve inference accuracy, other natural language processing approaches, such as context analysis and pattern recognition, are introduced into location extraction. In addition, we develop a hybrid voting algorithm (HVA-LI) that lets different approaches vote to determine the best inference result, which guarantees a more effective approach to location inference for hidden populations.
Results Comparing popular inference algorithms, we find that the classic gazetteer-based algorithm achieves better results. Among the HVA-LI algorithms, the hybrid of the simple gazetteer-based method and named entity recognition (NER) proves best at inferring user locations disclosed in short texts in online communities, improving inference accuracy from 50.3% to 71.3% on the MSM-related dataset.
Conclusions In this study, we have explored the possibility of inferring location by analyzing textual content posted by online users. We propose a more effective hybrid algorithm, the Gazetteer & NER algorithm, which helps overcome sparse location labeling in user profiles and can be extended to inferring geo-statistics for other hidden populations.
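The two ingredients named in the Results, gazetteer matching and voting among methods, can be illustrated with a minimal Python sketch. The gazetteer contents, the longest-match-first ordering, and the one-vote-per-method majority rule are assumptions made for illustration; the actual HVA-LI voting rules and the NER component are not reproduced, and `ner_output` below is a hypothetical placeholder for a recognizer's result.

```python
from collections import Counter


def gazetteer_locations(text, gazetteer):
    """Return gazetteer place names that occur as substrings of `text`,
    longest first (substring matching also works for unsegmented Chinese)."""
    hits = [name for name in gazetteer if name in text]
    return sorted(hits, key=len, reverse=True)


def hybrid_vote(candidate_lists):
    """Majority vote over the top candidate proposed by each method;
    returns None when no method produced a candidate."""
    votes = Counter(cands[0] for cands in candidate_lists if cands)
    return votes.most_common(1)[0][0] if votes else None


gazetteer = ["Beijing", "Chengdu", "Changsha"]
post = "Looking for friends in Chengdu this weekend"
gaz_output = gazetteer_locations(post, gazetteer)
ner_output = ["Chengdu"]   # hypothetical: output of an NER model (not shown)
context_output = []        # hypothetical: a method that found nothing
print(hybrid_vote([gaz_output, ner_output, context_output]))
```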
4
Rosenblatt SF, Smith JA, Gauthier GR, Hébert-Dufresne L. Immunization strategies in networks with missing data. PLoS Comput Biol 2020; 16:e1007897. [PMID: 32645081 PMCID: PMC7386582 DOI: 10.1371/journal.pcbi.1007897] [Received: 10/03/2019] [Revised: 07/28/2020] [Accepted: 04/22/2020]
Abstract
Network-based intervention strategies can be effective and cost-efficient approaches to curtailing harmful contagions in myriad settings. As typically studied, however, these strategies are often impractical to implement, because they assume complete knowledge of the network structure, which is unusual in practice. In this paper, we investigate how different immunization strategies perform under realistic conditions, where the strategies are informed by partially observed network data. Our results suggest that global immunization strategies, like degree immunization, are optimal in most cases; the exception is at very high levels of missing data, where stochastic strategies, like acquaintance immunization, begin to outperform them in minimizing outbreaks. Stochastic strategies are more robust in some cases because missing data affects them in different ways. In fact, one of our proposed variants of acquaintance immunization leverages a logistically realistic ongoing survey-intervention process as a form of targeted data recovery, so that it improves with increasing levels of missing data. These results support the effectiveness of targeted immunization as a general practice. They also highlight the risks of treating networks as idealized mathematical objects: overestimating the accuracy of network data and forgoing the rewards of additional inquiry.
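The contrast between a global strategy (degree immunization) and a stochastic one (acquaintance immunization) under missing data can be sketched in Python as follows. The independent edge-dropping model of missing data, the integer node labels, and the function names are assumptions; the paper's specific missingness mechanisms and its survey-based acquaintance variant are not reproduced here.

```python
import random


def observe(adj, missing_frac, rng):
    """Partially observed copy of an undirected graph with integer node labels
    and symmetric adjacency: each edge is hidden with probability missing_frac."""
    seen = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            if u < v and rng.random() >= missing_frac:
                seen[u].add(v)
                seen[v].add(u)
    return seen


def degree_immunization(adj, k):
    """Global strategy: immunize the k nodes with the highest observed degree."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])


def acquaintance_immunization(adj, k, rng):
    """Stochastic strategy: k times, pick a random node that has at least one
    observed neighbour and immunize one of its neighbours at random."""
    nodes = [v for v in adj if adj[v]]
    immune = set()
    if not nodes:
        return immune
    for _ in range(k):
        v = rng.choice(nodes)
        immune.add(rng.choice(sorted(adj[v])))
    return immune


true_graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
rng = random.Random(1)
observed = observe(true_graph, missing_frac=0.4, rng=rng)
print(degree_immunization(observed, 2), acquaintance_immunization(observed, 2, rng))
```

Degree immunization ranks every node on the observed degrees, whereas acquaintance immunization only consults the neighbour lists of randomly sampled nodes.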
Affiliation(s)
- Samuel F. Rosenblatt
- Department of Computer Science, University of Vermont, Burlington, Vermont, United States of America
- Vermont Complex Systems Center, University of Vermont, Burlington, Vermont, United States of America
- Jeffrey A. Smith
- Department of Sociology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- G. Robin Gauthier
- Department of Sociology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Laurent Hébert-Dufresne
- Department of Computer Science, University of Vermont, Burlington, Vermont, United States of America
- Vermont Complex Systems Center, University of Vermont, Burlington, Vermont, United States of America
5
Mulberry N, Rutherford AR, Wittenberg RW, Williams BG. HIV control strategies for sex worker-client contact networks. J R Soc Interface 2019; 16:20190497. [PMID: 31551046 DOI: 10.1098/rsif.2019.0497]
Abstract
Controlling the spread of HIV among hidden, high-risk populations such as survival sex workers and their clients is becoming increasingly important in the ongoing fight against HIV/AIDS. Several sociological and structural factors render general control strategies ineffective in these settings; instead, focused prevention, testing and treatment strategies which take into account the nature of survival sex work are required. Using a dynamic bipartite network model of sexual contacts, we investigate the optimal distribution of treatment and preventative resources among sex workers and their clients; specifically, we consider control strategies that randomly allocate antiretroviral therapy and pre-exposure prophylaxis within each subpopulation separately. Motivated by historical data from a South African mining community, three main asymmetries between sex workers and clients are considered in our model: relative population sizes, migration rates and partner distributions. We find that preventative interventions targeted at female sex workers are the lowest cost strategies for reducing HIV prevalence, since the sex workers form a smaller population and have, on average, more sexual contacts. However, the high migration rate among survival sex workers limits the extent to which prevalence can be reduced using this strategy. To achieve a further reduction in HIV prevalence, testing and treatment in the client population cannot be ignored.
Affiliation(s)
- Nicola Mulberry
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Alexander R Rutherford
- Department of Mathematics and SFU Big Data, Simon Fraser University, Burnaby, British Columbia, Canada
- Ralf W Wittenberg
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Brian G Williams
- South African Centre for Epidemiological Modelling and Analysis, University of Stellenbosch, Stellenbosch, South Africa
6
Chen S, Lu X, Liu Z, Jia Z. Inferring the Population Mean with Second-Order Information in Online Social Networks. ENTROPY 2018; 20:e20060480. [PMID: 33265570 PMCID: PMC7512998 DOI: 10.3390/e20060480] [Received: 05/12/2018] [Revised: 06/16/2018] [Accepted: 06/17/2018]
Abstract
With the increasing use of online social networking platforms, online surveys are widely used in many fields, e.g., public health, business and sociology, to collect samples and to infer population characteristics from respondents' self-reported data. Although online surveys can protect respondents' privacy, self-reporting suffers from low response rates and unreliable answers when the survey contains sensitive questions, such as those about drug use, sexual behavior, abortion or criminal activity. To overcome this limitation, this paper develops an approach that collects second-order information from respondents, i.e., asking them about the characteristics of their friends instead of directly asking about their own characteristics. We then infer the population variable with the Hansen-Hurwitz estimator under two classic sampling strategies (simple random sampling and random walk-based sampling). The method is evaluated by simulations on both artificial and real-world networks. Results show that the method produces highly accurate population estimates without knowing the respondents' own characteristics, and that the biases of the estimates under various settings are relatively small and within acceptable limits. The new method offers an alternative way of implementing online surveys and is expected to collect more reliable data, with improved population inference on sensitive variables.
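For reference, the textbook Hansen-Hurwitz estimator named here can be written in a few lines of Python. For n draws with replacement, where draw i has per-draw selection probability p_i, the population total is estimated by (1/n) Σ y_i / p_i. When the probabilities are known only up to a constant, as with degree-proportional random-walk sampling, the mean can be estimated as the ratio of the Hansen-Hurwitz estimates of the total and of the population size, Σ(y_i/w_i) / Σ(1/w_i). How the paper maps friends' reported characteristics onto y_i is not reproduced here; the function names and toy inputs are illustrative assumptions.

```python
from typing import Sequence


def hansen_hurwitz_total(values: Sequence[float], probs: Sequence[float]) -> float:
    """Hansen-Hurwitz estimate of a population total from n draws with
    replacement; probs[i] is the per-draw selection probability of draw i."""
    if not values or len(values) != len(probs):
        raise ValueError("values and probs must be non-empty and equal length")
    return sum(y / p for y, p in zip(values, probs)) / len(values)


def hansen_hurwitz_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ratio of the Hansen-Hurwitz estimates of the total and of the population
    size; weights only need to be proportional to the selection probabilities
    (e.g. node degree for a random walk)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    numerator = sum(y / w for y, w in zip(values, weights))
    denominator = sum(1.0 / w for w in weights)
    return numerator / denominator


# Toy random-walk sample: binary attribute y and degree d of each visited node.
y = [1, 0, 0, 1, 0]
d = [4, 2, 1, 8, 2]
print(hansen_hurwitz_mean(y, d))
```

Under simple random sampling all weights are equal, and `hansen_hurwitz_mean` reduces to the ordinary sample mean.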
Affiliation(s)
- Saran Chen
- College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
- Xin Lu
- College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
- School of Business, Central South University, Changsha 410083, China
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
- Department of Public Health Sciences, Karolinska Institutet, 17177 Stockholm, Sweden
- Correspondence: Tel.: +86-186-2756-1577
- Zhong Liu
- College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
- Zhongwei Jia
- National Institute of Drug Dependence, Health Science Center, Peking University, Beijing 100191, China
7
Liu C, Lu X. Analyzing hidden populations online: topic, emotion, and social network of HIV-related users in the largest Chinese online community. BMC Med Inform Decis Mak 2018; 18:2. [PMID: 29304788 PMCID: PMC5755307 DOI: 10.1186/s12911-017-0579-1] [Received: 07/06/2017] [Accepted: 12/21/2017]
Abstract
BACKGROUND Traditional survey methods are limited in the study of hidden populations because of their hard-to-access properties, including lack of a sampling frame, sensitivity issues, reporting error and small sample sizes. The rapid growth of online communities, whose members interact with one another via the Internet, has generated large amounts of data, offering new opportunities for understanding hidden populations with unprecedented sample sizes and richness of information. In this study, we try to understand the multidimensional characteristics of a hidden population by analyzing the massive data generated in an online community.
METHODS Using carefully designed crawlers, we retrieved a complete dataset of all records from January 2005 to August 2016 from the "HIV bar," the largest HIV-related bar on the Baidu Tieba platform. Through natural language processing and social network analysis, we explored the psychology, behavior and needs of the online HIV population and examined the network community structure.
RESULTS In HIV communities, the average topic similarity among members is positively correlated with network efficiency (r = 0.70, p < 0.001), indicating that the closer the social distance between members of a community, the more similar their topics. The proportion of negative users in each community is around 60% and is weakly correlated with community size (r = 0.25, p = 0.002). Users who suspect initial HIV infection, or who have had first contact with high-risk behaviors, tend to seek help and advice on the social networking platform rather than immediately going to a hospital for blood tests.
CONCLUSIONS Online communities have generated copious amounts of data, offering new opportunities for understanding hidden populations with unprecedented sample sizes and richness of information. It is recommended that online services for HIV/AIDS consultation and diagnosis be improved to avoid privacy concerns and social discrimination in China.
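The abstract does not define network efficiency. Assuming it is the common Latora-Marchiori global efficiency (the average of 1/d(i, j) over all ordered pairs of distinct nodes, with unreachable pairs contributing zero), it can be computed for an unweighted community graph with a short Python sketch; the adjacency-dictionary input and the function name are illustrative.

```python
from collections import deque


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs i != j (BFS distances on an
    unweighted graph); unreachable pairs contribute 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(global_efficiency(path), global_efficiency(triangle))
```

Per-community efficiencies computed this way could then be correlated with mean topic similarity, for example with `statistics.correlation` (Python 3.10+), which returns Pearson's r.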
Affiliation(s)
- Chuchu Liu
- College of Information System and Management, National University of Defense Technology, Changsha, 410073, China
- Xin Lu
- College of Information System and Management, National University of Defense Technology, Changsha, 410073, China
- School of Business Administration, Southwestern University of Finance and Economics, Chengdu, 610074, China
- Department of Public Health Sciences, Karolinska Institutet, 17 177, Stockholm, Sweden