151. Unifying community detection and network embedding in attributed networks. Knowl Inf Syst 2021. [DOI: 10.1007/s10115-021-01557-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
152. Ma H, Yang H, Zhou K, Zhang L, Zhang X. A local-to-global scheme-based multi-objective evolutionary algorithm for overlapping community detection on large-scale complex networks. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05311-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
153. Bogerd K, Castro RM, van der Hofstad R, Verzelen N. Detecting a planted community in an inhomogeneous random graph. Bernoulli 2021. [DOI: 10.3150/20-bej1269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Kay Bogerd
- Eindhoven University of Technology, Eindhoven, The Netherlands
- Rui M. Castro
- Eindhoven University of Technology, Eindhoven, The Netherlands
154. A New Framework for Discovering Protein Complex and Disease Association via Mining Multiple Databases. Interdiscip Sci 2021; 13:683-692. [PMID: 33905111 DOI: 10.1007/s12539-021-00432-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2021] [Revised: 03/31/2021] [Accepted: 04/09/2021] [Indexed: 10/21/2022]
Abstract
One important challenge in the post-genomic era is to explore disease mechanisms by efficiently integrating different types of biological data. In fact, a single disease is usually caused by multiple gene products, such as protein complexes, rather than by a single gene. It is therefore worthwhile to discover protein communities in the protein-protein interaction (PPI) network and use them to infer disease-disease associations. In this article, we propose a new framework that integrates protein-protein networks, disease-gene associations and disease-complex pairs to cluster protein complexes and infer disease associations. On three PPI networks, the complexes discovered by our approach are superior in quality (Sn, PPV and ACC) and in clustering quantity to those of four other popular methods. A systematic analysis shows that disease pairs sharing more protein complexes (such as Glucose and Lipid Metabolic Disorders) are more similar, and that overlapping proteins may play different roles in different diseases. These findings can provide clinical scholars and medical practitioners with new ideas on disease identification and treatment.
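The Sn, PPV and ACC scores cited above are commonly defined complex-wise, following Brohée and van Helden; the abstract does not restate the formulas, so the following is a minimal sketch under that assumption. Here T[i][j] is the number of proteins shared by reference complex i and predicted cluster j, and the helper name `complex_scores` is hypothetical.

```python
from math import sqrt


def complex_scores(reference, predicted):
    """Hypothetical helper returning (Sn, PPV, ACC) for predicted clusters.

    Assumes the complex-wise definitions of Brohee and van Helden;
    `reference` and `predicted` are lists of sets of protein identifiers.
    """
    # T[i][j]: proteins shared by reference complex i and predicted cluster j
    T = [[len(ref & pred) for pred in predicted] for ref in reference]

    # Sn: best coverage of each reference complex, over all reference proteins
    sn_den = sum(len(ref) for ref in reference)
    sn = sum(max(row, default=0) for row in T) / sn_den if sn_den else 0.0

    # PPV: best match of each cluster, over the clusters' total overlap
    cols = list(zip(*T))
    ppv_den = sum(sum(col) for col in cols)
    ppv = sum(max(col) for col in cols) / ppv_den if ppv_den else 0.0

    # ACC: geometric mean of Sn and PPV
    return sn, ppv, sqrt(sn * ppv)


reference = [{"a", "b", "c"}, {"d", "e"}]
predicted = [{"a", "b"}, {"c", "d", "e"}]
sn, ppv, acc = complex_scores(reference, predicted)
print(sn, ppv, acc)  # Sn = 4/5 and PPV = 4/5 for this toy example
```

In the toy example each reference complex is best matched by a different cluster, so Sn and PPV coincide; in practice they trade off, which is why ACC combines them.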
155. Swain A, Devereux M, Fagan WF. Deciphering trophic interactions in a mid-Cambrian assemblage. iScience 2021; 24:102271. [PMID: 33817576 PMCID: PMC8010449 DOI: 10.1016/j.isci.2021.102271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/23/2020] [Revised: 12/07/2020] [Accepted: 03/02/2021] [Indexed: 11/23/2022]
Abstract
Exceptionally preserved fossil sites have allowed specimen-based identification of trophic interactions to which network analyses have been applied. However, network analyses of the fossil record suffer from incomplete and indirect data, time averaging that obscures species coexistence, and biases in preservation. Here, we present a high-resolution fossil data set from the Raymond Quarry Member of the mid-Cambrian Burgess Shale (7,549 specimens, 61 taxa, ∼510 Mya) and formulate a measure of "preservation bias" that aids identification of assemblage subsets to which network analyses can be reliably applied. For these sections, abundance correlation network analyses predicted longitudinally consistent trophic and competitive interactions. Our analyses predicted previously postulated trophic interactions with 83.5% accuracy and demonstrated a shift from specialist interaction-dominated assemblages to ones dominated by generalist and competitive interactions. This approach provides a robust, taphonomically corrected framework to explore and predict in detail the existence and ecological character of putative interactions in fossil data sets.
Affiliation(s)
- Anshuman Swain
- Department of Biology, University of Maryland, College Park, MD 20742, USA
- Matthew Devereux
- Department of Earth Science, Western University, London, ON, Canada
- William F. Fagan
- Department of Biology, University of Maryland, College Park, MD 20742, USA
156. Hierarchical clustering with discrete latent variable models and the integrated classification likelihood. Adv Data Anal Classif 2021. [DOI: 10.1007/s11634-021-00440-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
157. Kirkley A, Cantwell GT, Newman MEJ. Belief propagation for networks with loops. Sci Adv 2021; 7:eabf1211. [PMID: 33893102 DOI: 10.1126/sciadv.abf1211] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/05/2020] [Accepted: 03/09/2021] [Indexed: 06/12/2023]
Abstract
Belief propagation is a widely used message passing method for the solution of probabilistic models on networks such as epidemic models, spin models, and Bayesian graphical models, but it suffers from the serious shortcoming that it works poorly in the common case of networks that contain short loops. Here, we provide a solution to this long-standing problem, deriving a belief propagation method that allows for fast calculation of probability distributions in systems with short loops, potentially with high density, as well as giving expressions for the entropy and partition function, which are notoriously difficult quantities to compute. Using the Ising model as an example, we show that our approach gives excellent results on both real and synthetic networks, improving substantially on standard message passing methods. We also discuss potential applications of our method to a variety of other problems.
Affiliation(s)
- Alec Kirkley
- Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA
- George T Cantwell
- Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- M E J Newman
- Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109, USA
158. Gao LL, Witten D, Bien J. Testing for association in multiview network data. Biometrics 2021; 78:1018-1030. [PMID: 33792914 DOI: 10.1111/biom.13464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2020] [Revised: 02/09/2021] [Accepted: 03/18/2021] [Indexed: 12/01/2022]
Abstract
In this paper, we consider data consisting of multiple networks, each composed of a different edge set on a common set of nodes. Many models have been proposed for the analysis of such multiview network data under the assumption that the data views are closely related. Here, we provide tools for evaluating this assumption. In particular, we ask: given two networks that each follow a stochastic block model, is there an association between the latent community memberships of the nodes in the two networks? To answer this question, we extend the stochastic block model for a single network view to the two-view setting, and develop a new hypothesis test for the null hypothesis that the latent community memberships in the two data views are independent. We apply our test to protein-protein interaction data from the HINT database. We find evidence of a weak association between the latent community memberships of proteins defined with respect to binary interaction data and the latent community memberships of proteins defined with respect to co-complex association data. We also extend this proposal to the setting of a network with node covariates. The proposed methods extend readily to three or more network/multivariate data views.
Affiliation(s)
- Lucy L Gao
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Daniela Witten
- Departments of Statistics and Biostatistics, University of Washington, Seattle, Washington, USA
- Jacob Bien
- Department of Data Sciences and Operations, University of Southern California, Los Angeles, California, USA
159.
160.
Affiliation(s)
- Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University
161. Wang S, Arroyo J, Vogelstein JT, Priebe CE. Joint Embedding of Graphs. IEEE Trans Pattern Anal Mach Intell 2021; 43:1324-1336. [PMID: 31675314 DOI: 10.1109/tpami.2019.2948619] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Feature extraction and dimension reduction for networks are critical in a wide variety of domains. Efficiently and accurately learning features for multiple graphs has important applications in statistical inference on graphs. We propose a method to jointly embed multiple undirected graphs. Given a set of graphs, the joint embedding method identifies a linear subspace spanned by rank-one symmetric matrices and projects the adjacency matrices of the graphs into this subspace. The projection coefficients can be treated as features of the graphs, while the embedding components can represent vertex features. We also propose a random graph model for multiple graphs that generalizes other classical models for graphs. We show through theory and numerical experiments that under the model, the joint embedding method produces estimates of parameters with small errors. Via simulation experiments, we demonstrate that the joint embedding method produces features which lead to state-of-the-art performance in classifying graphs. Applying the joint embedding method to human brain graphs, we find that it extracts interpretable features with good prediction accuracy in different tasks.
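The projection step described above can be illustrated in a few lines of NumPy. This sketch assumes the embedding components h_k are already known and orthonormal, in which case the rank-one matrices h_k h_k^T are mutually orthogonal in the Frobenius inner product and the least-squares coefficient of graph A on component k reduces to h_k^T A h_k. It does not estimate the components themselves, which the paper's method does, and `project_onto_rank_one` is a hypothetical name.

```python
import numpy as np


def project_onto_rank_one(adjacencies, H):
    """Return Lambda with Lambda[i, k] = h_k^T A_i h_k (hypothetical helper).

    Assumes the columns h_k of H are orthonormal, so the matrices h_k h_k^T
    are Frobenius-orthogonal and these values are the least-squares
    coefficients of each A_i on their span.
    """
    return np.array([[h @ A @ h for h in H.T] for A in adjacencies])


# Two toy "graphs" built from known components (illustrative data only)
h1 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
h2 = np.array([0.0, 0.0, 1.0, 1.0]) / np.sqrt(2)
H = np.column_stack([h1, h2])
A1 = 3.0 * np.outer(h1, h1) + 1.0 * np.outer(h2, h2)
A2 = 0.5 * np.outer(h1, h1) + 2.0 * np.outer(h2, h2)

features = project_onto_rank_one([A1, A2], H)  # one row of features per graph
print(features)
```

Each row of `features` plays the role the abstract assigns to the projection coefficients (graph features), while the columns of `H` play the role of vertex features.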
162. Wang J, Guo J, Liu B. A fast algorithm for integrative community detection of multi-layer networks. Stat (Int Stat Inst) 2021. [DOI: 10.1002/sta4.348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Affiliation(s)
- Jiangzhou Wang
- School of Mathematics and Statistics & KLAS, Northeast Normal University, Changchun 130024, China
- Jianhua Guo
- School of Mathematics and Statistics & KLAS, Northeast Normal University, Changchun 130024, China
- Binghui Liu
- School of Mathematics and Statistics & KLAS, Northeast Normal University, Changchun 130024, China
163. Kavran AJ, Clauset A. Denoising large-scale biological data using network filters. BMC Bioinformatics 2021; 22:157. [PMID: 33765911 PMCID: PMC7992843 DOI: 10.1186/s12859-021-04075-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/25/2020] [Accepted: 03/15/2021] [Indexed: 11/29/2022]
Abstract
Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.
Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or "filtered" to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data.
Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04075-x.
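A minimal sketch of the simplest kind of network filter, assuming a signed neighborhood mean: each measurement is averaged with those of its network neighbors, with anti-correlated neighbors sign-flipped first. The edge signs, the assumption of mean-centered values, and the name `mean_network_filter` are illustrative; the paper's filters, including its module-wise variant, may differ.

```python
def mean_network_filter(values, edges):
    """Replace each node's value by the mean of itself and its signed neighbors.

    values: dict node -> measurement (assumed mean-centered).
    edges: iterable of (u, v, sign), sign = +1 for correlated and -1 for
           anti-correlated measurements (an illustrative assumption).
    """
    neighbors = {node: [] for node in values}
    for u, v, sign in edges:
        neighbors[u].append((v, sign))
        neighbors[v].append((u, sign))

    filtered = {}
    for node, x in values.items():
        # Anti-correlated neighbors are sign-flipped before averaging
        total = x + sum(sign * values[nbr] for nbr, sign in neighbors[node])
        filtered[node] = total / (1 + len(neighbors[node]))
    return filtered


# A three-node path of positively correlated measurements
print(mean_network_filter({0: 1.0, 1: 2.0, 2: 3.0}, [(0, 1, +1), (1, 2, +1)]))
# An anti-correlated pair keeps opposite signs after filtering
print(mean_network_filter({0: 1.0, 1: -1.0}, [(0, 1, -1)]))
```

Applying a different filter per module, as the abstract describes, would amount to calling such a function separately on each module's subgraph.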
Affiliation(s)
- Andrew J Kavran
- Department of Biochemistry, University of Colorado, Boulder, CO, USA
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Aaron Clauset
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
- Santa Fe Institute, Santa Fe, NM, USA
164. Kuikka V. Modelling community structure and temporal spreading on complex networks. Comput Soc Netw 2021. [DOI: 10.1186/s40649-021-00094-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
We present methods for analysing hierarchical and overlapping community structure and spreading phenomena on complex networks. Different models can be developed for describing static connectivity or dynamical processes on a network topology. In this study, classical network connectivity and influence spreading models are used as example network models. Analysis of results is based on a probability matrix describing interactions between all pairs of nodes in the network. One popular research area has been detecting communities and their structure in complex networks. The community detection method of this study is based on optimising a quality function calculated from the probability matrix. The same method is proposed for detecting underlying groups of nodes that are building blocks of different sub-communities in the network structure. We present different quantitative measures for comparing and ranking solutions of the community detection algorithm. These measures describe properties of sub-communities: strength of a community, probability of formation and robustness of composition. The main contribution of this study is a common methodology for analysing both structure and dynamics on complex networks. We illustrate the community detection methods with two small network topologies. In the case of network spreading models, the time development of spreading in the network can be studied. Two different temporal spreading distributions are used to demonstrate the methods on three real-world social networks of different sizes. The Poisson distribution describes a random response time, and the e-mail forwarding distribution describes a process of receiving and forwarding messages.
165. Gallagher RJ, Young JG, Welles BF. A clarified typology of core-periphery structure in networks. Sci Adv 2021; 7:eabc9800. [PMID: 33731343 PMCID: PMC7968838 DOI: 10.1126/sciadv.abc9800] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/26/2020] [Accepted: 01/29/2021] [Indexed: 06/12/2023]
Abstract
Core-periphery structure, the arrangement of a network into a dense core and sparse periphery, is a versatile descriptor of various social, biological, and technological networks. In practice, different core-periphery algorithms are often applied interchangeably despite the fact that they can yield inconsistent descriptions of core-periphery structure. For example, two of the most widely used algorithms, the k-cores decomposition and the classic two-block model of Borgatti and Everett, extract fundamentally different structures: The latter partitions a network into a binary hub-and-spoke layout, while the former divides it into a layered hierarchy. We introduce a core-periphery typology to clarify these differences, along with Bayesian stochastic block modeling techniques to classify networks in accordance with this typology. Empirically, we find a rich diversity of core-periphery structure among networks. Through a detailed case study, we demonstrate the importance of acknowledging this diversity and situating networks within the core-periphery typology when conducting domain-specific analyses.
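The k-core decomposition contrasted above can be computed by repeatedly removing a node of minimum remaining degree; a node's core number is the largest minimum degree seen up to its removal. A minimal sketch of that peeling procedure follows (quadratic time, hypothetical helper name). It illustrates the layered hierarchy the abstract describes and is not the paper's Bayesian stochastic block modeling approach.

```python
def core_numbers(adj):
    """Core number of every node by peeling minimum-degree nodes.

    adj: dict node -> set of neighbors (undirected, no self-loops).
    A node's core number is the largest k such that it lies in a subgraph
    where every node has degree >= k.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node of smallest remaining degree; k never decreases
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# A triangle a-b-c with a pendant node d attached to a
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(adj))
```

Here the triangle forms the 2-core and the pendant node sits in the outer 1-shell: nested layers rather than the single core/periphery split of the two-block model.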
Affiliation(s)
- Ryan J Gallagher
- Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Jean-Gabriel Young
- Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
- Brooke Foucault Welles
- Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Department of Communication Studies, Northeastern University, Boston, MA 02115, USA
166. Bouguessa M, Nouri K. BiNeTClus. ACM Trans Intell Syst Technol 2021. [DOI: 10.1145/3423067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
We investigate the problem of community detection in bipartite networks, which are characterized by the presence of two types of nodes such that connections exist only between nodes of different types. While some approaches have been proposed to identify community structures in bipartite networks, a number of problems remain unsolved. In fact, the majority of the proposed approaches suffer from one or more of the following limitations: (1) difficulty in detecting communities in the presence of many non-discriminating nodes with atypical connections that hide the community structures, (2) loss of relevant topological information due to the transformation of the bipartite network into a standard plain graph, and (3) the need to manually specify several input parameters, including the number of communities to be identified. To alleviate these problems, we propose BiNeTClus, a parameter-free community detection algorithm for bipartite networks that operates in two phases. The first phase identifies an initial grouping of nodes through a transactional data model capable of handling networks with many atypical connections, that is, sparsely connected nodes and nodes of one type that connect massively to all nodes of the other type. The second phase refines the clustering results of the first phase by optimizing the bipartite modularity to identify the final community structures. Our experiments on both synthetic and real networks illustrate the suitability of the proposed approach.
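The second phase above optimizes bipartite modularity, whose formula the abstract does not give. A widely used definition is Barber's, Q = (1/m) Σ_ij (B_ij - k_i d_j / m) δ(g_i, h_j), where B is the biadjacency matrix, m the number of edges, k_i and d_j the degrees of the two node types, and g, h their community labels. The sketch below assumes that definition and only evaluates Q for a given partition; it is not the BiNeTClus optimization, and `bipartite_modularity` is a hypothetical name.

```python
import numpy as np


def bipartite_modularity(B, row_labels, col_labels):
    """Barber's bipartite modularity of a given two-type partition.

    B: (p x q) biadjacency matrix; rows are type-1 nodes, columns type-2 nodes.
    row_labels / col_labels: community label of each row / column node.
    """
    B = np.asarray(B, dtype=float)
    m = B.sum()                      # number of edges
    k = B.sum(axis=1)                # degrees of type-1 nodes
    d = B.sum(axis=0)                # degrees of type-2 nodes
    expected = np.outer(k, d) / m    # null-model expectation of B_ij
    # same[i, j] is True when row node i and column node j share a community
    same = np.asarray(row_labels)[:, None] == np.asarray(col_labels)[None, :]
    return float(((B - expected) * same).sum() / m)


# Two disjoint complete bipartite blocks
B = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
print(bipartite_modularity(B, [0, 0, 1, 1], [0, 0, 1, 1]))  # block partition
print(bipartite_modularity(B, [0, 0, 0, 0], [0, 0, 0, 0]))  # one community
```

Because Q is defined directly on the biadjacency matrix, it avoids projecting the network onto a plain graph, which is limitation (2) in the abstract.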
Affiliation(s)
- Khaled Nouri
- University of Quebec at Montreal, Montreal, Canada
167. Keller-Ressel M, Nargang S. The hyperbolic geometry of financial networks. Sci Rep 2021; 11:4732. [PMID: 33637827 PMCID: PMC7910495 DOI: 10.1038/s41598-021-83328-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2020] [Accepted: 02/01/2021] [Indexed: 11/09/2022]
Abstract
Based on data from the European banking stress tests of 2014 and 2016 and the transparency exercise of 2018, we construct networks of European banks and demonstrate that the latent geometry of these financial networks can be well represented by a geometry of negative curvature, i.e., by hyperbolic geometry. Using two different hyperbolic embedding methods, hydra+ and Mercator, we connect the network structure to the popularity-vs-similarity model of Papadopoulos et al., which is based on the Poincaré disc model of hyperbolic geometry. We show that the latent dimensions of 'popularity' and 'similarity' in this model are strongly associated with systemic importance and with geographic subdivisions of the banking system, independent of the embedding method used. In a longitudinal analysis over the period from 2014 to 2018, we find that the systemic importance of individual banks has remained rather stable, while the peripheral community structure exhibits more (but still moderate) variability. Based on our analysis, we argue that embeddings into hyperbolic geometry can be used to monitor structural change in financial networks and can distinguish changes in systemic relevance from other (peripheral) structural changes.
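Both embedding methods place each bank at a point of the hyperbolic plane. In the popularity-vs-similarity picture, the radial coordinate is usually read as (inverse) popularity and the angular coordinate as similarity, and distances follow the hyperbolic law of cosines for curvature -1: cosh d = cosh r1 cosh r2 - sinh r1 sinh r2 cos(θ1 - θ2). A minimal sketch of that distance, with a hypothetical function name and no connection to the hydra+ or Mercator implementations:

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points of the hyperbolic plane (curvature -1)
    given in native polar coordinates (radius r, angle theta in radians)."""
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(theta1 - theta2))
    # Rounding can push x marginally below 1 for nearly coincident points
    return math.acosh(max(x, 1.0))


# Same angle: the distance is the radial gap |3 - 1| = 2
print(hyperbolic_distance(1.0, 0.0, 3.0, 0.0))
# Opposite angles: the geodesic passes through the origin, so d = 1 + 1 = 2
print(hyperbolic_distance(1.0, 0.0, 1.0, math.pi))
```

Two peripheral (large-radius) nodes are close only if their angles are very similar, which is why angular clusters can be read as community structure and radius as systemic importance.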
Affiliation(s)
- Stephanie Nargang
- Institute for Mathematical Stochastics, TU Dresden, 01062, Dresden, Germany
168. Interpretable Variational Graph Autoencoder with Noninformative Prior. Future Internet 2021. [DOI: 10.3390/fi13020051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Variational graph autoencoders, which can encode the structural and attribute information of a graph into low-dimensional representations, have become a powerful method for studying graph-structured data. However, most existing methods based on variational (graph) autoencoders assume that the prior of the latent variables follows the standard normal distribution, which encourages all nodes to gather around 0 and prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge therefore becomes a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we use a noninformative prior as the prior distribution of the latent variables. This prior allows the parameters of the posterior distribution to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to the corresponding block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
169. Noroozi M, Rimal R, Pensky M. Estimation and clustering in popularity adjusted block model. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12410] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Majid Noroozi
- Department of Mathematics and Statistics, Washington University in St. Louis, St. Louis, MO 63130, USA
- Ramchandra Rimal
- Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, USA
- Marianna Pensky
- Department of Mathematics, University of Central Florida, Orlando, FL 32816, USA
170.
Affiliation(s)
- Narges Motalebi
- Department of Industrial Engineering, Yazd University, Yazd, Iran
- Nathaniel T. Stevens
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Stefan H. Steiner
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
171. Brusco M, Doreian P, Steinley D. Deterministic blockmodelling of signed and two-mode networks: A tutorial with software and psychological examples. Br J Math Stat Psychol 2021; 74:34-63. [PMID: 31705539 DOI: 10.1111/bmsp.12192] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2019] [Revised: 09/21/2019] [Accepted: 09/26/2019] [Indexed: 06/10/2023]
Abstract
Deterministic blockmodelling is a well-established clustering method for both exploratory and confirmatory social network analysis, seeking partitions of a set of actors so that actors within each cluster are similar with respect to their patterns of ties to other actors (or, in some cases, other objects when considering two-mode networks). Even though some of the historical foundations for certain types of blockmodelling stem from the psychological literature, applications of deterministic blockmodelling in psychological research are relatively rare. This scarcity is potentially attributable to three factors: a general unfamiliarity with relevant blockmodelling methods and applications; a lack of awareness of the value of partitioning network data for understanding group structures and processes; and the unavailability of such methods on software platforms familiar to most psychological researchers. To tackle the first two factors, we provide a tutorial presenting a general framework for blockmodelling and describe two of the most important types of deterministic blockmodelling applications relevant to psychological research: structural balance partitioning and two-mode partitioning based on structural equivalence. To address the third problem, we developed a suite of software programs that are available both as Fortran executable files and as compiled Fortran dynamic-link libraries that can be called from the R software system. We demonstrate these software programs using networks from the literature.
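Structural balance partitioning, the first of the two applications above, seeks a partition with few violations of balance: negative ties inside clusters and positive ties between clusters. A common criterion, due to Doreian and Mrvar, weights these two counts by alpha and 1 - alpha. The sketch below assumes that criterion and only scores a given partition (it performs no search); it is written in Python rather than the authors' Fortran, and `balance_inconsistency` is a hypothetical name.

```python
def balance_inconsistency(signed, labels, alpha=0.5):
    """Weighted count of structural-balance violations for a partition.

    signed: square matrix (list of lists); entries > 0 are positive ties,
            < 0 negative ties, 0 no tie.
    labels: cluster label of each actor.
    Every ordered pair (i, j) is counted, so a symmetric tie contributes twice.
    """
    neg_within = pos_between = 0.0
    n = len(signed)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = signed[i][j]
            if labels[i] == labels[j] and w < 0:
                neg_within += -w        # negative tie inside a cluster
            elif labels[i] != labels[j] and w > 0:
                pos_between += w        # positive tie between clusters
    return alpha * neg_within + (1 - alpha) * pos_between


# Two friendly pairs (0-1, 2-3) with hostile ties across them (0-2, 1-3)
S = [[0, 1, -1, 0],
     [1, 0, 0, -1],
     [-1, 0, 0, 1],
     [0, -1, 1, 0]]
print(balance_inconsistency(S, [0, 0, 1, 1]))  # balanced split
print(balance_inconsistency(S, [0, 1, 0, 1]))  # violates balance
```

A blockmodelling search would evaluate this score over many candidate partitions and keep the one with the smallest value.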
Affiliation(s)
- Patrick Doreian
- University of Ljubljana, Ljubljana, Slovenia
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
172. Gao C, Ma Z. Minimax Rates in Network Analysis: Graphon Estimation, Community Detection and Hypothesis Testing. Stat Sci 2021. [DOI: 10.1214/19-sts736] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022]
173. Wang D, Yu Y, Rinaldo A. Optimal change point detection and localization in sparse dynamic networks. Ann Stat 2021. [DOI: 10.1214/20-aos1953] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
174.
175. Athreya A, Tang M, Park Y, Priebe CE. On Estimation and Inference in Latent Structure Random Graphs. Stat Sci 2021. [DOI: 10.1214/20-sts787] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
176. Ting CM, Samdin SB, Tang M, Ombao H. Detecting Dynamic Community Structure in Functional Brain Networks Across Individuals: A Multilayer Approach. IEEE Trans Med Imaging 2021; 40:468-480. [PMID: 33044929 DOI: 10.1109/tmi.2020.3030047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
Objective: We present a unified statistical framework for characterizing the community structure of brain functional networks that captures variation across individuals and evolution over time. Existing methods for community detection focus only on single-subject analysis of dynamic networks, while recent extensions to multi-subject analysis are limited to static networks.
Method: To overcome these limitations, we propose a multi-subject, Markov-switching stochastic block model (MSS-SBM) to identify state-related changes in brain community organization over a group of individuals. We first formulate a multilayer extension of the SBM to describe the time-dependent, multi-subject brain networks. We develop a novel procedure for fitting the multilayer SBM that builds on multislice modularity maximization, which can uncover a common community partition of all layers (subjects) simultaneously. By augmenting it with a dynamic Markov switching process, our proposed method is able to capture a set of distinct, recurring temporal states with respect to inter-community interactions over subjects, and the change points between them.
Results: Simulation shows accurate community recovery and tracking of dynamic community regimes over multilayer networks by the MSS-SBM. Application to task fMRI reveals meaningful non-assortative brain community motifs, e.g., core-periphery structure at the group level, that are associated with language comprehension and motor functions, suggesting their putative role in complex information integration. Our approach detected dynamic reconfiguration of modular connectivity elicited by varying task demands and identified unique profiles of intra- and inter-community connectivity across different task conditions.
Conclusion: The proposed multilayer network representation provides a principled way of detecting synchronous, dynamic modularity in brain networks across subjects.
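The fitting procedure above builds on multislice modularity maximization. As a reference point, the multislice modularity of Mucha et al. can be evaluated for a given multilayer partition as below. The sketch assumes categorical coupling, in which each subject's copy of a node is linked with weight omega to its copies in every other layer, and a resolution parameter gamma; that coupling choice is an assumption, and the code scores a partition without performing the maximization or the Markov-switching step of MSS-SBM. The function name is hypothetical.

```python
import numpy as np


def multislice_modularity(A, labels, gamma=1.0, omega=1.0):
    """Multislice modularity with categorical (all-to-all) inter-layer coupling.

    A: array of shape (T, n, n), one symmetric adjacency matrix per layer,
       each layer assumed to contain at least one edge.
    labels: array of shape (T, n); labels[s, j] is node j's community in layer s.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    T, n, _ = A.shape
    k = A.sum(axis=2)        # k[s, j]: degree of node j in layer s
    two_m = k.sum(axis=1)    # 2 * m_s, twice the edge count of each layer

    q = 0.0
    for s in range(T):
        # Intra-layer term: observed minus expected (configuration null) edges
        same = labels[s][:, None] == labels[s][None, :]
        null = gamma * np.outer(k[s], k[s]) / two_m[s]
        q += ((A[s] - null) * same).sum()

    for s in range(T):
        for r in range(T):
            if s != r:
                # Inter-layer term: node j scores omega if it keeps its label
                q += omega * np.sum(labels[s] == labels[r])

    # 2 * mu: total intra-layer degree plus total inter-layer coupling strength
    two_mu = k.sum() + omega * n * T * (T - 1)
    return float(q / two_mu)


# Two identical layers ("subjects"), each with edges 0-1 and 2-3 (toy data)
layer = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
Q = multislice_modularity([layer, layer], [[0, 0, 1, 1], [0, 0, 1, 1]])
print(Q)
```

Larger omega rewards partitions that stay consistent across subjects, which is what allows a common community partition of all layers to emerge.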
177. Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks, where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We discuss existing statistical and computational methods for edge estimation and for the subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
| | - Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
- Department of Statistics, University of California, Berkeley
|
178
|
Pamfil AR, Howison SD, Porter MA. Inference of edge correlations in multilayer networks. Phys Rev E 2021; 102:062307. [PMID: 33466038 DOI: 10.1103/physreve.102.062307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2019] [Accepted: 09/14/2020] [Indexed: 11/07/2022]
Abstract
Many recent developments in network analysis have focused on multilayer networks, which one can use to encode time-dependent interactions, multiple types of interactions, and other complications that arise in complex systems. Like their monolayer counterparts, multilayer networks in applications often have mesoscale features, such as community structure. A prominent approach for inferring such structures is the employment of multilayer stochastic block models (SBMs). A common (but potentially inadequate) assumption of these models is that edges in different layers are sampled independently, conditioned on the community labels of the nodes. In this paper, we relax this assumption of independence by incorporating edge correlations into an SBM-like model. We derive maximum-likelihood estimates of the key parameters of our model, and we propose a measure of layer correlation that reflects the similarity between the connectivity patterns in different layers. Finally, we explain how to use correlated models for edge "prediction" (i.e., inference) in multilayer networks. By incorporating edge correlations, we find that prediction accuracy improves both in synthetic networks and in a temporal network of shoppers who are connected to previously purchased grocery products.
Affiliation(s)
- A Roxana Pamfil
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Sam D Howison
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Mason A Porter
- Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095, USA and Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
|
179
|
Abstract
The Erdős-Rényi (ER) random graph G(n, p) analytically characterizes the behavior of complex networks. However, attempts to fit real-world observations require more sophisticated structures (e.g., multilayer networks), rules (e.g., Achlioptas processes), and projections onto geometric, social, or geographic spaces. The p-adic number system offers a natural representation of the hierarchical organization of complex networks. The p-adic random graph interprets n as the cardinality of a set of p-adic numbers. Constructing a vast space of hierarchical structures is equivalent to combining number sequences. Although the giant component is vital in the dynamic evolution of networks, the structure of multiple big components is also essential. Fitting the sizes of the few largest components to empirical data has rarely been demonstrated. The p-adic ultrametric enables the ER model to simulate multiple big components from observations of genetic interaction networks, social networks, and epidemics. Community structures lead to multimodal distributions of big-component sizes in networks, which have important implications for intervening in spreading processes.
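For readers unfamiliar with the p-adic ultrametric, the sketch below computes the p-adic distance between integers, d(x, y) = p^(-v_p(x - y)), where v_p(n) is the exponent of the largest power of the prime p dividing n. Integers that agree modulo a higher power of p are closer, which is what induces a nested, hierarchical grouping. How the abstract's model turns this distance into edges is not described here, so the code stops at the metric itself; the function names are hypothetical.

```python
from fractions import Fraction

def p_adic_valuation(n, p):
    """Exponent of the largest power of the prime p dividing the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def p_adic_distance(x, y, p):
    """p-adic distance |x - y|_p = p^(-v_p(x - y)), with distance 0 when x == y."""
    if x == y:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(x - y, p))

print(p_adic_distance(3, 11, 2))  # 1/8, since 11 - 3 = 8 = 2^3
print(p_adic_distance(3, 4, 2))   # 1, since 4 - 3 = 1 is odd
```

The metric satisfies the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), the defining property of an ultrametric.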
Affiliation(s)
- Hao Hua
- School of Architecture, Southeast University, 2 Sipailou, Nanjing, 210096, China.
- Key Laboratory of Urban and Architectural Heritage Conservation (Southeast University), Ministry of Education, Nanjing, China.
|
180
|
Okuda M, Satoh S, Sato Y, Kidawara Y. Community Detection Using Restrained Random-Walk Similarity. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:89-103. [PMID: 31265385 DOI: 10.1109/tpami.2019.2926033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
In this paper, we propose a restrained random-walk similarity method for detecting the community structure of graphs. The basic premise of our method is that the starting vertices of finite-length random walks are judged to be in the same community if the walkers pass through similar sets of vertices. This idea is based on the reasoning that a random walker tends to stay within the community containing its starting vertex for some time after starting the walk. Therefore, the sets of vertices passed by random walkers starting from vertices in the same community should be similar. The idea is reinforced with two conditions. First, we exclude abnormal random walks: random walks departing from each vertex are executed many times, and vertices that are rarely passed by the walkers are excluded from the set of vertices that the walkers may pass. Second, we forcibly restrain random walks to an appropriate length: a random walk is terminated when the walker repeatedly visits vertices that it has already passed. Experiments on real-world networks demonstrate that our method outperforms previous techniques in terms of accuracy.
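The two conditions described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' algorithm. The stopping rule (end the walk at the `max_revisits`-th step onto an already-visited vertex), the frequency cutoff `min_freq`, the number of walks, and the use of Jaccard similarity to compare vertex sets are all hypothetical choices.

```python
import random
from collections import Counter

def restrained_walk(graph, start, rng, max_revisits=2):
    """One random walk from `start`, stopped at the `max_revisits`-th step onto an
    already-visited vertex. Assumes every vertex has at least one neighbor.
    Returns the set of vertices passed."""
    visited = {start}
    current = start
    revisits = 0
    while True:
        current = rng.choice(graph[current])
        if current in visited:
            revisits += 1
            if revisits >= max_revisits:
                return visited
        else:
            visited.add(current)

def passed_set(graph, start, rng, n_walks=500, min_freq=0.5):
    """Vertices passed in at least `min_freq` of the walks from `start`;
    rarely passed vertices are dropped."""
    counts = Counter()
    for _ in range(n_walks):
        counts.update(restrained_walk(graph, start, rng))
    return {v for v, c in counts.items() if c / n_walks >= min_freq}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Hypothetical toy graph: two triangles joined by the bridge edge 2-3
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rng = random.Random(0)
sets = {v: passed_set(graph, v, rng) for v in graph}
print(jaccard(sets[0], sets[1]), jaccard(sets[0], sets[5]))
```

Each step either adds a new vertex or counts a revisit, so every walk terminates after at most n + max_revisits steps on an n-vertex graph. On the toy graph, starting vertices in the same triangle end up with matching vertex sets, while vertices in different triangles share none.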
|
181
|
Allen C, Kuhn BN, Cannella N, Crow AD, Roberts AT, Lunerti V, Ubaldi M, Hardiman G, Solberg Woods LC, Ciccocioppo R, Kalivas PW, Chung D. Network-Based Discovery of Opioid Use Vulnerability in Rats Using the Bayesian Stochastic Block Model. Front Psychiatry 2021; 12:745468. [PMID: 34975564 PMCID: PMC8718996 DOI: 10.3389/fpsyt.2021.745468] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/23/2021] [Accepted: 11/29/2021] [Indexed: 12/11/2022]
Abstract
Opioid use disorder is a psychological condition that affects over 200,000 people per year in the U.S., causing the Centers for Disease Control and Prevention to label the crisis a rapidly spreading public health epidemic. The behavioral relationship between opioid exposure and the development of opioid use disorder (OUD) varies greatly between individuals, implying the existence of sub-populations with varying degrees of opioid vulnerability. However, effective pre-clinical identification of these sub-populations remains challenging due to the complex multivariate measurements employed in animal models of OUD. In this study, we propose a novel non-linear network-based data analysis workflow that employs seven behavioral traits to identify opioid use sub-populations and assesses the contributions of behavioral variables to opioid vulnerability and resiliency. Through this analysis workflow, we determined how behavioral variables across heroin taking, refraining, and seeking interact with one another to identify potentially heroin-resilient and heroin-vulnerable behavioral sub-populations. Data were collected from over 400 heterogeneous stock rats in two geographically distinct locations. Rats underwent heroin self-administration training, followed by a progressive ratio and heroin-primed reinstatement test. Next, rats underwent extinction training and a cue-induced reinstatement test. Before entering the analysis workflow, we integrated data from different cohorts of rats and removed possible batch effects. We then constructed a rat-rat similarity network based on their behavioral patterns and implemented community detection on this similarity network using a Bayesian degree-corrected stochastic block model to uncover sub-populations of rats with differing levels of opioid vulnerability. We identified three statistically distinct clusters corresponding to distinct behavioral sub-populations (vulnerable, resilient, and intermediate) for heroin use, refraining, and seeking.
We implement this analysis workflow as an open-source R package, mlsbm.
Affiliation(s)
- Carter Allen
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Brittany N Kuhn
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Ayteria D Crow
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Analyse T Roberts
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Massimo Ubaldi
- School of Pharmacy, University of Camerino, Camerino, Italy
- Gary Hardiman
- School of Biological Sciences, Queen's University Belfast, Belfast, United Kingdom
- Leah C Solberg Woods
- Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, United States
- Peter W Kalivas
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Dongjun Chung
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
|
182
|
Guleva V, Shikov E, Bochenina K, Kovalchuk S, Alodjants A, Boukhanovsky A. Emerging Complexity in Distributed Intelligent Systems. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E1437. [PMID: 33352754 PMCID: PMC7766450 DOI: 10.3390/e22121437] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/23/2020] [Revised: 12/14/2020] [Accepted: 12/16/2020] [Indexed: 12/31/2022]
Abstract
Distributed intelligent systems (DIS) appear where natural intelligence agents (humans) and artificial intelligence agents (algorithms) interact, exchanging data and decisions and learning how to evolve toward better-quality solutions. The networked dynamics of distributed natural and artificial intelligence agents lead to emergent complexity that differs from what has been observed before. In this study, we review and systematize different approaches in the distributed intelligence field, including the quantum domain. We introduce a definition and mathematical model of DIS (as a new class of systems) and its components, including a general model of DIS dynamics. In particular, the proposed model of DIS contains both natural (humans) and artificial (computer programs, chatbots, etc.) intelligence agents and takes their interactions and communications into account. We present a case study of a domain-oriented DIS based on different agent classes and show that DIS dynamics exhibit complexity effects observed in other well-studied complex systems. We examine our model using a platform of personal self-adaptive educational assistants (avatars) designed specifically at our university. Avatars interact with each other and with their owners. Our experiment addresses a vital question: how quickly will a DIS adapt to its owners' preferences so that they are satisfied? We introduce and examine in detail learning time as a function of network topology. We show that DIS have an intrinsic source of complexity that needs to be addressed when developing predictable and trustworthy systems of natural and artificial intelligence agents. Notably, our research and findings helped improve the educational process at our university under COVID-19 pandemic conditions.
Affiliation(s)
- Sergey Kovalchuk
- National Center for Cognitive Research, ITMO University, 197101 Saint Petersburg, Russia; (V.G.); (E.S.); (K.B.); (A.A.); (A.B.)
|
183
|
Bar-Hen A, Barbillon P, Donnet S. Block models for generalized multipartite networks: Applications in ecology and ethnobiology. STAT MODEL 2020. [DOI: 10.1177/1471082x20963254] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Generalized multipartite networks consist of the joint observation of several networks involving some common, pre-specified groups of individuals. Such complex networks arise commonly in the social sciences, biology, ecology, etc. We propose a flexible probabilistic model, the Multipartite Block Model (MBM), that can unravel the topology of multipartite networks by identifying clusters (blocks) of nodes that share the same patterns of connectivity across the collection of networks in which they are involved. The model parameters are estimated through a variational version of the Expectation–Maximization algorithm. The numbers of blocks are chosen using an Integrated Completed Likelihood criterion specifically designed for our model. A simulation study illustrates the robustness of the inference strategy. Finally, two datasets, from ecology and ethnobiology respectively, are analyzed with the MBM to illustrate its flexibility and its relevance for the analysis of real datasets. The inference procedure is implemented in the R package GREMLIN, available on GitHub (https://github.com/Demiperimetre/GREMLIN).
Affiliation(s)
- Pierre Barbillon
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 75005, Paris, France
- Sophie Donnet
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 75005, Paris, France
|
184
|
Correspondence analysis-based network clustering and importance of degenerate solutions unification of spectral clustering and modularity maximization. SOCIAL NETWORK ANALYSIS AND MINING 2020. [DOI: 10.1007/s13278-020-00686-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
185
|
Ludkin M. Inference for a generalised stochastic block model with unknown number of blocks and non-conjugate edge models. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2020.107051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
|
186
|
Li T, Lei L, Bhattacharyya S, Van den Berge K, Sarkar P, Bickel PJ, Levina E. Hierarchical Community Detection by Recursive Partitioning. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1833888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Affiliation(s)
- Tianxi Li
- Department of Statistics, University of Virginia, Charlottesville, VA
- Lihua Lei
- Department of Statistics, Stanford University, Stanford, CA
- Koen Van den Berge
- Department of Statistics, University of California, Berkeley, Berkeley, CA
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Purnamrita Sarkar
- Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX
- Peter J. Bickel
- Department of Statistics, University of California, Berkeley, Berkeley, CA
|
187
|
Schawe H, Hartmann AK. Large deviations of connected components in the stochastic block model. Phys Rev E 2020; 102:052108. [PMID: 33327148 DOI: 10.1103/physreve.102.052108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2020] [Accepted: 10/19/2020] [Indexed: 06/12/2023]
Abstract
We study the stochastic block model, which is often used to model community structures and to study community-detection algorithms. We consider the two-block case and study its largest connected component and its largest biconnected component. We are especially interested in the distributions of their sizes, including the tails down to probabilities smaller than 10^{-800}. For this purpose, we use sophisticated Markov chain Monte Carlo simulations to sample graphs from the stochastic block model ensemble. We use these data to study the large-deviation rate function and conjecture that the large-deviation principle holds. Furthermore, we compare the distributions to those of the well-known Erdős-Rényi ensemble, where we notice subtle differences at and above the percolation threshold.
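To make the setting concrete, the sketch below draws a single graph from a two-block SBM by direct Bernoulli sampling and measures its largest connected component with an iterative depth-first search. It does not reproduce the large-deviation Markov chain Monte Carlo used in the study, which is what makes probabilities as small as those quoted reachable, and it does not compute biconnected components. The function names, block sizes, and edge probabilities are hypothetical.

```python
import random

def sample_sbm(sizes, probs, rng):
    """Draw one undirected simple graph from an SBM.

    sizes: number of nodes in each block
    probs: probs[r][s] is the edge probability between blocks r and s
    Returns (adjacency lists, block label of each node).
    """
    labels = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[labels[i]][labels[j]]:
                adj[i].append(j)
                adj[j].append(i)
    return adj, labels

def largest_component_size(adj):
    """Size of the largest connected component, via iterative depth-first search."""
    seen = [False] * len(adj)
    best = 0
    for source in range(len(adj)):
        if seen[source]:
            continue
        seen[source] = True
        stack, size = [source], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, size)
    return best

# Hypothetical parameters: two blocks of 50 nodes, denser inside blocks than between them
rng = random.Random(1)
adj, labels = sample_sbm([50, 50], [[0.08, 0.01], [0.01, 0.08]], rng)
print(largest_component_size(adj))
```

Repeating the draw many times gives an empirical size distribution, but plain sampling like this only resolves the bulk of the distribution, not its far tails.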
Affiliation(s)
- Hendrik Schawe
- Laboratoire de Physique Théorique et Modélisation, UMR-8089 CNRS, CY Cergy Paris Université, 95000 Cergy, France
- Institut für Physik, Universität Oldenburg, 26111 Oldenburg, Germany
|
188
|
Sussman DL, Park Y, Priebe CE, Lyzinski V. Matched Filters for Noisy Induced Subgraph Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2887-2900. [PMID: 31059426 PMCID: PMC7598933 DOI: 10.1109/tpami.2019.2914651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
The problem of finding the vertex correspondence between two noisy graphs with different numbers of vertices, where the smaller graph is still large, has many applications in social networks, neuroscience, and computer vision. We propose a solution to this problem via a graph matching matched filter: centering and padding the smaller adjacency matrix and applying graph matching methods to align it to the larger network. The centering and padding schemes can be incorporated into any algorithm that matches using adjacency matrices. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks. However, there are currently no efficient algorithms for solving this problem. To illustrate the possibilities and challenges of such problems, we use an algorithm that can exploit a partially known correspondence and show, via varied simulations and applications to Drosophila and human connectomes, that this approach can achieve good performance.
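As a toy illustration of the centering-and-padding idea, the sketch below uses one common centering choice, mapping each adjacency entry x to 2x - 1 so that edges become +1 and non-edges -1, in both graphs. It then scores an injective map of the small graph's vertices into the large graph by summing products of corresponding entries. Zero-padding the small matrix is implicit because padded entries would contribute nothing. The exhaustive search over injective maps is feasible only for tiny graphs and is not the seeded graph-matching algorithm used in the paper; the function names and graphs are hypothetical.

```python
from itertools import permutations

def centered(adj):
    """Map a 0/1 adjacency matrix A to 2A - J: edges -> +1, non-edges -> -1."""
    return [[2 * x - 1 for x in row] for row in adj]

def best_embedding(small, large):
    """Brute-force matched filter: the injective map phi of small-graph vertices
    into the large graph maximizing sum_ij a[i][j] * b[phi(i)][phi(j)].

    Zero-padding the centered small matrix to the large size is implicit:
    padded entries are 0, so only pairs of small-graph vertices contribute.
    """
    a, b = centered(small), centered(large)
    n = len(small)
    best_map, best_score = None, None
    for phi in permutations(range(len(large)), n):
        score = sum(a[i][j] * b[phi[i]][phi[j]] for i in range(n) for j in range(n))
        if best_score is None or score > best_score:
            best_map, best_score = phi, score
    return best_map, best_score

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

# Hypothetical example: locate the only triangle inside a 5-vertex graph
triangle = adjacency(3, [(0, 1), (1, 2), (0, 2)])
host = adjacency(5, [(1, 3), (3, 4), (1, 4), (0, 2), (2, 3)])
phi, score = best_embedding(triangle, host)
print(sorted(phi), score)  # [1, 3, 4] 9
```

Centering matters because, with raw 0/1 entries, mapping a non-edge onto an edge would cost nothing; with ±1 entries, every disagreement lowers the score.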
|
189
|
Peng H, Nematzadeh A, Romero DM, Ferrara E. Network modularity controls the speed of information diffusion. Phys Rev E 2020; 102:052316. [PMID: 33327110 DOI: 10.1103/physreve.102.052316] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/12/2020] [Accepted: 11/08/2020] [Indexed: 11/07/2022]
Abstract
The rapid diffusion of information and the adoption of social behaviors are of critical importance in situations as diverse as collective actions, pandemic prevention, or advertising and marketing. Although the dynamics of large cascades have been extensively studied in various contexts, few studies have systematically examined the impact of network topology on the efficiency of information diffusion. Here, by employing the linear threshold model on networks with communities, we demonstrate that a prominent network feature, the modular structure, strongly affects the speed of information diffusion in complex contagion. Our simulations show that there always exists an optimal network modularity for the most efficient spreading process. Away from this critical value, either a stronger or a weaker modular structure actually hinders the diffusion speed. These results are confirmed by an analytical approximation. We further demonstrate that the optimal modularity varies with both the seed size and the target cascade size and is ultimately dependent on the network under investigation. We underscore the importance of our findings in applications from marketing to epidemiology, and from neuroscience to engineering, where the understanding of the structural design of complex systems focuses on the efficiency of information propagation.
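The linear threshold dynamics referred to above can be sketched as follows. The sketch assumes a single uniform threshold `theta` for every node and synchronous updates: an inactive node activates once the fraction of its active neighbors reaches `theta`. The function name, the uniform threshold, and the toy graph (two triangles joined by one bridge edge) are assumptions for illustration; the study's simulation settings are not reproduced here.

```python
def linear_threshold(graph, seeds, theta):
    """Synchronous linear threshold cascade with a uniform threshold `theta`.

    An inactive node activates once the fraction of its active neighbors is at
    least `theta` (isolated nodes never activate). Returns the final active set
    and the number of active nodes after each round.
    """
    active = set(seeds)
    history = [len(active)]
    while True:
        newly_active = {
            v for v, nbrs in graph.items()
            if v not in active and nbrs
            and sum(u in active for u in nbrs) / len(nbrs) >= theta
        }
        if not newly_active:
            return active, history
        active |= newly_active
        history.append(len(active))

# Hypothetical toy graph: two triangles joined by the bridge edge 2-3
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(linear_threshold(graph, {0, 1}, theta=0.5))  # ({0, 1, 2}, [2, 3])
print(linear_threshold(graph, {0, 1}, theta=0.3))  # ({0, 1, 2, 3, 4, 5}, [2, 3, 4, 6])
```

With `theta=0.5`, a cascade seeded in one triangle stalls at the bridge, because node 3 sees only one active neighbor out of three. With `theta=0.3`, it crosses into the second triangle. This is a small instance of community boundaries gating complex contagion.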
Affiliation(s)
- Hao Peng
- School of Information, University of Michigan, Ann Arbor, Michigan 48109, USA
- Daniel M Romero
- School of Information, University of Michigan, Ann Arbor, Michigan 48109, USA
- Emilio Ferrara
- Information Sciences Institute, University of Southern California, Los Angeles, California 90292, USA
|
190
|
Morel-Balbi S, Peixoto TP. Null models for multioptimized large-scale network structures. Phys Rev E 2020; 102:032306. [PMID: 33075868 DOI: 10.1103/physreve.102.032306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2020] [Accepted: 08/31/2020] [Indexed: 11/07/2022]
Abstract
We study the emerging large-scale structures in networks subject to selective pressures that simultaneously drive toward higher modularity and robustness against random failures. We construct maximum-entropy null models that isolate the effects of the joint optimization on the network structure from any kind of evolutionary dynamics. Our analysis reveals a rich phase diagram of optimized structures, composed of many combinations of modular, core-periphery, and bipartite patterns. Furthermore, we observe parameter regions where the simultaneous optimization can be either synergistic or antagonistic, with the improvement of one criterion directly aiding or hindering the other, respectively. Our results show how interactions between different selective pressures can be pivotal in determining the emerging network structure, and that these interactions can be captured by simple network models.
Affiliation(s)
- Sebastian Morel-Balbi
- Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
- Tiago P Peixoto
- Department of Network and Data Science, Central European University, 1100 Vienna, Austria; ISI Foundation, 10126 Torino, Italy; and Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
|
191
|
Yen TC, Larremore DB. Community detection in bipartite networks with stochastic block models. Phys Rev E 2020; 102:032309. [PMID: 33075933 DOI: 10.1103/physreve.102.032309] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/22/2020] [Accepted: 07/23/2020] [Indexed: 11/07/2022]
Abstract
In bipartite networks, community structures are restricted to being disassortative, in that nodes of one type are grouped according to common patterns of connection with nodes of the other type. This makes the stochastic block model (SBM), a highly flexible generative model for networks with block structure, an intuitive choice for bipartite community detection. However, typical formulations of the SBM do not make use of the special structure of bipartite networks. Here we introduce a Bayesian nonparametric formulation of the SBM and a corresponding algorithm that efficiently finds communities in bipartite networks and parsimoniously chooses the number of communities. The biSBM improves community detection results over general SBMs when data are noisy, improves the model resolution limit by a factor of √2, and expands our understanding of the complicated optimization landscape associated with community detection tasks. A direct comparison of certain terms of the prior distributions in the biSBM and a related high-resolution hierarchical SBM also reveals a counterintuitive regime of community detection problems, populated by smaller and sparser networks, where nonhierarchical models outperform their more flexible counterpart.
Affiliation(s)
- Tzu-Chi Yen
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Daniel B Larremore
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA; BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
|
192
|
|
193
|
Hu J, Zhang J, Qin H, Yan T, Zhu J. Using Maximum Entry-Wise Deviation to Test the Goodness of Fit for Stochastic Block Models. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1722676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Jianwei Hu
- Department of Statistics, Central China Normal University, Wuhan, China
- Jingfei Zhang
- Department of Management Science, University of Miami, Coral Gables, FL
- Hong Qin
- Department of Statistics, Central China Normal University, Wuhan, China
- Department of Statistics, Zhongnan University of Economics and Law, Wuhan, China
- Ting Yan
- Department of Statistics, Central China Normal University, Wuhan, China
- Ji Zhu
- Department of Statistics, University of Michigan, Ann Arbor, MI
|
194
|
Malvestio I, Cardillo A, Masuda N. Interplay between k-core and community structure in complex networks. Sci Rep 2020; 10:14702. [PMID: 32895432 PMCID: PMC7477593 DOI: 10.1038/s41598-020-71426-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/02/2020] [Accepted: 08/07/2020] [Indexed: 11/12/2022]
Abstract
The organisation of a network into maximal sets of nodes having at least k neighbours within the set, known as k-core decomposition, has been used for studying various phenomena. It has been shown that nodes in the innermost k-shells play a crucial role in contagion processes, the emergence of consensus, and the resilience of the system. It is known that the k-core decomposition of many empirical networks cannot be explained by the degree of each node alone, or equivalently, by random graph models that preserve the degree of each node (i.e., the configuration model). Here we study the k-core decomposition of some empirical networks as well as that of some randomised counterparts, and examine the extent to which the k-shell structure of the networks can be accounted for by the community structure. We find that preserving the community structure in the randomisation process is crucial for generating networks whose k-core decomposition is close to the empirical one. We also highlight the existence, in some networks, of a concentration of the nodes in the innermost k-shells into a small number of communities.
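The k-core decomposition defined above can be computed by repeatedly peeling nodes of minimum remaining degree. The k-shell is then the set of nodes whose core number is exactly k. The sketch below is a straightforward, quadratic-time version of that peeling procedure, not the analysis pipeline of the study. The function name and toy graph are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
def core_numbers(graph):
    """Core number of every node of a simple undirected graph.

    graph: dict mapping node -> list of neighbors (no self-loops).
    A node's core number is the largest k such that it belongs to the k-core.
    """
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        # Move to the next shell; k never decreases
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled after being pushed twice
            remaining.remove(v)
            core[v] = k
            for u in graph[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return core

# Hypothetical toy graph: two triangles joined by edge 2-3, plus a pendant node 6 on node 0
graph = {0: [1, 2, 6], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4], 6: [0]}
print(sorted(core_numbers(graph).items()))
# [(0, 2), (1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 1)]
```

Here the pendant node forms the 1-shell and both triangles form the 2-shell. Comparing such shells against a partition into communities is the kind of question the abstract studies.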
Affiliation(s)
- Irene Malvestio
- Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1UB UK
- Alessio Cardillo
- Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1UB UK
- Department of Computer Science and Mathematics, University Rovira i Virgili, 43007 Tarragona, Spain
- GOTHAM Lab – Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, 50018 Zaragoza, Spain
- Naoki Masuda
- Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1UB UK
- Department of Mathematics, University at Buffalo, Buffalo, NY 14260-2900 United States
- Computational and Data-Enabled Science and Engineering Program, University at Buffalo, State University of New York, Buffalo, NY 14260-5030 USA
|
195
|
Elliott A, Chiu A, Bazzi M, Reinert G, Cucuringu M. Core-periphery structure in directed networks. Proc Math Phys Eng Sci 2020; 476:20190783. [PMID: 33061788 PMCID: PMC7544362 DOI: 10.1098/rspa.2019.0783] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/11/2019] [Accepted: 06/25/2020] [Indexed: 11/17/2022]
Abstract
Empirical networks often exhibit different meso-scale structures, such as community and core-periphery structures. Core-periphery structure typically consists of a well-connected core and a periphery that is well connected to the core but sparsely connected internally. Most core-periphery studies focus on undirected networks. We propose a generalization of core-periphery structure to directed networks. Our approach yields a family of core-periphery block model formulations in which, contrary to many existing approaches, core and periphery sets are edge-direction dependent. We focus on a particular structure consisting of two core sets and two periphery sets, which we motivate empirically. We propose two measures to assess the statistical significance and quality of our novel structure in empirical data, where one often has no ground truth. To detect core-periphery structure in directed networks, we propose three methods adapted from two approaches in the literature, each with a different trade-off between computational complexity and accuracy. We assess the methods on benchmark networks, where our methods match or outperform standard methods from the literature, with a likelihood approach achieving the highest accuracy. Applying our methods to three empirical networks (faculty hiring, a world trade dataset, and political blogs) illustrates that our proposed structure provides novel insights into empirical networks.
Affiliation(s)
- Andrew Elliott
- The Alan Turing Institute, London, UK
- Department of Statistics, University of Oxford, Oxford, UK
- Angus Chiu
- Department of Statistics, University of Oxford, Oxford, UK
- Marya Bazzi
- The Alan Turing Institute, London, UK
- Mathematical Institute, University of Oxford, Oxford, UK
- Mathematics Institute, University of Warwick, Coventry, UK
- Gesine Reinert
- The Alan Turing Institute, London, UK
- Department of Statistics, University of Oxford, Oxford, UK
- Mihai Cucuringu
- The Alan Turing Institute, London, UK
- Department of Statistics, University of Oxford, Oxford, UK
- Mathematical Institute, University of Oxford, Oxford, UK
|
196
|
Peixoto TP. Merge-split Markov chain Monte Carlo for community detection. Phys Rev E 2020; 102:012305. [PMID: 32794904 DOI: 10.1103/physreve.102.012305] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/24/2020] [Accepted: 06/19/2020] [Indexed: 11/07/2022]
Abstract
We present a Markov chain Monte Carlo scheme based on merges and splits of groups that is capable of efficiently sampling from the posterior distribution of network partitions, defined according to the stochastic block model (SBM). We demonstrate how schemes based on the move of single nodes between groups systematically fail at correctly sampling from the posterior distribution even on small networks, and how our merge-split approach behaves significantly better, and improves the mixing time of the Markov chain by several orders of magnitude in typical cases. We also show how the scheme can be straightforwardly extended to nested versions of the SBM, yielding asymptotically exact samples of hierarchical network partitions.
Affiliation(s)
- Tiago P Peixoto
- Department of Network and Data Science, Central European University, H-1051 Budapest, Hungary; ISI Foundation, Via Chisola 5, 10126 Torino, Italy; and Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
|
197
|
Publishing Community-Preserving Attributed Social Graphs with a Differential Privacy Guarantee. PROCEEDINGS ON PRIVACY ENHANCING TECHNOLOGIES 2020. [DOI: 10.2478/popets-2020-0066] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
We present a novel method for publishing differentially private synthetic attributed graphs. Our method allows, for the first time, the publication of synthetic graphs that simultaneously preserve structural properties, user attributes and the community structure of the original graph. Our proposal relies on CAGM, a new community-preserving generative model for attributed graphs. We equip CAGM with efficient methods for attributed graph sampling and parameter estimation. For the latter, we introduce differentially private computation methods, which allow us to release community-preserving synthetic attributed social graphs with a strong formal privacy guarantee. Through comprehensive experiments, we show that our new model outperforms its most relevant counterparts in synthesising differentially private attributed social graphs that preserve the community structure of the original graph, as well as degree sequences and clustering coefficients.
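The abstract does not spell out its differentially private estimators, so the following is only a generic illustration of the standard Laplace mechanism applied to one plausible ingredient of a community-aware graph model: edge counts between community pairs. It assumes edge-level privacy (neighbouring graphs differ in one edge) and treats the community assignment as fixed and public; the function names and toy data are hypothetical, and this is not CAGM's procedure.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_block_edge_counts(edges, community, epsilon, rng):
    """Release edge counts between every pair of communities under
    edge-level epsilon-differential privacy (Laplace mechanism).

    Adding or removing one edge changes exactly one count by 1, so the
    L1 sensitivity of the full count vector is 1 and the noise scale is 1/epsilon.
    """
    k = max(community.values()) + 1
    # Enumerate all pairs, including empty ones, so the set of released keys leaks nothing.
    counts = {(r, s): 0 for r in range(k) for s in range(r, k)}
    for u, v in edges:
        r, s = sorted((community[u], community[v]))
        counts[(r, s)] += 1
    scale = 1.0 / epsilon
    # Clamping at 0 is post-processing, so it does not weaken the guarantee.
    noisy = {key: max(0.0, c + laplace_noise(scale, rng)) for key, c in counts.items()}
    return counts, noisy


# Hypothetical toy graph with two communities.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
exact, noisy = private_block_edge_counts(edges, community, epsilon=1.0, rng=random.Random(42))
print(exact)
print(noisy)
```

Smaller `epsilon` means a stronger guarantee and proportionally larger noise; a full generative model would then fit its parameters from the noisy counts rather than the exact ones.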
198
Functional Connectome Analyses Reveal the Human Olfactory Network Organization. eNeuro 2020; 7:ENEURO.0551-19.2020. [PMID: 32471848 PMCID: PMC7418535 DOI: 10.1523/eneuro.0551-19.2020] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/22/2019] [Revised: 05/18/2020] [Accepted: 05/19/2020] [Indexed: 01/24/2023]
Abstract
The olfactory system is uniquely heterogeneous, performing multifaceted functions (beyond basic sensory processing) across diverse, widely distributed neural substrates. While knowledge of human olfaction continues to grow, it remains unclear how the olfactory network is organized to serve this unique set of functions. Leveraging a large and high-quality resting-state functional magnetic resonance imaging (rs-fMRI) dataset of nearly 900 participants from the Human Connectome Project (HCP), we identified a human olfactory network encompassing cortical and subcortical regions across the temporal and frontal lobes. Highlighting its reliability and generalizability, the connectivity matrix of this olfactory network mapped closely onto that extracted from an independent rs-fMRI dataset. Graph theoretical analysis further explicated the organizational principles of the network. The olfactory network exhibits a modular composition of three (i.e., the sensory, limbic, and frontal) subnetworks and demonstrates strong small-world properties, high in both global integration and local segregation (i.e., circuit specialization). This network organization thus ensures the segregation of local circuits, which are nonetheless integrated via connecting hubs [i.e., amygdala (AMY) and anterior insula (INSa)], thereby enabling the specialized, yet integrative, functions of olfaction. In particular, the degree of local segregation positively predicted olfactory discrimination performance in the independent sample, which we infer as a functional advantage of the network organization.
In sum, an olfactory functional network has been identified through the large HCP dataset, affording a representative template of the human olfactory functional neuroanatomy. Importantly, the topological analysis of the olfactory network provides network-level insights into the remarkable functional specialization and spatial segregation of the olfactory system.
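The two properties named in the abstract, global integration and local segregation, are commonly quantified in graph theory by global efficiency and the average clustering coefficient. A minimal pure-Python sketch on an unweighted, undirected graph stored as an adjacency dict follows; the study itself worked with rs-fMRI connectivity matrices, so the binary graph, the function names and the example graphs are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def global_efficiency(adj):
    """Integration: mean of 1/d(i, j) over ordered node pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for src in adj:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))


def average_clustering(adj):
    """Segregation: mean local clustering coefficient (degree < 2 counts as 0)."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # Count edges among the node's neighbours, i.e. triangles through the node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(global_efficiency(complete4), average_clustering(complete4))
print(global_efficiency(path3), average_clustering(path3))
```

A small-world network scores high on both measures at once, whereas a simple chain like `path3` is moderately integrated but has no local clustering at all.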
199
Ghosh P, Pati D, Bhattacharya A. Posterior Contraction Rates for Stochastic Block Models. Sankhya A 2020. [DOI: 10.1007/s13171-019-00180-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
200
Loyal JD, Chen Y. Statistical Network Analysis: A Review with Applications to the Coronavirus Disease 2019 Pandemic. Int Stat Rev 2020. [DOI: 10.1111/insr.12398] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]
Affiliation(s)
- Joshua Daniel Loyal
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Yuguo Chen
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA