1
|
Peel L, Schaub MT. Detectability of hierarchical communities in networks. Phys Rev E 2024; 110:034306. [PMID: 39425354 DOI: 10.1103/physreve.110.034306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2024] [Accepted: 08/15/2024] [Indexed: 10/21/2024]
Abstract
We study the problem of recovering a planted hierarchy of partitions in a network. The detectability of a single planted partition has previously been analyzed in detail and a phase transition has been identified below which the partition cannot be detected. Here we show that, in the hierarchical setting, there exist additional phases in which the presence of multiple consistent partitions can either help or hinder detection. Accordingly, the detectability limit for nonhierarchical partitions typically provides insufficient information about the detectability of the complete hierarchical structure, as we highlight with several constructive examples.
Collapse
|
2
|
Kawamoto T. Single-trajectory map equation. Sci Rep 2023; 13:6597. [PMID: 37087492 PMCID: PMC10122677 DOI: 10.1038/s41598-023-33880-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Accepted: 04/20/2023] [Indexed: 04/24/2023] Open
Abstract
Community detection, the process of identifying module structures in complex systems represented on networks, is an effective tool in various fields of science. The map equation, which is an information-theoretic framework based on the random walk on a network, is a particularly popular community detection method. Despite its outstanding performance in many applications, the inner workings of the map equation have not been thoroughly studied. Herein, we revisit the original formulation of the map equation and address the existence of its "raw form," which we refer to as the single-trajectory map equation. This raw form sheds light on many details behind the principle of the map equation that are hidden in the steady-state limit of the random walk. Most importantly, the single-trajectory map equation provides a more balanced community structure, naturally reducing the tendency of the overfitting phenomenon in the map equation.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, 135-0064, Japan.
| |
Collapse
|
3
|
Qing H. Estimating Mixed Memberships in Directed Networks by Spectral Clustering. ENTROPY (BASEL, SWITZERLAND) 2023; 25:345. [PMID: 36832711 PMCID: PMC9955123 DOI: 10.3390/e25020345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 02/04/2023] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Community detection is an important and powerful way to understand the latent structure of complex networks in social network analysis. This paper considers the problem of estimating community memberships of nodes in a directed network, where a node may belong to multiple communities. For such a directed network, existing models either assume that each node belongs solely to one community or ignore variation in node degree. Here, a directed degree corrected mixed membership (DiDCMM) model is proposed by considering degree heterogeneity. An efficient spectral clustering algorithm with a theoretical guarantee of consistent estimation is designed to fit DiDCMM. We apply our algorithm to a small scale of computer-generated directed networks and several real-world directed networks.
Collapse
Affiliation(s)
- Huan Qing
- School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
| |
Collapse
|
4
|
Spectral norm bounds for block Markov chain random matrices. Stoch Process Their Appl 2023. [DOI: 10.1016/j.spa.2022.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
|
5
|
Chen S, Huang G, Piccioli G, Zdeborová L. Planted XY model: Thermodynamics and inference. Phys Rev E 2022; 106:054115. [PMID: 36559493 DOI: 10.1103/physreve.106.054115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2022] [Accepted: 10/14/2022] [Indexed: 06/17/2023]
Abstract
In this paper we study a fully connected planted spin glass named the planted XY model. Motivation for studying this system comes both from the spin glass field and the one of statistical inference where it models the angular synchronization problem. We derive the replica symmetric (RS) phase diagram in the temperature, ferromagnetic bias plane using the approximate message passing (AMP) algorithm and its state evolution (SE). While the RS predictions are exact on the Nishimori line (i.e., when the temperature is matched to the ferromagnetic bias), they become inaccurate when the parameters are mismatched, giving rise to a spin glass phase where AMP is not able to converge. To overcome the defects of the RS approximation we carry out a one-step replica symmetry-breaking (1RSB) analysis based on the approximate survey propagation (ASP) algorithm. Exploiting the state evolution of ASP, we count the number of metastable states in the measure, derive the 1RSB free entropy and find the behavior of the Parisi parameter throughout the spin glass phase.
Collapse
Affiliation(s)
- Siyu Chen
- École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
| | - Guanhao Huang
- École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
| | - Giovanni Piccioli
- École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation Laboratory, CH-1015 Lausanne, Switzerland
| | - Lenka Zdeborová
- École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation Laboratory, CH-1015 Lausanne, Switzerland
| |
Collapse
|
6
|
Li HJ, Song S, Tan W, Huang Z, Li X, Xu W, Cao J. Characterizing the fuzzy community structure in link graph via the likelihood optimization. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
|
7
|
Huang S, Weng H, Feng Y. Spectral clustering via adaptive layer aggregation for multi-layer networks*. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2134874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Affiliation(s)
- Sihan Huang
- Department of Statistics, Columbia University
| | - Haolei Weng
- Department of Statistics and Probability, Michigan State University
| | - Yang Feng
- Department of Biostatistics, New York University
| |
Collapse
|
8
|
Chen S, Liu S, Ma Z. Global and individualized community detection in inhomogeneous multilayer networks. Ann Stat 2022. [DOI: 10.1214/22-aos2202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
- Shuxiao Chen
- Department of Statistics, University or Pennsylvania
| | - Sifan Liu
- Department of Statistics, Stanford University
| | - Zongming Ma
- Department of Statistics, University or Pennsylvania
| |
Collapse
|
9
|
Banerjee D, Ma Z. Optimal signal detection in some spiked random matrix models: Likelihood ratio tests and linear spectral statistics. Ann Stat 2022. [DOI: 10.1214/21-aos2150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
| | - Zongming Ma
- Department of Statistics, The Wharton School, University of Pennsylvania
| |
Collapse
|
10
|
A likelihood-ratio type test for stochastic block models with bounded degrees. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2021.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
11
|
Bakkelund D. Machine part data with part-of relations and part dissimilarities for planted partition generation. Data Brief 2022; 42:108065. [PMID: 35372649 PMCID: PMC8971586 DOI: 10.1016/j.dib.2022.108065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2022] [Accepted: 03/14/2022] [Indexed: 11/15/2022] Open
|
12
|
Affiliation(s)
- Mingao Yuan
- Department of Statistics, North Dakota State University
| | - Ruiqi Liu
- Department of Mathematical Sciences, Texas Tech University
| | - Yang Feng
- Department of Biostatistics, New York University
| | - Zuofeng Shang
- Department of Mathematical Sciences, New Jersey Institute of Technology
| |
Collapse
|
13
|
Affiliation(s)
- Can M. Le
- Department of Statistics, University of California, Davis
| | | |
Collapse
|
14
|
Jin J, Ke ZT, Luo S. Optimal adaptivity of signed-polygon statistics for network testing. Ann Stat 2021. [DOI: 10.1214/21-aos2089] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
- Jiashun Jin
- Department of Statistics & Data Science, Carnegie Mellon University
| | | | - Shengming Luo
- Department of Statistics & Data Science, Carnegie Mellon University
| |
Collapse
|
15
|
The number of solutions for random regular NAE-SAT. Probab Theory Relat Fields 2021. [DOI: 10.1007/s00440-021-01029-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
16
|
Yuan M, Yang F, Shang Z. Hypothesis testing in sparse weighted stochastic block model. Stat Pap (Berl) 2021. [DOI: 10.1007/s00362-021-01269-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
17
|
Bogerd K, Castro RM, van der Hofstad R, Verzelen N. Detecting a planted community in an inhomogeneous random graph. BERNOULLI 2021. [DOI: 10.3150/20-bej1269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
- Kay Bogerd
- Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Rui M. Castro
- Eindhoven University of Technology, Eindhoven, The Netherlands
| | | | | |
Collapse
|
18
|
Cai C, Li G, Chi Y, Poor HV, Chen Y. Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees. Ann Stat 2021. [DOI: 10.1214/20-aos1986] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Affiliation(s)
- Changxiao Cai
- Department of Electrical Engineering, Princeton University
| | - Gen Li
- Department of Electronic Engineering, Tsinghua University
| | - Yuejie Chi
- Department of Electrical and Computer Engineering, Carnegie Mellon University
| | | | - Yuxin Chen
- Department of Electrical Engineering, Princeton University
| |
Collapse
|
19
|
Gao C, Ma Z. Minimax Rates in Network Analysis: Graphon Estimation, Community Detection and Hypothesis Testing. Stat Sci 2021. [DOI: 10.1214/19-sts736] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
20
|
|
21
|
|
22
|
Yen TC, Larremore DB. Community detection in bipartite networks with stochastic block models. Phys Rev E 2020; 102:032309. [PMID: 33075933 DOI: 10.1103/physreve.102.032309] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Accepted: 07/23/2020] [Indexed: 11/07/2022]
Abstract
In bipartite networks, community structures are restricted to being disassortative, in that nodes of one type are grouped according to common patterns of connection with nodes of the other type. This makes the stochastic block model (SBM), a highly flexible generative model for networks with block structure, an intuitive choice for bipartite community detection. However, typical formulations of the SBM do not make use of the special structure of bipartite networks. Here we introduce a Bayesian nonparametric formulation of the SBM and a corresponding algorithm to efficiently find communities in bipartite networks which parsimoniously chooses the number of communities. The biSBM improves community detection results over general SBMs when data are noisy, improves the model resolution limit by a factor of sqrt[2], and expands our understanding of the complicated optimization landscape associated with community detection tasks. A direct comparison of certain terms of the prior distributions in the biSBM and a related high-resolution hierarchical SBM also reveals a counterintuitive regime of community detection problems, populated by smaller and sparser networks, where nonhierarchical models outperform their more flexible counterpart.
Collapse
Affiliation(s)
- Tzu-Chi Yen
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
| | - Daniel B Larremore
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA.,BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
| |
Collapse
|
23
|
|
24
|
Falakshahi H, Vergara VM, Liu J, Mathalon DH, Ford JM, Voyvodic J, Mueller BA, Belger A, McEwen S, Potkin SG, Preda A, Rokham H, Sui J, Turner JA, Plis S, Calhoun VD. Meta-Modal Information Flow: A Method for Capturing Multimodal Modular Disconnectivity in Schizophrenia. IEEE Trans Biomed Eng 2020; 67:2572-2584. [PMID: 31944934 PMCID: PMC7538162 DOI: 10.1109/tbme.2020.2964724] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
OBJECTIVE Multimodal measurements of the same phenomena provide complementary information and highlight different perspectives, albeit each with their own limitations. A focus on a single modality may lead to incorrect inferences, which is especially important when a studied phenomenon is a disease. In this paper, we introduce a method that takes advantage of multimodal data in addressing the hypotheses of disconnectivity and dysfunction within schizophrenia (SZ). METHODS We start with estimating and visualizing links within and among extracted multimodal data features using a Gaussian graphical model (GGM). We then propose a modularity-based method that can be applied to the GGM to identify links that are associated with mental illness across a multimodal data set. Through simulation and real data, we show our approach reveals important information about disease-related network disruptions that are missed with a focus on a single modality. We use functional MRI (fMRI), diffusion MRI (dMRI), and structural MRI (sMRI) to compute the fractional amplitude of low frequency fluctuations (fALFF), fractional anisotropy (FA), and gray matter (GM) concentration maps. These three modalities are analyzed using our modularity method. RESULTS Our results show missing links that are only captured by the cross-modal information that may play an important role in disconnectivity between the components. CONCLUSION We identified multimodal (fALFF, FA and GM) disconnectivity in the default mode network area in patients with SZ, which would not have been detectable in a single modality. SIGNIFICANCE The proposed approach provides an important new tool for capturing information that is distributed among multiple imaging modalities.
Collapse
|
25
|
Schweinberger M. Consistent structure estimation of exponential-family random graph models with block structure. BERNOULLI 2020. [DOI: 10.3150/19-bej1153] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
26
|
Nonreconstruction of high-dimensional stochastic block model with bounded degree. Stat Probab Lett 2020. [DOI: 10.1016/j.spl.2019.108675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
27
|
Perry A, Wein AS, Bandeira AS. Statistical limits of spiked tensor models. ANNALES DE L'INSTITUT HENRI POINCARÉ, PROBABILITÉS ET STATISTIQUES 2020. [DOI: 10.1214/19-aihp960] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
28
|
Lauw HW, Wong RCW, Ntoulas A, Lim EP, Ng SK, Pan SJ. Fast Community Detection with Graph Sparsification. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING 2020. [PMCID: PMC7206315 DOI: 10.1007/978-3-030-47426-3_23] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
A popular model for detecting community structure in large graphs is the Stochastic Block Model (SBM). The exact parameters to recover the community structure of a SBM has been well studied, and many methods have been proposed to recover a nodes’ community membership. A popular approach is to use spectral methods where the Graph Laplacian L of the given graph is created, and the Fiedler vector of the graph is found. This vector is then used to cluster nodes in the same community. While a robust method, it can be expensive to compute the Fiedler vector exactly. In this paper we examine the types of errors that can be tolerated using spectral methods while still recovering the communities. The two sources of error considered are: (i) dropping edges using different sparsification strategies; and (ii) inaccurately computing the eigenvectors. In this way, spectral clustering algorithms can be tuned to be far more efficient at detecting community structure for these community models.
Collapse
Affiliation(s)
- Hady W. Lauw
- School of Information Systems, Singapore Management University, Singapore, Singapore
| | - Raymond Chi-Wing Wong
- Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
| | - Alexandros Ntoulas
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
| | - Ee-Peng Lim
- School of Information Systems, Singapore Management University, Singapore, Singapore
| | - See-Kiong Ng
- Institute of Data Science, National University of Singapore, Singapore, Singapore
| | - Sinno Jialin Pan
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| |
Collapse
|
29
|
|
30
|
Ricci-Tersenghi F, Semerjian G, Zdeborová L. Typology of phase transitions in Bayesian inference problems. Phys Rev E 2019; 99:042109. [PMID: 31108676 DOI: 10.1103/physreve.99.042109] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Indexed: 11/07/2022]
Abstract
Many inference problems undergo phase transitions as a function of the signal-to-noise ratio, a prominent example of this phenomenon being found in the stochastic block model (SBM) that generates a random graph with a hidden community structure. Some of these phase transitions affect the information-theoretic optimal (but possibly computationally expensive) estimation procedure, others concern the behavior of efficient iterative algorithms. A computational gap opens when the phase transitions for these two aspects do not coincide, leading to a hard phase in which optimal inference is computationally challenging. In this paper we refine this description in two ways. From a qualitative perspective, we emphasize the existence of more generic phase diagrams with a hybrid-hard phase, in which it is computationally easy to reach a nontrivial inference accuracy but computationally hard to match the information-theoretic optimal one. We support this discussion by quantitative expansions of the functional cavity equations that describe inference problems on sparse graphs, around their trivial solution. These expansions shed light on the existence of hybrid-hard phases, for a large class of planted constraint satisfaction problems, and on the question of the tightness of the Kesten-Stigum (KS) bound for the associated tree reconstruction problem. Our results show that the instability of the trivial fixed point is not generic evidence for the Bayes optimality of the message-passing algorithms. We clarify in particular the status of the symmetric SBM with four communities and of the tree reconstruction of the associated Potts model: In the assortative (ferromagnetic) case the KS bound is always tight, whereas in the disassortative (antiferromagnetic) case we exhibit an explicit criterion involving the degree distribution that separates a large-degree regime where the KS bound is tight and a low-degree regime where it is not. We also investigate the SBM with two communities of different sizes, also known as the asymmetric Ising model, and describe quantitatively its computational gap as a function of its asymmetry, as well as a version of the SBM with two groups of q_{1} and q_{2} communities. We complement this study with numerical simulations of the belief propagation iterative algorithm, confirming that its behavior on large samples is well described by the cavity method.
Collapse
Affiliation(s)
- Federico Ricci-Tersenghi
- Diparimento di Fisica, Sapienza Università di Roma, Nanotec CNR, UOS di Roma, and INFN Sezione di Roma 1, Piazzale Aldo Moro 5, 00185 Roma, Italy
| | - Guilhem Semerjian
- Laboratoire de Physique, Ecole Normale Supérieure, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Université Sorbonne Paris Cité, Paris, France
| | - Lenka Zdeborová
- Institut de Physique Théorique, CNRS, CEA, Université Paris-Saclay, Gif-sur-Yvette, France
| |
Collapse
|
31
|
Funke T, Becker T. Stochastic block models: A comparison of variants and inference methods. PLoS One 2019; 14:e0215296. [PMID: 31013290 PMCID: PMC6478296 DOI: 10.1371/journal.pone.0215296] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Accepted: 03/30/2019] [Indexed: 11/19/2022] Open
Abstract
Finding communities in complex networks is a challenging task and one promising approach is the Stochastic Block Model (SBM). But the influences from various fields led to a diversity of variants and inference methods. Therefore, a comparison of the existing techniques and an independent analysis of their capabilities and weaknesses is needed. As a first step, we review the development of different SBM variants such as the degree-corrected SBM of Karrer and Newman or Peixoto's hierarchical SBM. Beside stating all these variants in a uniform notation, we show the reasons for their development. Knowing the variants, we discuss a variety of approaches to infer the optimal partition like the Metropolis-Hastings algorithm. We perform our analysis based on our extension of the Girvan-Newman test and the Lancichinetti-Fortunato-Radicchi benchmark as well as a selection of some real world networks. Using these results, we give some guidance to the challenging task of selecting an inference method and SBM variant. In addition, we give a simple heuristic to determine the number of steps for the Metropolis-Hastings algorithms that lack a usual stop criterion. With our comparison, we hope to guide researches in the field of SBM and highlight the problem of existing techniques to focus future research. Finally, by making our code freely available, we want to promote a faster development, integration and exchange of new ideas.
Collapse
Affiliation(s)
- Thorben Funke
- Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
- Faculty of Production Engineering, University of Bremen, Bremen, Bremen, Germany
| | - Till Becker
- Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
- Faculty of Business Studies, University of Applied Sciences Emden/Leer, Emden, Lower Saxony, Germany
| |
Collapse
|
32
|
Wilinski M, Mazzarisi P, Tantari D, Lillo F. Detectability of macroscopic structures in directed asymmetric stochastic block model. Phys Rev E 2019; 99:042310. [PMID: 31108631 DOI: 10.1103/physreve.99.042310] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2018] [Indexed: 11/07/2022]
Abstract
We study the problem of identifying macroscopic structures in networks, characterizing the impact of introducing link directions on the detectability phase transition. To this end, building on the stochastic block model, we construct a class of nontrivially detectable directed networks. We find closed-form solutions by using the belief propagation method, showing how the transition line depends on the assortativity and the asymmetry of the network. Finally, we numerically identify the existence of a hard phase for detection close to the transition point.
Collapse
Affiliation(s)
- Mateusz Wilinski
- Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
| | - Piero Mazzarisi
- Dipartimento di Matematica, Porta di Piazza San Donato 5, 40126 Bologna, Italy
| | - Daniele Tantari
- Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
| | - Fabrizio Lillo
- Dipartimento di Matematica, Porta di Piazza San Donato 5, 40126 Bologna, Italy
| |
Collapse
|
33
|
Kawamoto T, Kabashima Y. Counting the number of metastable states in the modularity landscape: Algorithmic detectability limit of greedy algorithms in community detection. Phys Rev E 2019; 99:010301. [PMID: 30780211 DOI: 10.1103/physreve.99.010301] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2018] [Indexed: 11/07/2022]
Abstract
Modularity maximization using greedy algorithms continues to be a popular approach toward community detection in graphs, even after various better forming algorithms have been proposed. Apart from its clear mechanism and ease of implementation, this approach is persistently popular because, presumably, its risk of algorithmic failure is not well understood. This Rapid Communication provides insight into this issue by estimating the algorithmic performance limit of the stochastic block model inference using modularity maximization. This is achieved by counting the number of metastable states under a local update rule. Our results offer a quantitative insight into the level of sparsity at which a greedy algorithm typically fails.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi, Koto-ku, Tokyo, Japan
| | - Yoshiyuki Kabashima
- Department of Mathematical and Computing Science, Tokyo Institute of Technology, W8-45, 2-12-1 Ookayma, Meguro-ku, Tokyo, Japan
| |
Collapse
|
34
|
Gulikers L, Lelarge M, Massoulié L. An impossibility result for reconstruction in the degree-corrected stochastic block model. ANN APPL PROBAB 2018. [DOI: 10.1214/18-aap1381] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
35
|
Tang M, Priebe CE. Limit theorems for eigenvectors of the normalized Laplacian for random graphs. Ann Stat 2018. [DOI: 10.1214/17-aos1623] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
36
|
Perry A, Wein AS, Bandeira AS, Moitra A. Optimality and sub-optimality of PCA I: Spiked random matrix models. Ann Stat 2018. [DOI: 10.1214/17-aos1625] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
37
|
Abstract
Abstract
Community detection is considered for a stochastic block model graph of n vertices, with K vertices in the planted community, edge probability p for pairs of vertices both in the community, and edge probability q for other pairs of vertices. The main focus of the paper is on weak recovery of the community based on the graph G, with o(K) misclassified vertices on average, in the sublinear regime n1-o(1) ≤ K ≤ o(n). A critical parameter is the effective signal-to-noise ratio λ = K2(p - q)2 / ((n - K)q), with λ = 1 corresponding to the Kesten–Stigum threshold. We show that a belief propagation (BP) algorithm achieves weak recovery if λ > 1 / e, beyond the Kesten–Stigum threshold by a factor of 1 / e. The BP algorithm only needs to run for log*n + O(1) iterations, with the total time complexity O(|E|log*n), where log*n is the iterated logarithm of n. Conversely, if λ ≤ 1 / e, no local algorithm can asymptotically outperform trivial random guessing. Furthermore, a linear message-passing algorithm that corresponds to applying a power iteration to the nonbacktracking matrix of the graph is shown to attain weak recovery if and only if λ > 1. In addition, the BP algorithm can be combined with a linear-time voting procedure to achieve the information limit of exact recovery (correctly classify all vertices with high probability) for all K ≥ (n / logn) (ρBP + o(1)), where ρBP is a function of p / q.
Collapse
|
38
|
Kawamoto T, Kabashima Y. Comparative analysis on the selection of number of clusters in community detection. Phys Rev E 2018; 97:022315. [PMID: 29548181 DOI: 10.1103/physreve.97.022315] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2017] [Indexed: 11/07/2022]
Abstract
We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi, Koto-ku, Tokyo, Japan
| | - Yoshiyuki Kabashima
- Department of Mathematical and Computing Science, Tokyo Institute of Technology, W8-45, 2-12-1 Ookayma, Meguro-ku, Tokyo, Japan
| |
Collapse
|
39
|
Tokuda T. Statistical test for detecting community structure in real-valued edge-weighted graphs. PLoS One 2018; 13:e0194079. [PMID: 29558487 PMCID: PMC5860707 DOI: 10.1371/journal.pone.0194079] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2017] [Accepted: 02/23/2018] [Indexed: 11/28/2022] Open
Abstract
We propose a novel method to test the existence of community structure in undirected, real-valued, edge-weighted graphs. The method is based on the asymptotic behavior of extreme eigenvalues of a real symmetric edge-weight matrix. We provide a theoretical foundation for this method and report on its performance using synthetic and real data, suggesting that this new method outperforms other state-of-the-art methods.
Collapse
Affiliation(s)
- Tomoki Tokuda
- Okinawa Institute of Science and Technology Graduate University, 1919-1, Tancha, Onna-son, Okinawa, Japan
- * E-mail:
| |
Collapse
|
40
|
Kawamoto T. Algorithmic detectability threshold of the stochastic block model. Phys Rev E 2018; 97:032301. [PMID: 29776051 DOI: 10.1103/physreve.97.032301] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2017] [Indexed: 06/08/2023]
Abstract
The assumption that the values of model parameters are known or correctly learned, i.e., the Nishimori condition, is one of the requirements for the detectability analysis of the stochastic block model in statistical inference. In practice, however, there is no example demonstrating that we can know the model parameters beforehand, and there is no guarantee that the model parameters can be learned accurately. In this study, we consider the expectation-maximization (EM) algorithm with belief propagation (BP) and derive its algorithmic detectability threshold. Our analysis is not restricted to the community structure but includes general modular structures. Because the algorithm cannot always learn the planted model parameters correctly, the algorithmic detectability threshold is qualitatively different from the one with the Nishimori condition.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi, Koto-ku, Tokyo, Japan
| |
Collapse
|
41
|
Bordenave C, Lelarge M, Massoulié L. Nonbacktracking spectrum of random graphs: Community detection and nonregular Ramanujan graphs. ANN PROBAB 2018. [DOI: 10.1214/16-aop1142] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
42
|
Banerjee D. Contiguity and non-reconstruction results for planted partition models: the dense case. ELECTRON J PROBAB 2018. [DOI: 10.1214/17-ejp128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
43
|
Riolo MA, Cantwell GT, Reinert G, Newman MEJ. Efficient method for estimating the number of communities in a network. Phys Rev E 2017; 96:032310. [PMID: 29346915 DOI: 10.1103/physreve.96.032310] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Indexed: 06/07/2023]
Abstract
While there exist a wide range of effective methods for community detection in networks, most of them require one to know in advance how many communities one is looking for. Here we present a method for estimating the number of communities in a network using a combination of Bayesian inference with a novel prior and an efficient Monte Carlo sampling scheme. We test the method extensively on both real and computer-generated networks, showing that it performs accurately and consistently, even in cases where groups are widely varying in size or structure.
Collapse
Affiliation(s)
- Maria A Riolo
- Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - George T Cantwell
- Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Gesine Reinert
- Department of Statistics, University of Oxford, 24-29 St. Giles, Oxford OX1 3LB, United Kingdom
| | - M E J Newman
- Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA
| |
Collapse
|
44
|
Zhao Y. A survey on theoretical advances of community detection in networks. WIRES COMPUTATIONAL STATISTICS 2017. [DOI: 10.1002/wics.1403] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Yunpeng Zhao
- Department of Statistics George Mason University Fairfax VA USA
| |
Collapse
|
45
|
Kawamoto T, Kabashima Y. Cross-validation estimate of the number of clusters in a network. Sci Rep 2017; 7:3327. [PMID: 28607441 PMCID: PMC5468368 DOI: 10.1038/s41598-017-03623-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2017] [Accepted: 05/02/2017] [Indexed: 12/01/2022] Open
Abstract
Network science investigates methodologies that summarise relational data to obtain better interpretability. Identifying modular structures is a fundamental task, and assessment of the coarse-grain level is its crucial step. Here, we propose principled, scalable, and widely applicable assessment criteria to determine the number of clusters in modular networks based on the leave-one-out cross-validation estimate of the edge prediction error.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi, Koto-ku, Tokyo, Japan.
| | - Yoshiyuki Kabashima
- Department of Mathematical and Computing Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, 226-8502, Japan
| |
Collapse
|
46
|
Abstract
Graph partitioning problems emerge in a wide variety of complex systems, ranging from biology to finance, but can be rigorously analyzed and solved only for a few graph ensembles. Here, an ensemble of equitable graphs, i.e., random graphs with a block-regular structure, is studied, for which analytical results can be obtained. In particular, the spectral density of this ensemble is computed exactly for a modular and bipartite structure. Kesten-McKay's law for random regular graphs is found analytically to apply also for modular and bipartite structures when blocks are homogeneous. An exact solution to graph partitioning for two equal-sized communities is proposed and verified numerically, and a conjecture on the absence of an efficient recovery detectability transition in equitable graphs is suggested. A final discussion summarizes results and outlines their relevance for the solution of graph partitioning problems in other graph ensembles, in particular for the study of detectability thresholds and resolution limits in stochastic block models.
Collapse
Affiliation(s)
- Paolo Barucca
- Department of Banking and Finance, University of Zurich, Zurich, Switzerland and London Institute for Mathematical Sciences, London W1K 2XF, United Kingdom
| |
Collapse
|
47
|
Young JG, Desrosiers P, Hébert-Dufresne L, Laurence E, Dubé LJ. Finite-size analysis of the detectability limit of the stochastic block model. Phys Rev E 2017; 95:062304. [PMID: 28709195 DOI: 10.1103/physreve.95.062304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Indexed: 06/07/2023]
Abstract
It has been shown in recent years that the stochastic block model is sometimes undetectable in the sparse limit, i.e., that no algorithm can identify a partition correlated with the partition used to generate an instance, if the instance is sparse enough and infinitely large. In this contribution, we treat the finite case explicitly, using arguments drawn from information theory and statistics. We give a necessary condition for finite-size detectability in the general SBM. We then distinguish the concept of average detectability from the concept of instance-by-instance detectability and give explicit formulas for both definitions. Using these formulas, we prove that there exist large equivalence classes of parameters, where widely different network ensembles are equally detectable with respect to our definitions of detectability. In an extensive case study, we investigate the finite-size detectability of a simplified variant of the SBM, which encompasses a number of important models as special cases. These models include the symmetric SBM, the planted coloring model, and more exotic SBMs not previously studied. We conclude with three appendices, where we study the interplay of noise and detectability, establish a connection between our information-theoretic approach and random matrix theory, and provide proofs of some of the more technical results.
Collapse
Affiliation(s)
- Jean-Gabriel Young
- Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
| | - Patrick Desrosiers
- Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
- Centre de recherche de l'Institut universitaire en santé mentale de Québec, Québec (Québec), Canada G1J 2G3
| | | | - Edward Laurence
- Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
| | - Louis J Dubé
- Département de Physique, de Génie Physique, et d'Optique, Université Laval, Québec (Québec), Canada G1V 0A6
| |
Collapse
|
48
|
Peel L, Larremore DB, Clauset A. The ground truth about metadata and community detection in networks. SCIENCE ADVANCES 2017; 3:e1602548. [PMID: 28508065 PMCID: PMC5415338 DOI: 10.1126/sciadv.1602548] [Citation(s) in RCA: 126] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/17/2016] [Accepted: 03/08/2017] [Indexed: 05/30/2023]
Abstract
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
Collapse
Affiliation(s)
- Leto Peel
- Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- naXys, Université de Namur, Namur, Belgium
| | | | - Aaron Clauset
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Department of Computer Science, University of Colorado, Boulder, CO 80309, USA
- BioFrontiers Institute, University of Colorado, Boulder, CO 80309, USA
| |
Collapse
|
49
|
Kawamoto T, Kabashima Y. Detectability thresholds of general modular graphs. Phys Rev E 2017; 95:012304. [PMID: 28208358 DOI: 10.1103/physreve.95.012304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2016] [Indexed: 06/06/2023]
Abstract
We investigate the detectability thresholds of various modular structures in the stochastic block model. Our analysis reveals how the detectability threshold is related to the details of the modular pattern, including the hierarchy of the clusters. We show that certain planted structures are impossible to infer regardless of their fuzziness.
Collapse
Affiliation(s)
- Tatsuro Kawamoto
- Department of Mathematical and Computing Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan
| | - Yoshiyuki Kabashima
- Department of Mathematical and Computing Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan
| |
Collapse
|
50
|
Newman MEJ. Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys Rev E 2016; 94:052315. [PMID: 27967199 DOI: 10.1103/physreve.94.052315] [Citation(s) in RCA: 160] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2016] [Indexed: 11/07/2022]
Abstract
We demonstrate an equivalence between two widely used methods of community detection in networks, the method of modularity maximization and the method of maximum likelihood applied to the degree-corrected stochastic block model. Specifically, we show an exact equivalence between maximization of the generalized modularity that includes a resolution parameter and the special case of the block model known as the planted partition model, in which all communities in a network are assumed to have statistically similar properties. Among other things, this equivalence provides a mathematically principled derivation of the modularity function, clarifies the conditions and assumptions of its use, and gives an explicit formula for the optimal value of the resolution parameter.
Collapse
Affiliation(s)
- M E J Newman
- Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, Michigan 48109, USA
| |
Collapse
|