1
|
Tian M, Moriano P. Robustness of community structure under edge addition. Phys Rev E 2023; 108:054302. [PMID: 38115408 DOI: 10.1103/physreve.108.054302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Accepted: 09/08/2023] [Indexed: 12/21/2023]
Abstract
Communities often represent key structural and functional clusters in networks. To preserve such communities, it is important to understand their robustness under network perturbations. Previous work in community robustness analysis has focused on studying changes in the community structure as a response of edge rewiring and node or edge removal. However, the impact of increasing connectivity on the robustness of communities in networked systems is relatively unexplored. Studying the limits of community robustness under edge addition is crucial to better understanding the cases in which density expands or false edges erroneously appear. In this paper, we analyze the effect of edge addition on community robustness in synthetic and empirical temporal networks. We study two scenarios of edge addition: random and targeted. We use four community detection algorithms, Infomap, Label Propagation, Leiden, and Louvain, and demonstrate the results in community similarity metrics. The experiments on synthetic networks show that communities are more robust when the initial partition is stronger or the edge addition is random, and the experiments on empirical data also indicate that robustness performance can be affected by the community similarity metric. Overall, our results suggest that the communities identified by the different types of community detection algorithms exhibit different levels of robustness, and so the robustness of communities depends strongly on the choice of detection method.
Collapse
Affiliation(s)
- Moyi Tian
- Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA
| | - Pablo Moriano
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, USA
| |
Collapse
|
2
|
Mangold L, Roth C. Generative models for two-ground-truth partitions in networks. Phys Rev E 2023; 108:054308. [PMID: 38115519 DOI: 10.1103/physreve.108.054308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Accepted: 10/09/2023] [Indexed: 12/21/2023]
Abstract
A myriad of approaches have been proposed to characterize the mesoscale structure of networks most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers' to the networks mesoscale structure. Yet even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different "ground truth" partitions in a network. Here we propose the stochastic cross-block model (SCBM), a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bicommunity and core-periphery structures of different strengths. Given our model design and experimental setup, we find that the ability to detect the two partitions individually varies by SBM variant and that coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one-in some way dominating-structure can be detected, even in the presence of other partitions. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.
Collapse
Affiliation(s)
- Lena Mangold
- Computational Social Science Team, Centre Marc Bloch, Friedrichstr. 191, 10117 Berlin, Germany
- Centre national de la recherche scientifique (CNRS), 3 rue Michel-Ange, 75 016 Paris, France; and Centre d'Analyse et de Mathématique Sociales (CAMS), École des hautes études en sciences sociales (EHESS), 54 Bd Raspail, 75006 Paris, France
| | - Camille Roth
- Computational Social Science Team, Centre Marc Bloch, Friedrichstr. 191, 10117 Berlin, Germany
- Centre national de la recherche scientifique (CNRS), 3 rue Michel-Ange, 75 016 Paris, France; and Centre d'Analyse et de Mathématique Sociales (CAMS), École des hautes études en sciences sociales (EHESS), 54 Bd Raspail, 75006 Paris, France
| |
Collapse
|
3
|
Zhao X, Zhang J, Lin W. Clustering multivariate count data via Dirichlet-multinomial network fusion. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022]
|
4
|
Li H, Ye X, Imakura A, Sakurai T. LSEC: Large-scale spectral ensemble clustering. INTELL DATA ANAL 2023. [DOI: 10.3233/ida-216240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
Abstract
A fundamental problem in machine learning is ensemble clustering, that is, combining multiple base clusterings to obtain improved clustering result. However, most of the existing methods are unsuitable for large-scale ensemble clustering tasks owing to efficiency bottlenecks. In this paper, we propose a large-scale spectral ensemble clustering (LSEC) method to balance efficiency and effectiveness. In LSEC, a large-scale spectral clustering-based efficient ensemble generation framework is designed to generate various base clusterings with low computational complexity. Thereafter, all the base clusterings are combined using a bipartite graph partition-based consensus function to obtain improved consensus clustering results. The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods. Experiments conducted on ten large-scale datasets demonstrate the efficiency and effectiveness of the LSEC method. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.
Collapse
|
5
|
Community detection over feature-rich information networks: An eHealth case study. INFORM SYST 2022. [DOI: 10.1016/j.is.2022.102092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
|
6
|
Chowdhury A, Srinivasan S, Bhowmick S, Mukherjee A, Ghosh K. Constant community identification in million-scale networks. SOCIAL NETWORK ANALYSIS AND MINING 2022; 12:70. [PMID: 35789889 PMCID: PMC9243870 DOI: 10.1007/s13278-022-00895-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2022] [Revised: 05/27/2022] [Accepted: 06/01/2022] [Indexed: 10/29/2022]
Abstract
The inherently stochastic nature of community detection in real-world complex networks poses an important challenge in assessing the accuracy of the results. In order to eliminate the algorithmic and implementation artifacts, it is necessary to identify the groups of vertices that are always clustered together, independent of the community detection algorithm used. Such groups of vertices are called constant communities. Current approaches for finding constant communities are very expensive and do not scale to large networks. In this paper, we use binary edge classification to find constant communities. The key idea is to classify edges based on whether they form a constant community or not. We present two methods for edge classification. The first is a GCN-based semi-supervised approach that we term Line-GCN. The second is an unsupervised approach based on image thresholding methods. Neither of these methods requires explicit detection of communities and can thus scale to very large networks of the order of millions of vertices. Both of our semi-supervised and unsupervised results on real-world graphs demonstrate that the constant communities obtained by our method have higher F1-scores and comparable or higher NMI scores than other state-of-the-art baseline methods for constant community detection. While the training step of Line-GCN can be expensive, the unsupervised algorithm is 10 times faster than the baseline methods. For larger networks, the baseline methods cannot complete, whereas all of our algorithms can find constant communities in a reasonable amount of time. Finally, we also demonstrate that our methods are robust under noisy conditions. We use three different, well-studied noise models to add noise to the networks and show that our results are mostly stable.
Collapse
Affiliation(s)
- Anjan Chowdhury
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India
| | - Sriram Srinivasan
- Department of Radiation Oncology, Virginia Commonwealth University, Richmond, USA
| | - Sanjukta Bhowmick
- Department of Computer Science, University of North Texas, Denton, USA
| | - Animesh Mukherjee
- Department of Computer Science and Engineering, IIT Kharagpur, Kharagpur, India
| | - Kuntal Ghosh
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
| |
Collapse
|
7
|
Bakhthemmat A, Izadi M. Communities Detection for Advertising by Futuristic Greedy Method with Clustering Approach. BIG DATA 2021; 9:22-40. [PMID: 33434088 DOI: 10.1089/big.2020.0133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Community detection in social networks is one of the advertising methods in electronic marketing. One of the approaches to find communities in large social networks is to use greedy methods, because these methods perform very fast. Greedy methods are generally designed based on local decisions; thus, inappropriate local decisions may result in an improper global solution. The use of a greedy improved index with a futuristic approach can, to some extent, prevent inappropriate local choices. Our proposed method determines the influential nodes in the social network based on the followers and following and new futuristic greedy index. It classifies the nodes based on the influential nodes by the density-based clustering algorithm with a new distance function. The proposed method can improve clustering precision to detect communities by the futuristic greedy approach. We implemented the proposed algorithm with the map-reduce technique in the Hadoop structure. Experimental results in datasets show that the average of the rand index of clusters was accomplished by 99.32% in the proposed method. In addition, these results illustrate that there is a reduction in execution time by the proposed algorithm.
Collapse
Affiliation(s)
- Ali Bakhthemmat
- Kish International Campus, Sharif University of Technology, Tehran, Iran
| | - Mohammad Izadi
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
| |
Collapse
|
8
|
Calatayud J, Bernardo-Madrid R, Neuman M, Rojas A, Rosvall M. Exploring the solution landscape enables more reliable network community detection. Phys Rev E 2019; 100:052308. [PMID: 31869919 DOI: 10.1103/physreve.100.052308] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2019] [Indexed: 06/10/2023]
Abstract
To understand how a complex system is organized and functions, researchers often identify communities in the system's network of interactions. Because it is practically impossible to explore all solutions to guarantee the best one, many community-detection algorithms rely on multiple stochastic searches. But for a given combination of network and stochastic algorithms, how many searches are sufficient to find a solution that is good enough? The standard approach is to pick a reasonably large number of searches and select the network partition with the highest quality or derive a consensus solution based on all network partitions. However, if different partitions have similar qualities such that the solution landscape is degenerate, the single best partition may miss relevant information, and a consensus solution may blur complementary communities. Here we address this degeneracy problem with coarse-grained descriptions of the solution landscape. We cluster network partitions based on their similarity and suggest an approach to determine the minimum number of searches required to describe the solution landscape adequately. To make good use of all partitions, we also propose different ways to explore the solution landscape, including a significance clustering procedure. We test these approaches on synthetic networks and a real-world network using two contrasting community-detection algorithms: The algorithm that can identify more general structures requires more searches, and networks with clearer community structures require fewer searches. We also find that exploring the coarse-grained solution landscape can reveal complementary solutions and enable more reliable community detection.
Collapse
Affiliation(s)
- Joaquín Calatayud
- Integrated Science Lab, Department of Physics, Umeå University, Sweden
| | | | - Magnus Neuman
- Integrated Science Lab, Department of Physics, Umeå University, Sweden
| | - Alexis Rojas
- Integrated Science Lab, Department of Physics, Umeå University, Sweden
| | - Martin Rosvall
- Integrated Science Lab, Department of Physics, Umeå University, Sweden
| |
Collapse
|