51
|
Le CM, Li T. Linear regression and its inference on noisy network‐linked data. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Can M. Le
  - Department of Statistics, University of California, Davis, Davis, California, USA
- Tianxi Li
  - Department of Statistics, University of Virginia, Charlottesville, Virginia, USA
|
52
|
Bernaschi M, Celestini A, Guarino S, Mastrostefano E, Saracco F. The Fitness-Corrected Block Model, or how to create maximum-entropy data-driven spatial social networks. Sci Rep 2022; 12:18206. [PMID: 36307499 PMCID: PMC9616435 DOI: 10.1038/s41598-022-22798-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 10/19/2022] [Indexed: 12/31/2022] Open
Abstract
Models of networks play a major role in explaining and reproducing empirically observed patterns. Suitable models can be used to randomize an observed network while preserving some of its features, or to generate synthetic graphs whose properties may be tuned upon the characteristics of a given population. In the present paper, we introduce the Fitness-Corrected Block Model, an adjustable-density variation of the well-known Degree-Corrected Block Model, and we show that the proposed construction yields a maximum entropy model. When the network is sparse, we derive an analytical expression for the degree distribution of the model that depends on just the constraints and the chosen fitness-distribution. Our model is perfectly suited to define maximum-entropy data-driven spatial social networks, where each block identifies vertices having similar position (e.g., residence) and age, and where the expected block-to-block adjacency matrix can be inferred from the available data. In this case, the sparse-regime approximation coincides with a phenomenological model where the probability of a link binding two individuals is directly proportional to their sociability and to the typical cohesion of their age-groups, whereas it decays as an inverse-power of their geographic distance. We support our analytical findings through simulations of a stylized urban area.
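The sparse-regime description in this abstract (link probability proportional to the two individuals' sociability and to the cohesion of their age groups, decaying as an inverse power of their distance) can be illustrated with a minimal sketch. This is one reading of the abstract, not the paper's formula: the function name, the `gamma` exponent, the `scale` constant, and the clipping to 1 are all assumptions made for illustration.

```python
def link_probability(x_i, x_j, cohesion, distance, gamma=2.0, scale=1.0):
    """Hypothetical sparse-regime link probability.

    x_i, x_j : sociability (fitness) of the two individuals
    cohesion : typical cohesion between their age groups
    distance : geographic distance between them (must be > 0)
    gamma    : assumed inverse-power exponent (not given in the abstract)
    scale    : assumed proportionality constant
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    p = scale * x_i * x_j * cohesion / distance ** gamma
    # A probability cannot exceed 1; clipping is an assumption of this sketch.
    return min(p, 1.0)


# With gamma = 2, doubling the distance divides the probability by four.
near = link_probability(0.5, 0.4, 0.2, distance=1.0)
far = link_probability(0.5, 0.4, 0.2, distance=2.0)
```

The clipping only matters outside the sparse regime, where the product of fitnesses and cohesion is no longer small.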
Affiliation(s)
- Massimo Bernaschi
  - Institute for Applied Computing “Mauro Picone”, National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
- Alessandro Celestini
  - Institute for Applied Computing “Mauro Picone”, National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
- Stefano Guarino
  - Institute for Applied Computing “Mauro Picone”, National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
- Enrico Mastrostefano
  - Institute for Applied Computing “Mauro Picone”, National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
- Fabio Saracco
  - Institute for Applied Computing “Mauro Picone”, National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
  - “Enrico Fermi” Research Center (CREF), Via Panisperna 89A, 00184 Rome, Italy
|
53
|
Huang S, Weng H, Feng Y. Spectral clustering via adaptive layer aggregation for multi-layer networks. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2134874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Sihan Huang
  - Department of Statistics, Columbia University
- Haolei Weng
  - Department of Statistics and Probability, Michigan State University
- Yang Feng
  - Department of Biostatistics, New York University
|
54
|
Liu D, Chang Z, Yang G, Chen E. Hiding ourselves from community detection through genetic algorithms. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
55
|
Beckus A, Atia GK. Sketch-based community detection in evolving networks. Phys Rev E 2022; 106:044306. [PMID: 36397578 DOI: 10.1103/physreve.106.044306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Accepted: 09/08/2022] [Indexed: 06/16/2023]
Abstract
We consider an approach for community detection in time-varying networks. At its core, this approach maintains a small sketch graph to capture the essential community structure found in each snapshot of the full network. We demonstrate how the sketch can be used to explicitly identify six key community events which typically occur during network evolution: growth, shrinkage, merging, splitting, birth, and death. Based on these detection techniques, we formulate a community detection algorithm which can process a network concurrently exhibiting all processes. One advantage afforded by the sketch-based algorithm is the efficient handling of large networks. Whereas detecting events in the full graph may be computationally expensive, the small size of the sketch allows changes to be quickly assessed. A second advantage occurs in networks containing clusters of disproportionate size. The sketch is constructed such that there is equal representation of each cluster, thus reducing the possibility that the small clusters are lost in the estimate. We present a new standardized benchmark based on the stochastic block model which models the addition and deletion of nodes, as well as the birth and death of communities. When coupled with existing benchmarks, this new benchmark provides a comprehensive suite of tests encompassing all six community events. We provide analysis and a set of numerical results demonstrating the advantages of our approach both in runtime and in the handling of small clusters.
Affiliation(s)
- Andre Beckus
  - Department of Electrical and Computer Engineering, University of Central Florida, Orlando, Florida 32816, USA
- George K Atia
  - Department of Electrical and Computer Engineering, University of Central Florida, Orlando, Florida 32816, USA
  - Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
|
56
|
Finite-state parameter space maps for pruning partitions in modularity-based community detection. Sci Rep 2022; 12:15928. [PMID: 36151268 PMCID: PMC9508178 DOI: 10.1038/s41598-022-20142-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 09/09/2022] [Indexed: 11/08/2022] Open
Abstract
Partitioning networks into communities of densely connected nodes is an important tool used widely across different applications, with numerous methods and software packages available for community detection. Modularity-based methods require parameters to be selected (or assume defaults) to control the resolution and, in multilayer networks, interlayer coupling. Meanwhile, most useful algorithms are heuristics yielding different near-optimal results upon repeated runs (even at the same parameters). To address these difficulties, we combine recent developments into a simple-to-use framework for pruning a set of partitions to a subset that are self-consistent by an equivalence with the objective function for inference of a degree-corrected planted partition stochastic block model (SBM). Importantly, this combined framework reduces some of the problems associated with the stochasticity that is inherent in the use of heuristics for optimizing modularity. In our examples, the pruning typically highlights only a small number of partitions that are fixed points of the corresponding map on the set of somewhere-optimal partitions in the parameter space. We also derive resolution parameter upper bounds for fitting a constrained SBM of K blocks and demonstrate that these bounds hold in practice, further guiding parameter space regions to consider. With publicly available code ( http://github.com/ragibson/ModularityPruning ), our pruning procedure provides a new baseline for using modularity-based community detection in practice.
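The resolution parameter mentioned in this abstract enters the standard single-layer modularity objective as a multiplier on the null-model term. The sketch below computes that textbook quantity for an undirected, unweighted graph. It is not the API of the ModularityPruning package and does not implement the pruning map; the function name and the toy graph are assumptions for illustration only.

```python
from collections import defaultdict


def modularity(edges, membership, gamma=1.0):
    """Newman-Girvan modularity with resolution parameter gamma.

    edges      : iterable of (u, v) pairs of an undirected, unweighted graph
    membership : dict mapping each node to its community label
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    # Q = sum over communities of (internal edge fraction
    #     minus gamma times the squared degree fraction).
    return sum(
        internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single edge.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_default = modularity(two_triangles, split)           # gamma = 1
q_high = modularity(two_triangles, split, gamma=2.0)   # stronger penalty
```

Raising `gamma` penalizes large communities more heavily, which is why the same partition can be optimal in one region of the parameter space and not in another.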
|
57
|
Zhao Y, Chen T, Cai J, Lichenstein S, Potenza MN, Yip SW. Bayesian network mediation analysis with application to the brain functional connectome. Stat Med 2022; 41:3991-4005. [PMID: 35795965 PMCID: PMC10131252 DOI: 10.1002/sim.9488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 04/12/2022] [Accepted: 05/18/2022] [Indexed: 11/10/2022]
Abstract
The brain functional connectome, the collection of interconnected neural circuits along functional networks, facilitates a cutting-edge understanding of brain functioning, and has the potential to play a mediating role within the effect pathway between an exposure and an outcome. While existing mediation analytic approaches are capable of providing insight into complex processes, they mainly focus on a univariate mediator or mediator vector, without considering network-variate mediators. To fill this methodological gap and address this urgent application, in this article we propose an integrative mediation analysis under a Bayesian paradigm with networks entailing the mediation effect. To parameterize the network measurements, we introduce individually specified stochastic block models with unknown block allocation, and naturally bridge effect elements through the latent network mediators induced by the connectivity weights across network modules. To enable the identification of truly active mediating components, we simultaneously impose a feature selection across network mediators. We show the superiority of our model in estimating different effect components and selecting active mediating network structures. As a practical illustration of this approach's application to network neuroscience, we characterize the relationship between a therapeutic intervention and opioid abstinence as mediated by brain functional sub-networks.
Affiliation(s)
- Yize Zhao
  - Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, USA
  - Yale Center for Analytical Sciences, Yale University School of Public Health, New Haven, Connecticut, USA
- Tianqi Chen
  - Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, USA
- Jiachen Cai
  - Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, USA
- Sarah Lichenstein
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
- Marc N Potenza
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
  - Child Study Center, Yale University School of Medicine, New Haven, Connecticut, USA
  - Department of Neuroscience, Yale University School of Medicine, New Haven, Connecticut, USA
  - Connecticut Mental Health Center, New Haven, Connecticut, USA
  - Connecticut Council on Problem Gambling, Wethersfield, Connecticut, USA
  - Wu Tsai Institute, Yale University School of Medicine, New Haven, Connecticut, USA
- Sarah W Yip
  - Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
  - Child Study Center, Yale University School of Medicine, New Haven, Connecticut, USA
|
58
|
Degree-corrected distribution-free model for community detection in weighted networks. Sci Rep 2022; 12:15153. [PMID: 36071097 PMCID: PMC9452590 DOI: 10.1038/s41598-022-19456-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 08/30/2022] [Indexed: 12/01/2022] Open
Abstract
A degree-corrected distribution-free model is proposed for weighted social networks with latent structural information. The model extends previous distribution-free models by accounting for variation in node degree to fit real-world weighted networks, and it also extends the classical degree-corrected stochastic block model from unweighted networks to weighted networks. We design an algorithm based on spectral clustering to fit the model. A theoretical framework for consistent estimation by the algorithm is developed under the model, and theoretical results are analyzed for edge weights generated from different distributions. We also propose a general modularity as an extension of Newman’s modularity from unweighted networks to weighted networks. Using experiments with simulated and real-world networks, we show that our method significantly outperforms the uncorrected one, and that the general modularity is effective.
|
59
|
Sun J, Kong Q, Xu Z. Deep alternating non-negative matrix factorisation. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
60
|
Qing H. Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models. Entropy (Basel, Switzerland) 2022; 24:1216. [PMID: 36141101 PMCID: PMC9497671 DOI: 10.3390/e24091216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 08/25/2022] [Accepted: 08/27/2022] [Indexed: 06/16/2023]
Abstract
We consider the problem of modeling and estimating communities in directed networks. Models for this problem in the previous literature always assume that the sending clusters and the receiving clusters are either both non-overlapping or both overlapping. However, previous models cannot model a directed network in which nodes in sending clusters have the overlapping property while nodes in receiving clusters have the non-overlapping property, especially when the number of sending clusters is no larger than that of the receiving clusters. This kind of directed network exists in the real world because of its randomness and because we have little prior knowledge of the community structure of some real-world directed networks. To study the asymmetric structure of such directed networks, we propose a flexible and identifiable Overlapping and Non-overlapping model (ONM). We also provide one model as an extension of ONM to model directed networks with variation in node degree. Two spectral clustering algorithms are designed to fit the models. We establish a theoretical guarantee on the estimation consistency of the algorithms under the proposed models. Small-scale computer-generated directed networks are designed and analyzed to support our theoretical results. Four real-world directed networks are used to illustrate the algorithms, and the results reveal the existence of highly mixed nodes and the asymmetric structure of these networks.
Affiliation(s)
- Huan Qing
  - School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
|
61
|
Kovács L, Bóta A, Hajdu L, Krész M. Brands, networks, communities: How brand names are wired in the mind. PLoS One 2022; 17:e0273192. [PMID: 36006965 PMCID: PMC9409517 DOI: 10.1371/journal.pone.0273192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 08/03/2022] [Indexed: 11/18/2022] Open
Abstract
Brands can be defined as psychological constructs residing in our minds. By analyzing brand associations, we can study the mental constructs around them. In this paper, we study brands as parts of an associative network based on a word association database. We explore the communities (closely-knit groups in the mind) around brand names in this structure using two community detection algorithms in the Hungarian word association database ConnectYourMind. We identify brand names inside the communities of a word association network and explain why these brand names are part of the community. Several detected communities contain brand names from the same product category, and the words in these categories were connected either to brands in the category or to words describing the product category. Based on our findings, we describe the mental position of brand names. We show that brand knowledge, product knowledge and real word knowledge interact with each other. We also show how the meaning of a product category arises and how this meaning is related to brand meaning. Our results suggest that words sharing the same community with brand names can be used in brand communication and brand positioning.
Affiliation(s)
- László Kovács
  - Savaria Department of Business Administration, Faculty of Social Sciences, Eötvös Loránd University, Szombathely, Hungary
- András Bóta
  - Department of Computer Science, Electrical and Space Engineering, Embedded Intelligent Systems Lab, Luleå University of Technology, Luleå, Sweden
- László Hajdu
  - Innorenew CoE, Izola, Slovenia
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
  - Gyula Juhász Faculty of Education, University of Szeged, Szeged, Hungary
- Miklós Krész
  - Innorenew CoE, Izola, Slovenia
  - Andrej Marušič Institute, University of Primorska, Koper, Slovenia
  - Gyula Juhász Faculty of Education, University of Szeged, Szeged, Hungary
|
62
|
Li N, Jin D, Wei J, Huang Y, Xu J. Functional brain abnormalities in major depressive disorder using a multiscale community detection approach. Neuroscience 2022; 501:1-10. [PMID: 35964834 DOI: 10.1016/j.neuroscience.2022.08.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/11/2022] [Revised: 08/04/2022] [Accepted: 08/05/2022] [Indexed: 11/28/2022]
Abstract
Major depressive disorder (MDD) is a serious disease associated with abnormal brain regions; however, the interconnection between specific brain regions related to depression has not been fully explored. To solve this problem, the paper proposes a novel multiscale community detection method to compare the differences in brain regions between normal controls (NC) and MDD patients. This study adopted the Brainnetome Atlas to divide the brain into 246 regions and extract the time series of each region. The Pearson correlation was used to measure the similarity among different brain regions, to construct the brain functional network, and to perform multiscale community detection. The optimal brain community structure of each group was further explored based on the modularized Qcut algorithm, normalized mutual information (NMI), and variation of information (VI). The Jaccard index was then applied to compare the abnormalities of each brain region from different community environments between the brain function networks of NC and MDD patients. The experiments revealed several abnormal brain regions between NC and MDD, including the superior frontal gyrus, middle frontal gyrus, inferior frontal gyrus, orbital gyrus, superior temporal gyrus, middle temporal gyrus, inferior temporal gyrus, posterior superior temporal sulcus, inferior parietal gyrus, precuneus, postcentral gyrus, insular gyrus, cingulate gyrus, hippocampus and basal ganglia. Finally, a new subnetwork related to cognitive function was discovered, which was composed of the insular gyrus and inferior frontal gyrus. All experiments indicated that the proposed method is useful in detecting functional brain abnormalities in MDD, and it can provide valuable insights into the diagnosis and treatment of MDD.
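Two of the building blocks named in this abstract, Pearson correlation between regional time series and the Jaccard index between the communities a region belongs to in each group, can be sketched in a few lines. This is a generic illustration, not the authors' pipeline: the function names, the toy time series, and the region indices are all hypothetical, and the Qcut, NMI, and VI steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of region labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for two empty sets (an assumption)
    return len(a & b) / len(a | b)


# Edge weight between two hypothetical regions from their time series.
weight = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Community containing hypothetical region 7 in each group's partition.
community_nc = {7, 12, 30, 41}
community_mdd = {7, 12, 55}
overlap = jaccard(community_nc, community_mdd)
```

A low Jaccard value for a region indicates that its community neighbourhood differs between the two groups, which is the sense in which the abstract uses the index to flag abnormal regions.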
Affiliation(s)
- Na Li
  - Tianjin Key Lab of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Di Jin
  - Tianjin Key Lab of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jianguo Wei
  - Tianjin Key Lab of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yuxiao Huang
  - Columbian College of Arts & Sciences, George Washington University, Washington D.C., USA
- Junhai Xu
  - Tianjin Key Lab of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
63
|
Qing H. A Useful Criterion on Studying Consistent Estimation in Community Detection. Entropy (Basel, Switzerland) 2022; 24:1098. [PMID: 36010762 PMCID: PMC9407257 DOI: 10.3390/e24081098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 06/15/2023]
Abstract
In network analysis, developing a unified theoretical framework that can compare methods under different models is an interesting problem. This paper proposes a partial solution to this problem. We summarize the idea of using a separation condition for a standard network and the sharp threshold of the Erdős-Rényi random graph to study consistent estimation, and to compare theoretical error rates and requirements on the network sparsity of spectral methods under models that can degenerate to a stochastic block model, as a four-step criterion, SCSTC. Using SCSTC, we find some inconsistent phenomena on separation condition and sharp threshold in community detection. In particular, we find that the original theoretical results of the SPACL algorithm introduced to estimate network memberships under the mixed membership stochastic blockmodel are sub-optimal. To find the formation mechanism of inconsistencies, we re-establish the theoretical convergence rate of this algorithm by applying recent techniques on row-wise eigenvector deviation. The results are further extended to the degree-corrected mixed membership model. By comparison, our results enjoy smaller error rates, less dependence on the number of communities, weaker requirements on network sparsity, and so forth. The separation condition and sharp threshold obtained from our theoretical results match the classical results, so the usefulness of this criterion for studying consistent estimation is guaranteed. Numerical results for computer-generated networks support our finding that the spectral methods considered in this paper achieve the threshold of the separation condition.
Affiliation(s)
- Huan Qing
  - School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
|
64
|
Haj AE, Slaoui Y, Louis PY, Khraibani Z. Estimation in a binomial stochastic blockmodel for a weighted graph by a variational expectation maximization algorithm. Commun Stat-Simul C 2022. [DOI: 10.1080/03610918.2020.1743858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Abir El Haj
  - Laboratoire de Mathématiques et Applications, Université de Poitiers, Poitiers, France
  - Faculté de Sciences, Université Libanaise, Beyrouth, Liban
- Yousri Slaoui
  - Laboratoire de Mathématiques et Applications, Université de Poitiers, Poitiers, France
- Pierre-Yves Louis
  - Laboratoire de Mathématiques et Applications, Université de Poitiers, Poitiers, France
|
65
|
Peixoto TP. Ordered community detection in directed networks. Phys Rev E 2022; 106:024305. [PMID: 36109944 DOI: 10.1103/physreve.106.024305] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2022] [Accepted: 08/02/2022] [Indexed: 06/15/2023]
Abstract
We develop a method to infer community structure in directed networks where the groups are ordered in a latent one-dimensional hierarchy that determines the preferred edge direction. Our nonparametric Bayesian approach is based on a modification of the stochastic block model (SBM), which can take advantage of rank alignment and coherence to produce parsimonious descriptions of networks that combine ordered hierarchies with arbitrary mixing patterns between groups. Since our model also includes directed degree correction, we can use it to distinguish nonlocal hierarchical structure from local in- and out-degree imbalance, thus removing a source of conflation present in most ranking methods. We also demonstrate how we can reliably compare with the results obtained with the unordered SBM variant to determine whether a hierarchical ordering is statistically warranted in the first place. We illustrate the application of our method on a wide variety of empirical networks across several domains.
Affiliation(s)
- Tiago P Peixoto
  - Department of Network and Data Science, Central European University, 1100 Vienna, Austria
|
66
|
Gribel D, Gendreau M, Vidal T. Semi-supervised clustering with inaccurate pairwise annotations. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.05.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
67
|
Wang L, Tong X, Wang YR. Statistics in everyone’s backyard: An impact study via citation network analysis. Patterns 2022; 3:100532. [PMID: 36033599 PMCID: PMC9403407 DOI: 10.1016/j.patter.2022.100532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 04/25/2022] [Accepted: 05/25/2022] [Indexed: 11/27/2022]
Abstract
Statistical methodologies are indispensable in data-driven scientific discoveries. In this paper, we make the first effort to understand the impact of recent statistical innovations on other scientific fields. By collecting comprehensive bibliometric data from the Web of Science database for selected statistical journals, we investigate the citation trends and compositions of citing fields over time, and we find increasing citation diversity. Furthermore, in a new setting, we apply a local clustering technique involving personalized PageRank with graph conductance for size selection to find the most relevant statistical innovation for a given external topic in other fields. Through a number of case studies, we show that the results from our citation data analysis align well with our knowledge and intuition about these external topics. Overall, we have found that the statistical theory and methods recently invented by the statistics community have made an increasing impact on other scientific fields.
Citation data were collected for core statistics papers from the past two decades. A comprehensive evaluation of their impact on other scientific fields was conducted. The external impact of statistics has been increasing in volume and diversity. The most influential statistics community for a given external topic was found.
How much impact has statistics made on other scientific fields in the era of big data? This work represents the first effort toward quantifying the external influence of statistical theory and method research through citation network analysis. We formulate the problem of finding the most relevant statistical research area for any external research topic as a local clustering problem, suggesting new applied and theoretical grounds for alternative community detection techniques. The results of our analysis confirm that statistics plays an active and expanding role in serving other disciplines. The data we have collected are rich in content and structure, lending themselves naturally to future modeling and analysis from different perspectives.
|
68
|
Runghen R, Stouffer DB, Dalla Riva GV. Exploiting node metadata to predict interactions in bipartite networks using graph embedding and neural networks. Royal Society Open Science 2022; 9:220079. [PMID: 36016910 PMCID: PMC9399714 DOI: 10.1098/rsos.220079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Accepted: 08/02/2022] [Indexed: 06/15/2023]
Abstract
Networks are increasingly used in various fields to represent systems with the aim of understanding the underlying rules governing observed interactions, and hence predict how the system is likely to behave in the future. Recent developments in network science highlight that accounting for node metadata improves both our understanding of how nodes interact with one another, and the accuracy of link prediction. However, to predict interactions in a network within existing statistical and machine learning frameworks, we need to learn objects that rapidly grow in dimension with the number of nodes. Thus, the task becomes computationally and conceptually challenging for networks. Here, we present a new predictive procedure combining a statistical, low-rank graph embedding method with machine learning techniques which reduces substantially the complexity of the learning task and allows us to efficiently predict interactions from node metadata in bipartite networks. To illustrate its application on real-world data, we apply it to a large dataset of tourist visits across a country. We found that our procedure accurately reconstructs existing interactions and predicts new interactions in the network. Overall, both from a network science and data science perspective, our work offers a flexible and generalizable procedure for link prediction.
Affiliation(s)
- Rogini Runghen
  - Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - The Roux Institute, Northeastern University, Boston, MA, USA
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Daniel B. Stouffer
  - Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Giulio V. Dalla Riva
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
|
69
|
Gu J, Yin G. Triangular Concordance Learning of Networks. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2099405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Jiaqi Gu
  - Department of Statistics and Actuarial Science, The University of Hong Kong
- Guosheng Yin
  - Department of Statistics and Actuarial Science, The University of Hong Kong
|
70
|
Peng Z, Zhou Q. An empirical Bayes approach to stochastic blockmodels and graphons: shrinkage estimation and model selection. PeerJ Comput Sci 2022; 8:e1006. [PMID: 35875655 PMCID: PMC9299287 DOI: 10.7717/peerj-cs.1006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 05/24/2022] [Indexed: 06/15/2023]
Abstract
The graphon (W-graph), including the stochastic block model as a special case, has been widely used in modeling and analyzing network data. Estimation of the graphon function has attracted considerable recent research interest. Most existing works focus on inference in the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified latent variables. In this work, we propose a hierarchical model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on our hierarchical model, we further introduce a new model selection criterion for choosing the number of communities. Numerical results on extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of parameter estimation and model selection.
|
71
|
Link Pruning for Community Detection in Social Networks. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
Attempts to discover knowledge through data are gradually becoming diversified to understand complex aspects of social phenomena. Graph data analysis, which models and analyzes complex data as graphs, draws much attention as it combines the latest machine learning techniques. In this paper, we propose a new framework called link pruning for detecting clusters in complex networks, which leverages the cohesiveness of local structures by removing unimportant connections. Link pruning is a flexible framework that reduces the clustering problem in a highly mixed community structure to a simpler problem with a lowly mixed community structure. We analyze which similarities and curvatures defined on the pairs of nodes, which we call the link attributes, allow links inside and outside the community to have a different range of values. Using the link attributes, we design and analyze an algorithm that eliminates links with low attribute values to find a better community structure on the transformed graph with low mixing. Through extensive experiments, we have shown that clustering algorithms with link pruning achieve higher quality than existing algorithms in both synthetic and real-world social networks.
|
72
|
Doubly Stochastic Scaling Unifies Community Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
|
73
|
|
74
|
Schneider T, Dunbar ORA, Wu J, Böttcher L, Burov D, Garbuno-Inigo A, Wagner GL, Pei S, Daraio C, Ferrari R, Shaman J. Epidemic management and control through risk-dependent individual contact interventions. PLoS Comput Biol 2022; 18:e1010171. [PMID: 35737648 PMCID: PMC9223336 DOI: 10.1371/journal.pcbi.1010171] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/25/2021] [Accepted: 05/05/2022] [Indexed: 12/12/2022] Open
Abstract
Testing, contact tracing, and isolation (TTI) is an epidemic management and control approach that is difficult to implement at scale because it relies on manual tracing of contacts. Exposure notification apps have been developed to digitally scale up TTI by harnessing contact data obtained from mobile devices; however, exposure notification apps provide users only with limited binary information when they have been directly exposed to a known infection source. Here we demonstrate a scalable improvement to TTI and exposure notification apps that uses data assimilation (DA) on a contact network. Network DA exploits diverse sources of health data together with the proximity data from mobile devices that exposure notification apps rely upon. It provides users with continuously assessed individual risks of exposure and infection, which can form the basis for targeting individual contact interventions. Simulations of the early COVID-19 epidemic in New York City are used to establish proof-of-concept. In the simulations, network DA identifies up to a factor 2 more infections than contact tracing when both harness the same contact data and diagnostic test data. This remains true even when only a relatively small fraction of the population uses network DA. When a sufficiently large fraction of the population (≳ 75%) uses network DA and complies with individual contact interventions, targeting contact interventions with network DA reduces deaths by up to a factor 4 relative to TTI. Network DA can be implemented by expanding the computational backend of existing exposure notification apps, thus greatly enhancing their capabilities. Implemented at scale, it has the potential to precisely and effectively control future epidemics while minimizing economic disruption.
During the ongoing COVID-19 pandemic, exposure notification apps have been developed to scale up manual contact tracing. The apps use proximity data from mobile devices to automate notifying direct contacts of an infection source. The information they provide is limited because users receive only rare and binary alerts. Here we present network data assimilation (DA) as a new digital approach to epidemic management and control. Network DA uses the same data as exposure notification apps but uses it more effectively to provide frequently updated individual risk assessments to users. Network DA is based on automated learning about individuals’ risk of exposure and infection from crowd-sourced health data and proximity data. The data are aggregated with models of disease transmission to produce statistical assessments of users’ risks. In an extensive simulation study of the COVID-19 epidemic in New York City (NYC), we show that network DA with diagnostic testing achieves epidemic control with fewer than half the deaths that occurred during NYC’s lockdown, while isolating a far smaller fraction of the population (typically only 5–10% of the population at any given time). Implemented at scale, then, network DA has the potential to effectively control epidemics while minimizing economic and social disruption.
Affiliation(s)
- Tapio Schneider
- California Institute of Technology, Pasadena, California, United States of America
| | - Oliver R. A. Dunbar
- California Institute of Technology, Pasadena, California, United States of America
| | - Jinlong Wu
- California Institute of Technology, Pasadena, California, United States of America
| | - Lucas Böttcher
- Computational Social Science, Frankfurt School of Finance and Management, Frankfurt a. M., Germany
- Department of Computational Medicine, University of California, Los Angeles, California, United States of America
| | - Dmitry Burov
- California Institute of Technology, Pasadena, California, United States of America
| | - Alfredo Garbuno-Inigo
- Departamento de Estadística, Instituto Tecnológico Autónomo de México, Ciudad de México, México
| | - Gregory L. Wagner
- Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Sen Pei
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, United States of America
| | - Chiara Daraio
- California Institute of Technology, Pasadena, California, United States of America
| | - Raffaele Ferrari
- Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Jeffrey Shaman
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, United States of America
| |
|
75
|
Rubin‐Delanchy P, Cape J, Tang M, Priebe CE. A statistical interpretation of spectral embedding: The generalised random dot product graph. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
Affiliation(s)
| | - Joshua Cape
- University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Minh Tang
- North Carolina State University, Raleigh, North Carolina, USA
| | | |
|
76
|
Lunde R, Sarkar P. Subsampling Sparse Graphons Under Minimal Assumptions. Biometrika 2022. [DOI: 10.1093/biomet/asac032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Summary
We study the properties of two subsampling procedures for networks (vertex subsampling and p-subsampling) under the sparse graphon model. The consistency of network subsampling is demonstrated under the minimal assumptions of weak convergence of corresponding network statistics and an (expected) subsample size growing to infinity slower than the number of vertices in the network. Furthermore, under appropriate sparsity conditions, we derive limiting distributions for the nonzero eigenvalues of an adjacency matrix under the sparse graphon model. Our weak convergence result implies the consistency of our subsampling procedures for eigenvalues under appropriate conditions.
Affiliation(s)
- Robert Lunde
- University of Michigan Department of Statistics, Ann Arbor, Michigan 48109, U.S.A.
| | - Purnamrita Sarkar
- University of Texas Austin Department of Statistics and Data Sciences, Texas 78712, U.S.A.
| |
|
77
|
Guseva K, Darcy S, Simon E, Alteio LV, Montesinos-Navarro A, Kaiser C. From diversity to complexity: Microbial networks in soils. SOIL BIOLOGY & BIOCHEMISTRY 2022; 169:108604. [PMID: 35712047 PMCID: PMC9125165 DOI: 10.1016/j.soilbio.2022.108604] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 08/24/2021] [Revised: 02/08/2022] [Accepted: 02/09/2022] [Indexed: 05/07/2023]
Abstract
Network analysis has been used for many years in ecological research to analyze organismal associations, for example in food webs, plant-plant or plant-animal interactions. Although network analysis is widely applied in microbial ecology, only recently has it entered the realms of soil microbial ecology, shown by a rapid rise in studies applying co-occurrence analysis to soil microbial communities. While this application offers great potential for deeper insights into the ecological structure of soil microbial ecosystems, it also brings new challenges related to the specific characteristics of soil datasets and the type of ecological questions that can be addressed. In this Perspectives Paper we assess the challenges of applying network analysis to soil microbial ecology due to the small-scale heterogeneity of the soil environment and the nature of soil microbial datasets. We review the different approaches of network construction that are commonly applied to soil microbial datasets and discuss their features and limitations. Using a test dataset of microbial communities from two depths of a forest soil, we demonstrate how different experimental designs and network constructing algorithms affect the structure of the resulting networks, and how this in turn may influence ecological conclusions. We also reveal how assumptions of the construction method, methods of preparing the dataset, and definitions of thresholds affect the network structure. Finally, we discuss the particular questions in soil microbial ecology that can be approached by analyzing and interpreting specific network properties. Targeting these network properties in a meaningful way will allow this technique to be applied not merely in descriptive research, but in hypothesis-driven research.
Analysing microbial networks in soils opens a window to a better understanding of the complexity of microbial communities. However, this approach is unfortunately often used to draw conclusions that go far beyond the scientific evidence it can provide, which has damaged its reputation for soil microbial analysis. In this Perspectives Paper, we would like to sharpen the view of the real potential of microbial co-occurrence analysis in soils, and at the same time raise awareness of its limitations and the many ways in which it can be misused or misinterpreted.
Affiliation(s)
- Ksenia Guseva
- Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
- Corresponding author.
| | - Sean Darcy
- Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - Eva Simon
- Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
- Doctoral School in Microbiology and Environmental Science, University of Vienna, Vienna, Austria
| | - Lauren V. Alteio
- Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - Alicia Montesinos-Navarro
- Centro de Investigaciones sobre Desertificación (CIDE, CSIC-UV-GV), Carretera de Moncada-Náquera Km 4.5, 46113, Moncada, Valencia, Spain
| | - Christina Kaiser
- Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
- Corresponding author.
| |
|
78
|
Abstract
Network data often exhibit block structures characterized by clusters of nodes with similar patterns of edge formation. When such relational data are complemented by additional information on exogenous node partitions, these sources of knowledge are typically included in the model to supervise the cluster assignment mechanism or to improve inference on edge probabilities. Although these solutions are routinely implemented, there is a lack of formal approaches to test if a given external node partition is in line with the endogenous clustering structure encoding stochastic equivalence patterns among the nodes in the network. To fill this gap, we develop a formal Bayesian testing procedure which relies on the calculation of the Bayes factor between a stochastic block model with known grouping structure defined by the exogenous node partition and an infinite relational model that allows the endogenous clustering configurations to be unknown, random and fully revealed by the block–connectivity patterns in the network. A simple Markov chain Monte Carlo method for computing the Bayes factor and quantifying uncertainty in the endogenous groups is proposed. This strategy is evaluated in simulations, and in applications studying brain networks of Alzheimer’s patients.
|
79
|
|
80
|
Wang H, Ma C, Chen HS, Lai YC, Zhang HF. Full reconstruction of simplicial complexes from binary contagion and Ising data. Nat Commun 2022; 13:3043. [PMID: 35650211 PMCID: PMC9160016 DOI: 10.1038/s41467-022-30706-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/23/2021] [Accepted: 05/13/2022] [Indexed: 11/29/2022] Open
Abstract
Previous efforts on data-based reconstruction focused on complex networks with pairwise or two-body interactions. There is a growing interest in networks with higher-order or many-body interactions, raising the need to reconstruct such networks based on observational data. We develop a general framework combining statistical inference and expectation maximization to fully reconstruct 2-simplicial complexes with two- and three-body interactions based on binary time-series data from two types of discrete-state dynamics. We further articulate a two-step scheme to improve the reconstruction accuracy while significantly reducing the computational load. Through synthetic and real-world 2-simplicial complexes, we validate the framework by showing that all the connections can be faithfully identified and the full topology of the 2-simplicial complexes can be inferred. The effects of noisy data or stochastic disturbance are studied, demonstrating the robustness of the proposed framework. Data-driven recovery of topology is challenging for networks beyond pairwise interactions. The authors propose a framework to reconstruct complex networks with higher-order interactions from time series, focusing on networks with 2-simplexes where social contagion and Ising dynamics generate binary data.
Affiliation(s)
- Huan Wang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Mathematical Science, Anhui University, Hefei, 230601, China
| | - Chuang Ma
- School of Internet, Anhui University, Hefei, 230601, China
| | - Han-Shuang Chen
- School of Physics and Material Science, Anhui University, Hefei, 230601, China
| | - Ying-Cheng Lai
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, 85287, USA
| | - Hai-Feng Zhang
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Mathematical Science, Anhui University, Hefei, 230601, China.
| |
|
81
|
Koo J, Tang M, Trosset MW. Popularity Adjusted Block Models are Generalized Random Dot Product Graphs. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2081576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- John Koo
- Department of Statistics, Indiana University
| | - Minh Tang
- Department of Statistics, North Carolina State University
| | | |
|
82
|
Analysis of OFDI Industry Linkage Network Based on Grey Incidence: Taking the Jiangsu Manufacturing Industry as an Example. SUSTAINABILITY 2022. [DOI: 10.3390/su14095680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/10/2022]
Abstract
Based on the outward direct investment data of each manufacturing industry segment in Jiangsu Province from 2000 to 2020, this paper establishes a correlation network by constructing a grey incidence model with the average value of absolute grey incidence as the threshold. We further analyze the relationship between each manufacturing industry segment in Jiangsu Province in the process of outward direct investment from two perspectives, namely, point and surface. The study shows that from the perspective of each node, the correlation coefficient between equipment manufacturing and other industries is significantly higher, i.e., the influence of equipment manufacturing on other industries is significantly greater. Chemical raw materials and chemical products manufacturing, general equipment manufacturing, special equipment manufacturing, and transportation equipment manufacturing are the important nodes in the network. From the perspective of the network as a whole, the Jiangsu manufacturing OFDI affiliation network is not concentrated. Still, it has small-world characteristics, which are conducive to disseminating information. In contrast, the close nature of the industry has more commonalities, leading to it being more easily divided into the same module in the network block model analysis.
|
83
|
Xiao B, Lei B, Lan W, Guo B. A blockwise network autoregressive model with application for fraud detection. ANN I STAT MATH 2022. [DOI: 10.1007/s10463-022-00822-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
84
|
Tang M, Cape J, Priebe CE. Asymptotically efficient estimators for stochastic blockmodels: The naive MLE, the rank-constrained MLE, and the spectral estimator. BERNOULLI 2022. [DOI: 10.3150/21-bej1376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Minh Tang
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
| | - Joshua Cape
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Carey E. Priebe
- Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
| |
|
85
|
Silva FN, Albeshri A, Thayananthan V, Alhalabi W, Fortunato S. Robustness modularity in complex networks. Phys Rev E 2022; 105:054308. [PMID: 35706196 DOI: 10.1103/physreve.105.054308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 04/21/2022] [Indexed: 06/15/2023]
Abstract
A basic question in network community detection is how modular a given network is. This is usually addressed by evaluating the quality of partitions detected in the network. The Girvan-Newman (GN) modularity function is the standard way to make this assessment, but it has a number of drawbacks. Most importantly, it is not clearly interpretable, given that the measure can take relatively large values on partitions of random networks without communities. Here we propose a measure based on the concept of robustness: modularity is the probability of finding trivial partitions when the structure of the network is randomly perturbed. This concept can be implemented for any clustering algorithm capable of telling when a group structure is absent. Tests on artificial and real graphs reveal that robustness modularity can be used to assess and compare the strength of the community structure of different networks. We also introduce two other quality functions: modularity difference, a suitably normalized version of the GN modularity, and information modularity, a measure of distance based on information compression. Both measures are strongly correlated with robustness modularity, but have lower time complexity, so they could be used on networks whose size makes the calculation of robustness modularity too costly.
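For readers unfamiliar with the baseline that this abstract criticizes, the following is a minimal sketch of the standard GN modularity, Q = Σ_c [ e_c/m − (d_c/2m)² ], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of its nodes. It is not the robustness modularity proposed in the paper; the function name `gn_modularity` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def gn_modularity(edges, community):
    """Girvan-Newman modularity of a partition (undirected, unweighted graph).

    edges: iterable of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = defaultdict(int)    # edges with both endpoints in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(gn_modularity(toy_edges, toy_partition))
```

Placing every node in one community gives Q = 0 under this formula, which is the trivial reference value.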
Affiliation(s)
- Filipi N Silva
- Indiana University Network Science Institute (IUNI), Bloomington, Indiana, 47408, USA
| | - Aiiad Albeshri
- Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia
| | - Vijey Thayananthan
- Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia
| | - Wadee Alhalabi
- Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia
| | - Santo Fortunato
- Indiana University Network Science Institute (IUNI), Bloomington, Indiana, 47408, USA
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
| |
|
86
|
Vaca-Ramírez F, Peixoto TP. Systematic assessment of the quality of fit of the stochastic block model for empirical networks. Phys Rev E 2022; 105:054311. [PMID: 35706168 DOI: 10.1103/physreve.105.054311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2022] [Accepted: 04/19/2022] [Indexed: 06/15/2023]
Abstract
We perform a systematic analysis of the quality of fit of the stochastic block model (SBM) for 275 empirical networks spanning a wide range of domains and orders of size magnitude. We employ posterior predictive model checking as a criterion to assess the quality of fit, which involves comparing networks generated by the inferred model with the empirical network, according to a set of network descriptors. We observe that the SBM is capable of providing an accurate description for the majority of networks considered, but falls short of saturating all modeling requirements. In particular, networks possessing a large diameter and slow-mixing random walks tend to be badly described by the SBM. However, contrary to what is often assumed, networks with a high abundance of triangles can be well described by the SBM in many cases. We demonstrate that simple network descriptors can be used to evaluate whether or not the SBM can provide a sufficiently accurate representation, potentially pointing to possible model extensions that can systematically improve the expressiveness of this class of models.
Affiliation(s)
- Felipe Vaca-Ramírez
- Department of Network and Data Science, Central European University, 1100 Vienna, Austria
| | - Tiago P Peixoto
- Department of Network and Data Science, Central European University, 1100 Vienna, Austria
| |
|
87
|
Lei J, Lin KZ. Bias-adjusted spectral clustering in multi-layer stochastic block models. J Am Stat Assoc 2022; 118:2433-2445. [PMID: 38532854 PMCID: PMC10963943 DOI: 10.1080/01621459.2022.2054817] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/10/2021] [Accepted: 03/04/2022] [Indexed: 12/31/2022]
Abstract
We consider the problem of estimating common community structures in multi-layer stochastic block models, where each single layer may not have sufficient signal strength to recover the full community structure. In order to efficiently aggregate signal across different layers, we argue that the sum-of-squared adjacency matrices contain sufficient signal even when individual layers are very sparse. Our method uses a bias-removal step that is necessary when the squared noise matrices may overwhelm the signal in the very sparse regime. The analysis of our method relies on several novel tail probability bounds for matrix linear combinations with matrix-valued coefficients and matrix-valued quadratic forms, which may be of independent interest. The performance of our method and the necessity of bias removal is demonstrated in synthetic data and in microarray analysis about gene co-expression networks.
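The aggregation step described in this abstract can be sketched as follows. The abstract specifies only a sum of squared adjacency matrices with a bias-removal step; the sketch assumes that bias removal means subtracting each layer's diagonal degree matrix, since for a symmetric 0/1 matrix with zero diagonal the diagonal of A² equals the node degrees. The function name `bias_adjusted_sum_of_squares` is hypothetical, and the subsequent spectral clustering of the aggregated matrix is not shown.

```python
def bias_adjusted_sum_of_squares(layers):
    """Return the sum over layers of (A @ A - D), with D = diag(row sums of A).

    layers: non-empty list of n x n symmetric 0/1 adjacency matrices
            (lists of lists) with zero diagonals.
    Assumption: degree subtraction stands in for the paper's bias-removal step.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    n = len(layers[0])
    total = [[0] * n for _ in range(n)]
    for A in layers:
        for i in range(n):
            for j in range(n):
                # (A @ A)[i][j] counts common neighbours of i and j in this layer
                total[i][j] += sum(A[i][k] * A[k][j] for k in range(n))
            # remove the degree of node i, which is what the squared diagonal holds
            total[i][i] -= sum(A[i])
    return total


# Hypothetical example: two identical layers, each a path 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(bias_adjusted_sum_of_squares([path, path]))
```

After the adjustment the diagonal is zero, so the aggregated matrix reflects only shared-neighbour counts between distinct nodes.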
Affiliation(s)
- Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University, USA
| | - Kevin Z Lin
- Department of Statistics, Wharton School of Business, University of Pennsylvania, USA
| |
|
88
|
Dynamic Community Discovery Method Based on Phylogenetic Planted Partition in Temporal Networks. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12083795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
Because most community discovery methods are designed from a static perspective, some algorithms cannot efficiently represent how a dynamic network changes over time. This paper proposes a novel dynamic community discovery method (Phylogenetic Planted Partition Model, PPPM) for phylogenetic evolution. First, the time dimension is introduced into the typical migration partition model, all states are treated as variables, and the observation equation is constructed. Second, the observation equation of the whole dynamic social network is taken as the constraint between the variables and the error function, and the quadratic form of the error function is minimized. Third, the Levenberg–Marquardt (L–M) method is used to calculate the gradient of the error function, and the iteration is carried out. Finally, simulation experiments are carried out on artificial and real networks. The experimental results show that, compared with FaceNet, SBM + MLE, CLBM, and PisCES, the proposed PPPM model improves accuracy by 5% and 3%, respectively. The proposed PPPM method is shown to be robust, reasonable, and effective. This method can also be applied to the general social networking community discovery field.
|
89
|
Yang L, Guo Y, Gu J, Jin D, Yang B, Cao X. Probabilistic Graph Convolutional Network via Topology-Constrained Latent Space Model. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:2123-2136. [PMID: 32692689 DOI: 10.1109/tcyb.2020.3005938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Although many graph convolutional neural networks (GCNNs) have achieved superior performances in semisupervised node classification, they are designed from either the spatial or spectral perspective, yet without a general theoretical basis. Besides, most of the existing GCNNs methods tend to ignore the ubiquitous noises in the network topology and node content and are thus unable to model these uncertainties. These drawbacks certainly reduce their effectiveness in integrating network topology and node content. To provide a probabilistic perspective to the GCNNs, we model the semisupervised node classification problem as a topology-constrained probabilistic latent space model, probabilistic graph convolutional network (PGCN). By representing the nodes in a more efficient distribution form, the proposed framework can seamlessly integrate the node content and network topology. When specifying the distribution in PGCN to be a Gaussian distribution, the transductive node classification problems can be solved by the general framework and a specific method, called PGCN with the Gaussian distribution representation (PGCN-G), is proposed. To overcome the overfitting problem in covariance estimation and reduce the computational complexity, PGCN-G is further improved to PGCN-G+ by imposing the covariance matrices of all vertices to possess the identical singular vectors. The optimization algorithm based on expectation-maximization indicates that the proposed method can iteratively denoise the network topology and node content with respect to each other. Besides the effectiveness of this top-down framework demonstrated via extensive experiments, it can also be deduced to cover the existing methods, graph convolutional network, graph attention network, and Gaussian mixture model and elaborate their characteristics and relationships by specific derivations.
|
90
|
Zheng R, Lyzinski V, Priebe CE, Tang M. Vertex nomination between graphs via spectral embedding and quadratic programming. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2060238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Runbing Zheng
- Department of Statistics, North Carolina State University
| | | | - Carey E. Priebe
- Department of Applied Mathematics and Statistics, Johns Hopkins University
| | - Minh Tang
- Department of Statistics, North Carolina State University
| |
|
91
|
Affiliation(s)
- Yuan Zhang
- Department of Statistics, The Ohio State University
| | - Dong Xia
- Department of Mathematics, The Hong Kong University of Science and Technology
| |
|
92
|
Fan J, Fan Y, Han X, Lv J. SIMPLE: Statistical inference on membership profiles in large networks. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12505] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
Affiliation(s)
- Jianqing Fan
- Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, USA
| | - Yingying Fan
- Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, California, USA
| | - Xiao Han
- International Institute of Finance, Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
| | - Jinchi Lv
- Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, California, USA
| |
|
93
|
Poisson degree corrected dynamic stochastic block model. ADV DATA ANAL CLASSI 2022. [DOI: 10.1007/s11634-022-00492-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
94
|
Identifying large scale interaction atlases using probabilistic graphs and external knowledge. J Clin Transl Sci 2022; 6:e27. [PMID: 35321220 PMCID: PMC8922291 DOI: 10.1017/cts.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 12/29/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022] Open
Abstract
Introduction: Reconstruction of gene interaction networks from experimental data provides a deep understanding of the underlying biological mechanisms. The noisy nature of the data and the large size of the network make this a very challenging task. Complex approaches handle the stochastic nature of the data but can only do this for small networks; simpler, linear models generate large networks but with less reliability. Methods: We propose a divide-and-conquer approach using probabilistic graph representations and external knowledge. We cluster the experimental data and learn an interaction network for each cluster, which are merged using the interaction network for the representative genes selected for each cluster. Results: We generated an interaction atlas for 337 human pathways yielding a network of 11,454 genes with 17,777 edges. Simulated gene expression data from this atlas formed the basis for reconstruction. Based on the area under the curve of the precision-recall curve, the proposed approach outperformed the baseline (random classifier) by ∼15-fold and conventional methods by ∼5–17-fold. The performance of the proposed workflow is significantly linked to the accuracy of the clustering step that tries to identify the modularity of the underlying biological mechanisms. Conclusions: We provide an interaction atlas generation workflow optimizing the algorithm/parameter selection. The proposed approach integrates external knowledge in the reconstruction of the interactome using probabilistic graphs. Network characterization and understanding long-range effects in interaction atlases provide means for comparative analysis with implications in biomarker discovery and therapeutic approaches. The proposed workflow is freely available at http://otulab.unl.edu/atlas.
|
95
|
Qin J, Lei J. Consistent estimation of the number of communities in stochastic block models using cross‐validation. Stat (Int Stat Inst) 2022. [DOI: 10.1002/sta4.426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Jining Qin
  - Two Sigma Investments LP, New York, NY 10013, USA
- Jing Lei
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
96
|
Affiliation(s)
- Jiashun Jin
  - Department of Statistics, Carnegie Mellon University
- Shengming Luo
  - Department of Statistics, Carnegie Mellon University
- Minzhe Wang
  - Department of Statistics, University of Chicago
|
97
|
Affiliation(s)
- Mingao Yuan
  - Department of Statistics, North Dakota State University
- Ruiqi Liu
  - Department of Mathematical Sciences, Texas Tech University
- Yang Feng
  - Department of Biostatistics, New York University
- Zuofeng Shang
  - Department of Mathematical Sciences, New Jersey Institute of Technology
|
98
|
Zhang H, Guo X, Chang X. Randomized Spectral Clustering in Large-Scale Stochastic Block Models. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2034636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Hai Zhang
  - Center for Modern Statistics, School of Mathematics, Northwest University, China
- Xiao Guo
  - Center for Modern Statistics, School of Mathematics, Northwest University, China
- Xiangyu Chang
  - Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi’an Jiaotong University, China
|
99
|
Weng H, Feng Y. Community detection with nodal information: Likelihood and its variational approximation. Stat (Int Stat Inst) 2022. [DOI: 10.1002/sta4.428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Haolei Weng
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
- Yang Feng
  - Department of Biostatistics, New York University, New York City, New York, USA
|
100
|
Passino FS, Heard NA, Rubin-Delanchy P. Spectral Clustering on Spherical Coordinates Under the Degree-Corrected Stochastic Blockmodel. Technometrics 2022. [DOI: 10.1080/00401706.2021.2008503] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/19/2022]
|