1
Yassin A, Cherifi H, Seba H, Togni O. Backbone extraction through statistical edge filtering: A comparative study. PLoS One 2025; 20:e0316141. [PMID: 39752450 PMCID: PMC11698430 DOI: 10.1371/journal.pone.0316141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 12/05/2024] [Indexed: 01/06/2025] Open
Abstract
The backbone extraction process is pivotal in expediting analysis and enhancing visualization in network applications. This study systematically compares seven influential statistical hypothesis-testing backbone edge filtering methods (Disparity Filter (DF), Polya Urn Filter (PF), Marginal Likelihood Filter (MLF), Noise Corrected (NC), Enhanced Configuration Model Filter (ECM), Global Statistical Significance Filter (GloSS), and Locally Adaptive Network Sparsification Filter (LANS)) across diverse networks. A similarity analysis reveals that backbones extracted with the ECM and DF filters exhibit minimal overlap with backbones derived from their alternatives. Interestingly, ordering the other methods from GloSS to NC, PF, LANS, and MLF, we observe that each method's output encapsulates the backbone of the previous one. Correlation analysis between edge features (weight, degree, betweenness) and the test significance level reveals that the DF and LANS filters favor high-weighted edges, while ECM assigns lower significance to edges with high degrees. Furthermore, the results suggest a limited influence of the edge betweenness on the filtering process. The analysis of the backbones' global properties (edge fraction, node fraction, weight fraction, weight entropy, reachability, number of components, and transitivity) identifies three typical behavior types for each property. Notably, the LANS filter preserves all nodes and weight entropy. In contrast, DF, PF, ECM, and GloSS significantly reduce network size. The MLF, NC, and ECM filters preserve network connectivity and weight entropy. Distribution analysis highlights the Polya Urn filter's ability to capture the original weight distribution. The NC filter exhibits a closely similar capability. The NC and MLF filters excel for the degree distribution. These insights offer valuable guidance for selecting appropriate backbone extraction methods based on specific properties.
Affiliation(s)
- Ali Yassin
  - LIB, Université de Bourgogne, Franche-Comté, Dijon, France
  - UCBL, CNRS, INSA Lyon, LIRIS, UMR5205, Univ Lyon, Villeurbanne, France
- Hocine Cherifi
  - ICB UMR 6303 CNRS, Univ. Bourgogne - Franche-Comté, Dijon, France
- Hamida Seba
  - UCBL, CNRS, INSA Lyon, LIRIS, UMR5205, Univ Lyon, Villeurbanne, France
- Olivier Togni
  - LIB, Université de Bourgogne, Franche-Comté, Dijon, France
2
Neal ZP. How strong is strong? The challenge of interpreting network edge weights. PLoS One 2024; 19:e0311614. [PMID: 39361670 PMCID: PMC11449300 DOI: 10.1371/journal.pone.0311614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Accepted: 09/17/2024] [Indexed: 10/05/2024] Open
Abstract
Weighted networks are information-rich and highly flexible, but they can be difficult to analyze because the interpretation of edge weights is often ambiguous. Specifically, the meaning of a given edge's weight is locally contingent, so that a given weight may be strong for one dyad but weak for another dyad, even in the same network. I use backbone models to distinguish strong and weak edges in a corpus of 110 weighted networks, and use the results to examine the magnitude of this ambiguity. Although strong edges have larger weights than weak edges on average, a large fraction of edge weights provide ambiguous information about whether the edge is strong or weak. Based on these results, I recommend that strong edges should be identified by applying an appropriate backbone model, and that once strong edges have been identified using a backbone model, their original weights should not be directly interpreted or used in subsequent analysis.
Affiliation(s)
- Zachary P. Neal
  - Psychology Department, Michigan State University, East Lansing, MI, United States of America
3
Yassin A, Haidar A, Cherifi H, Seba H, Togni O. An evaluation tool for backbone extraction techniques in weighted complex networks. Sci Rep 2023; 13:17000. [PMID: 37813946 PMCID: PMC10562457 DOI: 10.1038/s41598-023-42076-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 09/05/2023] [Indexed: 10/11/2023] Open
Abstract
Networks are essential for analyzing complex systems. However, their growing size necessitates backbone extraction techniques aimed at reducing their size while retaining critical features. In practice, selecting, implementing, and evaluating the most suitable backbone extraction method may be challenging. This paper introduces netbone, a Python package designed for assessing the performance of backbone extraction techniques in weighted networks. Its comparison framework is the standout feature of netbone. Indeed, the tool incorporates state-of-the-art backbone extraction techniques. Furthermore, it provides a comprehensive suite of evaluation metrics allowing users to evaluate different backbone extraction techniques. We illustrate the flexibility and effectiveness of netbone through an analysis of the US air transportation network. We compare the performance of different backbone extraction techniques using the evaluation metrics. We also show how users can integrate a new backbone extraction method into the comparison framework. netbone is publicly available as an open-source tool, ensuring its accessibility to researchers and practitioners. Promoting standardized evaluation practices contributes to the advancement of backbone extraction techniques and fosters reproducibility and comparability in research efforts. We anticipate that netbone will serve as a valuable resource for researchers and practitioners, enabling them to make informed decisions when selecting backbone extraction techniques to gain insights into the structural and functional properties of complex systems.
Affiliation(s)
- Ali Yassin
  - Laboratoire d'Informatique de Bourgogne, University of Burgundy, Dijon, France
- Abbas Haidar
  - Computer Science Department, Lebanese University, Beirut, Lebanon
- Hocine Cherifi
  - ICB UMR 6303 CNRS, Univ. Bourgogne - Franche-Comté, Dijon, France
- Hamida Seba
  - UCBL, CNRS, INSA Lyon, LIRIS, UMR5205, Univ Lyon, 69622, Villeurbanne, France
- Olivier Togni
  - Laboratoire d'Informatique de Bourgogne, University of Burgundy, Dijon, France
4
Souto Arias LA, Cirillo P, Oosterlee CW. Estimation of reinforced urn processes under left-truncation and right-censoring. Royal Society Open Science 2023; 10:221223. [PMID: 36908984 PMCID: PMC9993059 DOI: 10.1098/rsos.221223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract
We propose a non-parametric estimator for bivariate left-truncated and right-censored observations that combines the expectation-maximization algorithm and the reinforced urn process. The resulting expectation-reinforcement algorithm allows for the inclusion of experts' knowledge in the form of a prior distribution, thus belonging to the class of Bayesian models. This can be relevant in applications where the data is incomplete, due to biases in the sampling process, as in the case of left-truncation and right-censoring. With this new approach, the distribution of the truncation variables is also recovered, granting further insight into those biases, and playing an important role in applications like prevalent cohort studies. The estimators are tested numerically using artificial and empirical datasets, and compared with other methodologies such as copula models and the Kaplan-Meier estimator.
Affiliation(s)
- Pasquale Cirillo
  - ZHAW School of Law and Management, Zurich University of Applied Sciences, Zurich, Switzerland
5
Brattig Correia R, Barrat A, Rocha LM. Contact networks have small metric backbones that maintain community structure and are primary transmission subgraphs. PLoS Comput Biol 2023; 19:e1010854. [PMID: 36821564 PMCID: PMC9949650 DOI: 10.1371/journal.pcbi.1010854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 01/06/2023] [Indexed: 02/24/2023] Open
Abstract
The structure of social networks strongly affects how different phenomena spread in human society, from the transmission of information to the propagation of contagious diseases. It is well-known that heterogeneous connectivity strongly favors spread, but a precise characterization of the redundancy present in social networks and its effect on the robustness of transmission is still lacking. This gap is addressed by the metric backbone, a weight- and connectivity-preserving subgraph that is sufficient to compute all shortest paths of weighted graphs. This subgraph is obtained via algebraically-principled axioms and does not require statistical sampling based on null-models. We show that the metric backbones of nine contact networks obtained from proximity sensors in a variety of social contexts are generally very small, 49% of the original graph for one and ranging from about 6% to 20% for the others. This reflects a surprising amount of redundancy and reveals that shortest paths on these networks are very robust to random attacks and failures. We also show that the metric backbone preserves the full distribution of shortest paths of the original contact networks (which must include the shortest inter- and intra-community distances that define any community structure) and is a primary subgraph for epidemic transmission based on pure diffusion processes. This suggests that the organization of social contact networks is based on large amounts of shortest-path redundancy which shapes epidemic spread in human populations. Thus, the metric backbone is an important subgraph with regard to epidemic spread, the robustness of social networks, and any communication dynamics that depend on complex network shortest paths.
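As an illustration only, the idea of a subgraph "sufficient to compute all shortest paths" can be sketched in a few lines of Python. This is not the authors' code: it assumes an undirected graph whose edge weights are already positive distances, and it keeps an edge only when its direct distance equals the shortest-path distance between its endpoints. The function names and the toy edge list are hypothetical.

```python
import heapq


def shortest_distances(adj, source):
    """Dijkstra: shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def metric_backbone(edges):
    """Keep edges whose direct distance is not beaten by any indirect path.

    `edges` is a list of (u, v, distance) tuples for an undirected graph;
    distances are assumed to be strictly positive.
    """
    adj = {}
    for u, v, d in edges:
        adj.setdefault(u, {})[v] = d
        adj.setdefault(v, {})[u] = d
    cache = {}
    kept = []
    for u, v, d in edges:
        if u not in cache:
            cache[u] = shortest_distances(adj, u)
        # Small tolerance guards against floating-point round-off.
        if d <= cache[u][v] + 1e-12:
            kept.append((u, v, d))
    return kept


# Hypothetical toy graph: the direct a-c edge (3.0) is longer than a-b-c (2.0).
toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 3.0), ("c", "d", 2.0)]
backbone = metric_backbone(toy_edges)
```

In this toy graph the a-c edge is redundant and is dropped, while every shortest-path distance in the original graph can still be recovered from the three remaining edges.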
Affiliation(s)
- Rion Brattig Correia
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal
  - Department of Systems Science and Industrial Engineering, Center for Social and Biomedical Complexity, Binghamton University, Binghamton, New York, United States of America
- Alain Barrat
  - Aix Marseille Univ, Université de Toulon, CNRS, CPT, Turing Center for Living Systems, Marseille, France
- Luis M. Rocha
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal
  - Department of Systems Science and Industrial Engineering, Center for Social and Biomedical Complexity, Binghamton University, Binghamton, New York, United States of America
6
Turiel J, Barucca P, Aste T. Simplicial Persistence of Financial Markets: Filtering, Generative Processes and Structural Risk. Entropy (Basel, Switzerland) 2022; 24:1482. [PMID: 37420502 DOI: 10.3390/e24101482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 07/09/2023]
Abstract
We introduce simplicial persistence, a measure of time evolution of motifs in networks obtained from correlation filtering. We observe long memory in the evolution of structures, with two power-law decay regimes in the number of persistent simplicial complexes. Null models of the underlying time series are tested to investigate properties of the generative process and its evolutional constraints. Networks are generated with both a topological embedding network filtering technique called TMFG and by thresholding, showing that the TMFG method identifies high order structures throughout the market sample, where thresholding methods fail. The decay exponents of these long memory processes are used to characterise financial markets based on their efficiency and liquidity. We find that more liquid markets tend to have a slower persistence decay. This appears to be in contrast with the common understanding that efficient markets are more random. We argue that they are indeed less predictable for what concerns the dynamics of each single variable but they are more predictable for what concerns the collective evolution of the variables. This could imply higher fragility to systemic shocks.
Affiliation(s)
- Jeremy Turiel
  - Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK
  - JP Morgan, 60 Victoria Embankment, London EC4Y 0JP, UK
- Paolo Barucca
  - Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK
- Tomaso Aste
  - Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK
7
Gomes Ferreira CH, Murai F, Silva APC, Trevisan M, Vassio L, Drago I, Mellia M, Almeida JM. On network backbone extraction for modeling online collective behavior. PLoS One 2022; 17:e0274218. [PMID: 36107952 PMCID: PMC9477297 DOI: 10.1371/journal.pone.0274218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Accepted: 08/23/2022] [Indexed: 11/18/2022] Open
Abstract
Collective user behavior in social media applications often drives several important online and offline phenomena linked to the spread of opinions and information. Several studies have focused on the analysis of such phenomena using networks to model user interactions, represented by edges. However, only a fraction of edges contribute to the actual investigation. Even worse, the often large number of non-relevant edges may obfuscate the salient interactions, blurring the underlying structures and user communities that capture the collective behavior patterns driving the target phenomenon. To solve this issue, researchers have proposed several network backbone extraction techniques to obtain a reduced and representative version of the network that better explains the phenomenon of interest. Each technique has its specific assumptions and procedure to extract the backbone. However, the literature lacks a clear methodology to highlight such assumptions, discuss how they affect the choice of a method and offer validation strategies in scenarios where no ground truth exists. In this work, we fill this gap by proposing a principled methodology for comparing and selecting the most appropriate backbone extraction method given a phenomenon of interest. We characterize ten state-of-the-art techniques in terms of their assumptions, requirements, and other aspects that one must consider to apply them in practice. We present four steps to apply, evaluate and select the best method(s) for a given target phenomenon. We validate our approach using two case studies with different requirements: online discussions on Instagram and coordinated behavior in WhatsApp groups. We show that each method can produce very different backbones, underscoring that the choice of an adequate method is of utmost importance to reveal valuable knowledge about the particular phenomenon under investigation.
Affiliation(s)
- Carlos Henrique Gomes Ferreira
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Department of Computing and Systems, Universidade Federal de Ouro Preto, João Monlevade, Minas Gerais, Brazil
  - Department of Electronics and Telecommunications, Politecnico di Torino, Torino, Italy
- Fabricio Murai
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Ana P. C. Silva
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Martino Trevisan
  - Department of Electronics and Telecommunications, Politecnico di Torino, Torino, Italy
- Luca Vassio
  - Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
- Idilio Drago
  - Department of Computer Science, Università di Torino, Torino, Italy
- Marco Mellia
  - Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
- Jussara M. Almeida
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
8
Abstract
Community structure detection is an important and valuable task in financial network studies as it forms the basis of many statistical applications such as prediction, risk analysis, and recommendation. Financial networks have a natural multi-grained structure that leads to different community structures at different levels. However, few studies pay attention to these multi-part features of financial networks. In this study, we present a geometric coarse graining method based on Voronoi regions of a financial network. Rather than studying the dense structure of the network, we perform our analysis on the triangulated maximally filtered graph of a financial network. Such filtered topology emerges as an efficient approach because it keeps local clustering coefficients steady and it underlies the network geometry. Moreover, in order to capture changes in the geometry of coarse grains throughout a period of financial stress, we study Haantjes curvatures of paths that are the farthest from the center in each of the Voronoi regions. We performed our analysis on a network representation comprising the stock market indices BIST (Borsa Istanbul), FTSE100 (London Stock Exchange), and Nasdaq-100 Index (NASDAQ), across three financial crisis periods. Our results indicate that there are remarkable changes in the geometry of coarse grains.
9
Neal ZP. backbone: An R package to extract network backbones. PLoS One 2022; 17:e0269137. [PMID: 35639738 PMCID: PMC9154188 DOI: 10.1371/journal.pone.0269137] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/16/2022] [Accepted: 05/13/2022] [Indexed: 11/19/2022] Open
Abstract
Networks are useful for representing phenomena in a broad range of domains. Although their ability to represent complexity can be a virtue, it is sometimes useful to focus on a simplified network that contains only the most important edges: the backbone. This paper introduces and demonstrates a substantially expanded version of the backbone package for R, which now provides methods for extracting backbones from weighted networks, weighted bipartite projections, and unweighted networks. For each type of network, fully replicable code is presented first for small toy examples, then for complete empirical examples using transportation, political, and social networks. The paper also demonstrates the implications of several issues of statistical inference that arise in backbone extraction. It concludes by briefly reviewing existing applications of backbone extraction using the backbone package, and future directions for research on network backbone extraction.
Affiliation(s)
- Zachary P. Neal
  - Psychology Department, Michigan State University, East Lansing, MI, United States of America
10
Analysis of the Structure and Dynamics of European Flight Networks. Entropy 2022; 24:e24020248. [PMID: 35205542 PMCID: PMC8870763 DOI: 10.3390/e24020248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2021] [Revised: 01/30/2022] [Accepted: 02/05/2022] [Indexed: 11/16/2022]
Abstract
We analyze the structure and dynamics of the flight networks of 50 airlines active in the European airspace in 2017. Our analysis shows that the concentration of node degree in the flight networks is markedly heterogeneous among airlines, reflecting the heterogeneity of airline business models. We obtain an unsupervised classification of airlines by performing a hierarchical clustering that uses, as similarity measure, a correlation coefficient computed between the average occurrence profiles of 4-motifs of airline networks. The hierarchical tree is highly informative with respect to properties of the different airlines (for example, the number of main hubs, airline participation in intercontinental flights, regional coverage, and the nature of the airline: commercial, cargo, leisure or rental). The 4-motif patterns are therefore distinctive of each airline and reflect information about the main determinants of different airlines. This information is different from what can be found looking at the overlap of directed links.
11
Hao M, Zhang H, Hu Z, Jiang X, Song Q, Wang X, Wang J, Liu Z, Wang X, Li Y, Jin L. Phenotype correlations reveal the relationships of physiological systems underlying human ageing. Aging Cell 2021; 20:e13519. [PMID: 34825761 PMCID: PMC8672793 DOI: 10.1111/acel.13519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 10/18/2021] [Accepted: 11/03/2021] [Indexed: 01/02/2023] Open
Abstract
Ageing is characterized by degeneration and loss of function across multiple physiological systems. To study the mechanisms and consequences of ageing, several metrics have been proposed in a hierarchical model, including biological, phenotypic and functional ageing. In particular, phenotypic ageing and interconnected changes in multiple physiological systems occur in all ageing individuals over time. Recently, phenotypic age, a new ageing measure, was proposed to capture morbidity and mortality risk across diverse subpopulations in US cohort studies. Although phenotypic age has been widely used, it may overlook the complex relationships among phenotypic biomarkers. Considering the correlation structure of these phenotypic biomarkers, we proposed a composite phenotype analysis (CPA) strategy to analyse 71 biomarkers from 2074 individuals in the Rugao Longitudinal Ageing Study. CPA grouped these biomarkers into 18 composite phenotypes according to their internal correlation, and these composite phenotypes were mostly consistent with prior findings. In addition, compared with prior findings, this strategy exhibited some different yet important implications. For example, the indicators of kidney and cardiovascular functions were tightly connected, implying internal interactions. The composite phenotypes were further verified through associations with functional metrics of ageing, including disability, depression, cognitive function and frailty. Compared to age alone, these composite phenotypes had better predictive performances for functional metrics of ageing. In summary, CPA could reveal the hidden relationships of physiological systems and identify the links between physiological systems and functional ageing metrics, thereby providing novel insights into potential mechanisms underlying human ageing.
Affiliation(s)
- Meng Hao
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Hui Zhang
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
  - National Clinical Research Center for Ageing and Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Zixin Hu
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Xiaoyan Jiang
  - Key Laboratory of Arrhythmias of the Ministry of Education of China, Tongji University School of Medicine, Shanghai, China
- Qi Song
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Xi Wang
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Jiucun Wang
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
  - Research Unit of Dissecting the Population Genetics and Developing New Technologies for Treatment and Prevention of Skin Phenotypes and Dermatological Diseases (2019RU058), Chinese Academy of Medical Sciences, Beijing, China
- Zuyun Liu
  - Center for Clinical Big Data and Analytics, Second Affiliated Hospital and Department of Big Data in Health Science, School of Public Health, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Xiaofeng Wang
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
  - National Clinical Research Center for Ageing and Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Yi Li
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
  - Research Unit of Dissecting the Population Genetics and Developing New Technologies for Treatment and Prevention of Skin Phenotypes and Dermatological Diseases (2019RU058), Chinese Academy of Medical Sciences, Beijing, China
- Li Jin
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
  - Research Unit of Dissecting the Population Genetics and Developing New Technologies for Treatment and Prevention of Skin Phenotypes and Dermatological Diseases (2019RU058), Chinese Academy of Medical Sciences, Beijing, China
  - International Human Phenome Institutes, Shanghai, China
12
Casiraghi G, Nanumyan V. Configuration models as an urn problem. Sci Rep 2021; 11:13416. [PMID: 34183694 PMCID: PMC8239003 DOI: 10.1038/s41598-021-92519-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2021] [Accepted: 06/11/2021] [Indexed: 11/09/2022] Open
Abstract
A fundamental issue of network data science is the ability to discern observed features that can be expected at random from those beyond such expectations. Configuration models play a crucial role there, allowing us to compare observations against degree-corrected null-models. Nonetheless, existing formulations have limited large-scale data analysis applications either because they require expensive Monte-Carlo simulations or lack the required flexibility to model real-world systems. With the generalized hypergeometric ensemble, we address both problems. To achieve this, we map the configuration model to an urn problem, where edges are represented as balls in an appropriately constructed urn. Doing so, we obtain the generalized hypergeometric ensemble of random graphs: a random graph model reproducing and extending the properties of standard configuration models, with the critical advantage of a closed-form probability distribution.
13
Iacopini I, Di Bona G, Ubaldi E, Loreto V, Latora V. Interacting Discovery Processes on Complex Networks. Physical Review Letters 2020; 125:248301. [PMID: 33412072 DOI: 10.1103/physrevlett.125.248301] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/13/2020] [Revised: 10/22/2020] [Accepted: 11/05/2020] [Indexed: 06/12/2023]
Abstract
Innovation is the driving force of human progress. Recent urn models reproduce well the dynamics through which the discovery of a novelty may trigger further ones, in an expanding space of opportunities, but neglect the effects of social interactions. Here we focus on the mechanisms of collective exploration, and we propose a model in which many urns, representing different explorers, are coupled through the links of a social network and exploit opportunities coming from their contacts. We study different network structures showing, both analytically and numerically, that the pace of discovery of an explorer depends on its centrality in the social network. Our model sheds light on the role that social structures play in discovery processes.
Affiliation(s)
- Iacopo Iacopini
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - Centre for Advanced Spatial Analysis, University College London, London W1T 4TJ, United Kingdom
  - The Alan Turing Institute, The British Library, London NW1 2DB, United Kingdom
- Gabriele Di Bona
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - Scuola Superiore di Catania, Università di Catania, Via Valdisavoia 9, 95123 Catania, Italy
- Enrico Ubaldi
  - Sony Computer Science Laboratories, 6 Rue Amyot, 75005 Paris, France
- Vittorio Loreto
  - Sony Computer Science Laboratories, 6 Rue Amyot, 75005 Paris, France
  - Sapienza University of Rome, Physics Department, Piazzale Aldo Moro 5, 00185 Rome, Italy
  - Complexity Science Hub Vienna, A-1080 Vienna, Austria
- Vito Latora
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
  - The Alan Turing Institute, The British Library, London NW1 2DB, United Kingdom
  - Complexity Science Hub Vienna, A-1080 Vienna, Austria
  - Dipartimento di Fisica ed Astronomia, Università di Catania and INFN, I-95123 Catania, Italy
14
Marcaccioli R, Livan G. Correspondence between temporal correlations in time series, inverse problems, and the spherical model. Phys Rev E 2020; 102:012112. [PMID: 32795068 DOI: 10.1103/physreve.102.012112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/03/2020] [Accepted: 06/16/2020] [Indexed: 11/07/2022]
Abstract
In this paper we employ methods from statistical mechanics to model temporal correlations in time series. We put forward a methodology based on the maximum entropy principle to generate ensembles of time series constrained to preserve part of the temporal structure of an empirical time series of interest. We show that a constraint on the lag-one autocorrelation can be fully handled analytically and corresponds to the well-known spherical model of a ferromagnet. We then extend such a model to include constraints on more complex temporal correlations by means of perturbation theory, showing that this leads to substantial improvements in capturing the lag-one autocorrelation in the variance. We apply our approach on synthetic data and illustrate how it can be used to formulate expectations on the future values of a data-generating process.
Affiliation(s)
- Riccardo Marcaccioli
- Department of Computer Science, University College London, 66-72 Gower Street, London WC1E 6EA, United Kingdom
| | - Giacomo Livan
- Department of Computer Science, University College London, 66-72 Gower Street, London WC1E 6EA, United Kingdom
- Systemic Risk Centre, London School of Economics and Political Sciences, Houghton Street, London WC2A 2AE, United Kingdom
| |
|
15
|
Turiel JD, Aste T. Peer-to-peer loan acceptance and default prediction with artificial intelligence. Royal Society Open Science 2020; 7:191649. [PMID: 32742678 PMCID: PMC7353984 DOI: 10.1098/rsos.191649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/23/2019] [Accepted: 05/18/2020] [Indexed: 06/11/2023]
Abstract
Logistic regression (LR) and support vector machine algorithms, together with linear and nonlinear deep neural networks (DNNs), are applied to lending data in order to replicate lender acceptance of loans and predict the likelihood of default of issued loans. A two-phase model is proposed; the first phase predicts loan rejection, while the second one predicts default risk for approved loans. LR was found to be the best performer for the first phase, with test set recall macro score of 77.4%. DNNs were applied to the second phase only, where they achieved best performance, with test set recall score of 72%, for defaults. This shows that artificial intelligence can improve current credit risk models reducing the default risk of issued loans by as much as 70%. The models were also applied to loans taken for small businesses alone. The first phase of the model performs significantly better when trained on the whole dataset. Instead, the second phase performs significantly better when trained on the small business subset. This suggests a potential discrepancy between how these loans are screened and how they should be analysed in terms of default prediction.
Affiliation(s)
- J. D. Turiel
- Department of Computer Science, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
| | - T. Aste
- Department of Computer Science, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
- UCL Centre for Blockchain Technologies, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
- Systemic Risk Centre, London School of Economics and Political Sciences, Houghton Street, London WC2A 2AE, UK
| |
|
16
|
Kobayashi T, Takaguchi T, Barrat A. The structured backbone of temporal social ties. Nat Commun 2019; 10:220. [PMID: 30644392 PMCID: PMC6333776 DOI: 10.1038/s41467-018-08160-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/11/2018] [Accepted: 12/13/2018] [Indexed: 11/28/2022] Open
Abstract
In many data sets, information on the structure and temporality of a system coexists with noise and non-essential elements. In networked systems for instance, some edges might be non-essential or exist only by chance. Filtering them out and extracting a set of relevant connections is a non-trivial task. Moreover, methods put forward until now do not deal with time-resolved network data, which have become increasingly available. Here we develop a method for filtering temporal network data, by defining an adequate temporal null model that allows us to identify pairs of nodes having more interactions than expected given their activities: the significant ties. Moreover, our method can assign a significance to complex structures such as triads of simultaneous interactions, an impossible task for methods based on static representations. Our results hint at ways to represent temporal networks for use in data-driven models.
Affiliation(s)
- Teruyoshi Kobayashi
- Graduate School of Economics, Center for Computational Social Science, Kobe University, Kobe, Japan
| | | | - Alain Barrat
- Aix Marseille Univ, Université de Toulon, CNRS, CPT, Marseille, 13009, France
- Data Science Laboratory, ISI Foundation, Torino, 10126, Italy
| |
|