1
Zhan Q, Tiedje KE, Day KP, Pascual M. From multiplicity of infection to force of infection for sparsely sampled Plasmodium falciparum populations at high transmission. medRxiv 2024:2024.02.12.24302148. [PMID: 38853963 PMCID: PMC11160831 DOI: 10.1101/2024.02.12.24302148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
High multiplicity of infection or MOI, the number of genetically distinct parasite strains co-infecting a single human host, characterizes infectious diseases including falciparum malaria at high transmission. It accompanies high asymptomatic Plasmodium falciparum prevalence despite high exposure, creating a large transmission reservoir that challenges intervention. High MOI and asymptomatic prevalence are enabled by immune evasion of the parasite achieved via vast antigenic diversity. Force of infection or FOI, the number of new infections acquired by an individual host over a given time interval, is the dynamic sister quantity of MOI, and a key epidemiological parameter for monitoring the impact of antimalarial interventions and assessing vaccine or drug efficacy in clinical trials. FOI remains difficult, expensive, and labor-intensive to measure accurately, especially in high-transmission regions, whether directly via cohort studies or indirectly via the fitting of epidemiological models to repeated cross-sectional surveys. We propose here the application of queuing theory to obtain FOI on the basis of MOI, in the form of either a two-moment approximation method or Little's law. We illustrate these methods with MOI estimates obtained under sparse sampling schemes with the recently proposed "varcoding" method, based on sequences of the var multigene family encoding the major variant surface antigen of the blood stage of malaria infection. The methods are evaluated with simulation output from a stochastic agent-based model, and are applied to an interrupted time-series study from Bongo District in northern Ghana before and immediately after a three-round transient indoor residual spraying (IRS) intervention. In sampling the simulation output, we incorporate limitations representative of those encountered in the collection of field data, including under-sampling of var genes, missing data, and usage of antimalarial drug treatment. We address these limitations in MOI estimates with a Bayesian framework and an imputation bootstrap approach. We demonstrate that both proposed methods give good and consistent FOI estimates across various simulated scenarios. Their application to the field surveys shows a pronounced reduction in annual FOI during intervention, of more than 70%. The proposed approach should be applicable to the many geographical locations where cohort or cross-sectional studies with regular and frequent sampling are lacking but single-time-point surveys under sparse sampling schemes are available, and for MOI estimates obtained in different ways. It should also be relevant to other pathogens of humans, wildlife and livestock whose immune evasion strategies are based on large antigenic variation resulting in high multiplicity of infection.
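The Little's law route mentioned in the abstract can be illustrated with a small sketch. Little's law states that, for a queue at steady state, the mean number of items in the system equals the arrival rate times the mean time spent in the system. Reading mean MOI as the number in the system and the mean duration of an infection as the time in the system gives FOI ≈ mean MOI / mean duration. The function name, the sample values, and the steady-state reading are assumptions made for illustration; they are not taken from the paper, whose estimation procedure is more involved.

```python
def foi_from_littles_law(moi_values, mean_duration_years):
    """Rough FOI estimate (new infections per host per year).

    Assumes a steady state, so that Little's law L = lambda * W applies with
    L = mean MOI and W = mean infection duration (hypothetical inputs).
    """
    if not moi_values:
        raise ValueError("moi_values must not be empty")
    if mean_duration_years <= 0:
        raise ValueError("mean_duration_years must be positive")
    mean_moi = sum(moi_values) / len(moi_values)
    # Rearranged Little's law: arrival rate = number in system / time in system
    return mean_moi / mean_duration_years


# Made-up example: four hosts with MOI 2, 4, 3, 3 and infections lasting half a year
example_foi = foi_from_littles_law([2, 4, 3, 3], 0.5)
```

With these made-up inputs the mean MOI is 3, so the sketch returns 6 infections per host per year.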
Affiliation(s)
- Qi Zhan
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, IL, USA
- Kathryn E. Tiedje
  - Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, Australia
- Karen P. Day
  - Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, Australia
- Mercedes Pascual
  - Department of Biology, New York University, New York, NY, USA
  - Department of Environmental Studies, New York University, New York, NY, USA
  - Santa Fe Institute, Santa Fe, NM, USA
2
Zhan Q, He Q, Tiedje KE, Day KP, Pascual M. Hyper-diverse antigenic variation and resilience to transmission-reducing intervention in falciparum malaria. medRxiv 2024:2024.02.01.24301818. [PMID: 38370729 PMCID: PMC10871444 DOI: 10.1101/2024.02.01.24301818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Intervention against falciparum malaria in high transmission regions remains challenging, with relaxation of control efforts typically followed by rapid resurgence. Resilience to intervention co-occurs with incomplete immunity, whereby children eventually become protected from severe disease but not infection and a large transmission reservoir results from high asymptomatic prevalence across all ages. Incomplete immunity relates to the vast antigenic variation of the parasite, with the major surface antigen of the blood stage of infection encoded by the multigene family known as var. Recent deep sampling of var sequences from individual isolates in northern Ghana showed that parasite population structure exhibited persistent features of high-transmission regions despite the considerable decrease in prevalence during transient intervention with indoor residual spraying (IRS). We ask whether despite such apparent limited impact, the transmission system had been brought close to a transition in both prevalence and resurgence ability. With a stochastic agent-based model, we investigate the existence of such a transition to pre-elimination with intervention intensity, and of molecular indicators informative of its approach. We show that resurgence ability decreases sharply and nonlinearly across a narrow region of intervention intensities in model simulations, and identify informative molecular indicators based on var gene sequences. Their application to the survey data indicates that the transmission system in northern Ghana was brought close to transition by IRS. These results suggest that sustaining and intensifying intervention would have pushed malaria dynamics to a slow-rebound regime with an increased probability of local parasite extinction.
Affiliation(s)
- Qi Zhan
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago; Chicago, IL, 60637, USA
- Qixin He
  - Department of Biological Sciences, Purdue University; West Lafayette, IN, 47907, USA
- Kathryn E Tiedje
  - Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne; Melbourne, Australia
- Karen P Day
  - Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne; Melbourne, Australia
- Mercedes Pascual
  - Department of Biology, New York University; New York, NY, 10012, USA
  - Department of Environmental Studies, New York University; New York, NY, 10012, USA
  - Santa Fe Institute; Santa Fe, NM, 87501, USA
3
Zhang J, Li C, Wang J. A stochastic block Ising model for multi-layer networks with inter-layer dependence. Biometrics 2023; 79:3564-3573. [PMID: 37284764 DOI: 10.1111/biom.13885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 05/26/2023] [Indexed: 06/08/2023]
Abstract
Community detection, which aims to find groups of nodes with similar characteristics, has attracted tremendous interest in network analysis. Various methods have been developed to detect homogeneous communities in multi-layer networks, where inter-layer dependence is a widely acknowledged but severely under-investigated issue. In this paper, we propose a novel stochastic block Ising model (SBIM) that incorporates inter-layer dependence to help with community detection in multi-layer networks. The community structure is modeled by the stochastic block model (SBM) and the inter-layer dependence is incorporated via the popular Ising model. Furthermore, we develop an efficient variational EM algorithm to tackle the resultant optimization task and establish the asymptotic consistency of the proposed method. Extensive simulated examples and a real example on gene co-expression multi-layer network data are also provided to demonstrate the advantage of the proposed method.
Affiliation(s)
- Jingnan Zhang
  - International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Chengye Li
  - School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Junhui Wang
  - Department of Statistics, The Chinese University of Hong Kong, New Territories, Hong Kong
4
Ghavasieh A, De Domenico M. Generalized network density matrices for analysis of multiscale functional diversity. Phys Rev E 2023; 107:044304. [PMID: 37198772 DOI: 10.1103/physreve.107.044304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 02/13/2023] [Indexed: 05/19/2023]
Abstract
The network density matrix formalism allows for describing the dynamics of information on top of complex structures and it has been successfully used to analyze, e.g., a system's robustness, perturbations, coarse-graining multilayer networks, characterization of emergent network states, and performing multiscale analysis. However, this framework is usually limited to diffusion dynamics on undirected networks. Here, to overcome some limitations, we propose an approach to derive density matrices based on dynamical systems and information theory, which allows for encapsulating a much wider range of linear and nonlinear dynamics and richer classes of structure, such as directed and signed ones. We use our framework to study the response to local stochastic perturbations of synthetic and empirical networks, including neural systems consisting of excitatory and inhibitory links and gene-regulatory interactions. Our findings demonstrate that topological complexity does not necessarily lead to functional diversity, i.e., the complex and heterogeneous response to stimuli or perturbations. Instead, functional diversity is a genuine emergent property which cannot be deduced from the knowledge of topological features such as heterogeneity, modularity, the presence of asymmetries, and dynamical properties of a system.
Affiliation(s)
- Arsham Ghavasieh
  - Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Italy
  - Department of Physics, University of Trento, Via Sommarive 14, 38123 Povo, Trento, Italy
- Manlio De Domenico
  - Department of Physics and Astronomy "Galileo Galilei," University of Padova, 35131 Padova, Padova, Italy
  - Padua Center for Network Medicine, University of Padova, 35122 Padova, Padova, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Padova, 35131 Padova, Padova, Italy
5
Borges DGF, Carvalho DS, Bomfim GC, Ramos PIP, Brzozowski J, Góes-Neto A, Andrade RFS, El-Hani C. On the origin of mitochondria: a multilayer network approach. PeerJ 2023; 11:e14571. [PMID: 36632145 PMCID: PMC9828282 DOI: 10.7717/peerj.14571] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/21/2021] [Accepted: 11/28/2022] [Indexed: 01/08/2023] Open
Abstract
Background: The endosymbiotic theory is widely accepted to explain the origin of mitochondria from a bacterial ancestor. While ample evidence supports the intimate connection of Alphaproteobacteria to the mitochondrial ancestor, pinpointing its closest relative within sampled Alphaproteobacteria is still an open evolutionary debate. Many different phylogenetic methods and approaches have been used to answer this challenging question, further compounded by the heterogeneity of sampled taxa, varying evolutionary rates of mitochondrial proteins, and the inherent biases in each method, all factors that can produce phylogenetic artifacts. By harnessing the simplicity and interpretability of protein similarity networks, herein we re-evaluated the origin of mitochondria within an enhanced multilayer framework, which is an extension and improvement of a previously developed method.
Methods: We used a dataset of eight proteins found in mitochondria (N = 6 organisms) and bacteria (N = 80 organisms). The sequences were aligned and the resulting identity matrices were combined to generate an eight-layer multiplex network. Each layer corresponded to a protein network, where nodes represented organisms and edges were placed following mutual sequence identity. The Multi-Newman-Girvan algorithm was applied to evaluate community structure, and bifurcation events linked to network partition allowed us to trace patterns of divergence between the studied taxa.
Results: In our network-based analysis, we first examined the topology of the eight-layer multiplex when mitochondrial sequences disconnected from the main alphaproteobacterial cluster. The resulting topology lent firm support toward an Alphaproteobacteria-sister placement for mitochondria, reinforcing the hypothesis that mitochondria diverged from the common ancestor of all Alphaproteobacteria. Additionally, we observed that the divergence of Rickettsiales was an early event in the evolutionary history of alphaproteobacterial clades.
Conclusion: By applying complex network methods to the challenging question of circumscribing mitochondrial origin, we suggest that the entire Alphaproteobacteria clade is the closest relative to mitochondria (Alphaproteobacterial-sister hypothesis), echoing recent findings based on different datasets and methodologies.
Affiliation(s)
- Daniel S. Carvalho
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Gilberto C. Bomfim
  - Institute of Biology, Federal University of Bahia, Salvador, Bahia, Brazil
- Jerzy Brzozowski
  - Philosophy Department, Federal University of Santa Catarina, Florianópolis, Santa Catarina, Brazil
- Aristóteles Góes-Neto
  - Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Roberto F. S. Andrade
  - Institute of Physics, Federal University of Bahia, Salvador, Bahia, Brazil
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Salvador, Bahia, Brazil
- Charbel El-Hani
  - Institute of Biology, Federal University of Bahia, Salvador, Bahia, Brazil
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Salvador, Bahia, Brazil
6
Lyu Z, Xia D, Zhang Y. Latent Space Model for Higher-order Networks and Generalized Tensor Decomposition. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2022.2164289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Affiliation(s)
- Zhongyuan Lyu
  - Department of Mathematics, Hong Kong University of Science and Technology
- Dong Xia
  - Department of Mathematics, Hong Kong University of Science and Technology
- Yuan Zhang
  - Department of Statistics, Ohio State University
7
Link predictability classes in large node-attributed networks. Social Network Analysis and Mining 2022. [DOI: 10.1007/s13278-022-00912-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
8
Vaca-Ramírez F, Peixoto TP. Systematic assessment of the quality of fit of the stochastic block model for empirical networks. Phys Rev E 2022; 105:054311. [PMID: 35706168 DOI: 10.1103/physreve.105.054311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/06/2022] [Accepted: 04/19/2022] [Indexed: 06/15/2023]
Abstract
We perform a systematic analysis of the quality of fit of the stochastic block model (SBM) for 275 empirical networks spanning a wide range of domains and orders of size magnitude. We employ posterior predictive model checking as a criterion to assess the quality of fit, which involves comparing networks generated by the inferred model with the empirical network, according to a set of network descriptors. We observe that the SBM is capable of providing an accurate description for the majority of networks considered, but falls short of saturating all modeling requirements. In particular, networks possessing a large diameter and slow-mixing random walks tend to be badly described by the SBM. However, contrary to what is often assumed, networks with a high abundance of triangles can be well described by the SBM in many cases. We demonstrate that simple network descriptors can be used to evaluate whether or not the SBM can provide a sufficiently accurate representation, potentially pointing to possible model extensions that can systematically improve the expressiveness of this class of models.
Affiliation(s)
- Felipe Vaca-Ramírez
  - Department of Network and Data Science, Central European University, 1100 Vienna, Austria
- Tiago P Peixoto
  - Department of Network and Data Science, Central European University, 1100 Vienna, Austria
9
Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method. Entropy 2022; 24:e24050626. [PMID: 35626512 PMCID: PMC9142054 DOI: 10.3390/e24050626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 04/26/2022] [Accepted: 04/27/2022] [Indexed: 11/16/2022]
Abstract
This paper proposes a meaningful and effective extension of the celebrated K-means algorithm for detecting communities in feature-rich networks, based on our assumption of the non-summability mode. We approximate the given matrices of inter-node links and feature values in the least-squares sense, which leads to a straightforward extension of the conventional K-means clustering method as an alternating minimization strategy for the criterion. The method works in a two-fold space, embracing both the network nodes and the features. The metric used is a weighted sum of the squared Euclidean distances in the feature and network spaces. To tackle the so-called curse of dimensionality, we extend this to a version that uses the cosine distances between entities and centers. One more version of our method is based on the Manhattan distance metric. We conduct computational experiments to test our method and compare its performance with that of popular competing algorithms on synthetic and real-world datasets. The cosine-based version of the extended K-means typically wins on the high-dimensional real-world datasets. In contrast, the Manhattan-based version wins on most synthetic datasets.
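The metric described in the abstract, a weighted sum of squared Euclidean distances in the feature and network spaces, can be sketched as the assignment step of a K-means-style loop. This is a minimal illustration under assumptions: the function names, the weights `w_feature` and `w_network`, and the toy data are hypothetical, and the paper's actual criterion and update rules may differ.

```python
def combined_sq_distance(feat, feat_center, links, link_center,
                         w_feature=1.0, w_network=1.0):
    """Weighted sum of squared Euclidean distances in feature and link space."""
    d_feat = sum((a - b) ** 2 for a, b in zip(feat, feat_center))
    d_net = sum((a - b) ** 2 for a, b in zip(links, link_center))
    return w_feature * d_feat + w_network * d_net


def assign_nodes(features, links, feat_centers, link_centers,
                 w_feature=1.0, w_network=1.0):
    """Assign each node to the community whose two-part center is nearest."""
    labels = []
    for feat, link_row in zip(features, links):
        dists = [
            combined_sq_distance(feat, fc, link_row, lc, w_feature, w_network)
            for fc, lc in zip(feat_centers, link_centers)
        ]
        labels.append(dists.index(min(dists)))
    return labels


# Toy data (hypothetical): two nodes, one feature each, one link row each
toy_features = [[0.0], [1.0]]
toy_links = [[1, 0], [0, 1]]
toy_labels = assign_nodes(toy_features, toy_links,
                          feat_centers=[[0.0], [1.0]],
                          link_centers=[[1, 0], [0, 1]])
```

A full alternating minimization would follow each assignment step with a recomputation of both the feature centers and the link centers as within-community means.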
10
Moshiri M, Safaei F. Application of hyperbolic geometry of multiplex networks under layer link-based attacks. Chaos 2022; 32:021105. [PMID: 35232029 DOI: 10.1063/5.0073952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 01/20/2022] [Indexed: 06/14/2023]
Abstract
At present, network science can be considered one of the prosperous scientific fields. The multi-layered network approach is a recent development in this area and focuses on identifying the interactions of several interconnected networks. In this paper, we propose a new method for predicting redundant links for multiplex networks using the similarity criterion based on the hyperbolic distance of the node pairs. We retrieve lost links found on various attack strategies in multiplex networks by predicting redundant links in these networks using the proffered method. We applied the recommended algorithm to real-world multiplex networks, and the numerical simulations show its superiority over other advanced algorithms. During the studies and numerical simulations, the power of the hyperbolic geometry criterion over different standard and current methods based on link prediction used for network retrieval is evident, especially in the case of attacks based on the edge betweenness and random strategies illustrated in the results.
Affiliation(s)
- Mahdi Moshiri
  - Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
- Farshad Safaei
  - Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
11
Jing BY, Li T, Lyu Z, Xia D. Community detection on mixture multilayer networks via regularized tensor decomposition. Ann Stat 2021. [DOI: 10.1214/21-aos2079] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/10/2023]
Affiliation(s)
- Bing-Yi Jing
  - Department of Mathematics, The Hong Kong University of Science and Technology
- Ting Li
  - Department of Applied Mathematics, The Hong Kong Polytechnic University
- Zhongyuan Lyu
  - Department of Mathematics, The Hong Kong University of Science and Technology
- Dong Xia
  - Department of Mathematics, The Hong Kong University of Science and Technology
12
Kuang J, Scoglio C. Layer reconstruction and missing link prediction of a multilayer network with maximum a posteriori estimation. Phys Rev E 2021; 104:024301. [PMID: 34525660 PMCID: PMC8445383 DOI: 10.1103/physreve.104.024301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 07/16/2021] [Indexed: 04/23/2023]
Abstract
From social networks to biological networks, different types of interactions among the same set of nodes characterize distinct layers, which are termed multilayer networks. Within a multilayer network, some layers, confirmed through different experiments, could be structurally similar and interdependent. In this paper, we propose a maximum a posteriori-based method to study and reconstruct the structure of a target layer in a multilayer network. Nodes within the target layer are characterized by vectors, which are employed to compute edge weights. Further, to detect structurally similar layers, we propose a method for comparing networks based on the eigenvector centrality. Using similar layers, we obtain the parameters of the conjugate prior. With this maximum a posteriori algorithm, we can reconstruct the target layer and predict missing links. We test the method on two real multilayer networks, and the results show that the maximum a posteriori estimation is promising in reconstructing the target layer even when a large number of links is missing.
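The eigenvector centrality that the abstract uses to compare layers can be computed by power iteration. The sketch below assumes undirected, connected layers given as dense adjacency lists; the helper `centrality_distance`, which compares two layers by the Euclidean distance between their centrality vectors, is a hypothetical stand-in, since the abstract does not specify the paper's comparison measure.

```python
import math


def eigenvector_centrality(adj, iterations=200):
    """Leading eigenvector of a symmetric adjacency matrix, unit Euclidean norm."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I): the shift leaves eigenvectors unchanged but keeps
        # the iteration from oscillating on bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def centrality_distance(adj_a, adj_b):
    """Hypothetical layer comparison: distance between centrality vectors."""
    ca = eigenvector_centrality(adj_a)
    cb = eigenvector_centrality(adj_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ca, cb)))


# Two toy layers on the same three nodes: a path and a triangle
path_layer = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle_layer = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path_centrality = eigenvector_centrality(path_layer)
```

In the path layer the middle node receives the largest centrality, while in the triangle all three nodes are equal, so the distance between the two layers is positive.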
13
Summable and nonsummable data-driven models for community detection in feature-rich networks. Social Network Analysis and Mining 2021. [DOI: 10.1007/s13278-021-00774-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]
14
Shalileh S, Mirkin B. Least-squares community extraction in feature-rich networks using similarity data. PLoS One 2021; 16:e0254377. [PMID: 34264961 PMCID: PMC8282089 DOI: 10.1371/journal.pone.0254377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2021] [Accepted: 06/24/2021] [Indexed: 12/02/2022] Open
Abstract
We explore a doubly-greedy approach to the issue of community detection in feature-rich networks. According to this approach, both the network and feature data are straightforwardly recovered from the underlying unknown non-overlapping communities, each supplied with a center in the feature space and intensity weight(s) over the network. Our least-squares additive criterion allows us to search for communities one by one and to find each community by adding entities one by one. A focus of this paper is that the feature-space data part is converted into a similarity matrix format. The similarity/link values can be used in either of two modes: (a) as measured on the same scale, so that one can meaningfully compare and sum similarity values across the entire similarity matrix (summability mode), or (b) such that similarity values in one column should not be compared with the values in other columns (nonsummability mode). The two input matrices and two modes lead us to develop four different Iterative Community Extraction from Similarity data (ICESi) algorithms, which determine the number of communities automatically. Our experiments on real-world and synthetic datasets show that these algorithms are valid and competitive.
Affiliation(s)
- Soroosh Shalileh
  - Department of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russian Federation
  - Laboratory of Methods for Big Data Analysis, HSE University, Moscow, Russian Federation
- Boris Mirkin
  - Department of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russian Federation
  - Department of Computer Science and Information Systems, Birkbeck University of London, London, United Kingdom
15
He Q, Pilosof S, Tiedje KE, Day KP, Pascual M. Frequency-Dependent Competition Between Strains Imparts Persistence to Perturbations in a Model of Plasmodium falciparum Malaria Transmission. Front Ecol Evol 2021; 9. [PMID: 35433714 PMCID: PMC9012452 DOI: 10.3389/fevo.2021.633263] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
In high-transmission endemic regions, local populations of Plasmodium falciparum exhibit vast diversity of the var genes encoding its major surface antigen, with each parasite comprising multiple copies from this diverse gene pool. This strategy to evade the immune system through large combinatorial antigenic diversity is common to other hyperdiverse pathogens. It underlies a series of fundamental epidemiological characteristics, including large reservoirs of transmission from high prevalence of asymptomatics and long-lasting infections. Previous theory has shown that negative frequency-dependent selection (NFDS) mediated by the acquisition of specific immunity by hosts structures the diversity of var gene repertoires, or strains, in a pattern of limiting similarity that is both non-random and non-neutral. A combination of stochastic agent-based models and network analyses has enabled the development and testing of theory in these complex adaptive systems, where assembly of local parasite diversity occurs under frequency-dependent selection and large pools of variation. We show here the application of these approaches to theory comparing the response of the malaria transmission system to intervention when strain diversity is assembled under (competition-based) selection vs. a form of neutrality, where immunity depends only on the number but not the genetic identity of previous infections. The transmission system is considerably more persistent under NFDS, exhibiting a lower extinction probability despite comparable prevalence during intervention. We explain this pattern on the basis of the structure of strain diversity, in particular the more pronounced fraction of highly dissimilar parasites. For simulations that survive intervention, prevalence under specific immunity is lower than under neutrality, because the recovery of diversity is considerably slower than that of prevalence and decreased var gene diversity reduces parasite transmission. 
A Principal Component Analysis of network features describing parasite similarity reveals that despite lower overall diversity, NFDS is quickly restored after intervention constraining strain structure and maintaining patterns of limiting similarity important to parasite persistence. Given the described enhanced persistence under perturbation, intervention efforts will likely require longer times than the usual practice to eliminate P. falciparum populations. We discuss implications of our findings and potential analogies for ecological communities with non-neutral assembly processes involving frequency-dependence.
Affiliation(s)
- Qixin He
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Shai Pilosof
  - Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Kathryn E. Tiedje
  - Department of Microbiology and Immunology, Bio21 Institute, The University of Melbourne, Melbourne, VIC, Australia
- Karen P. Day
  - Department of Microbiology and Immunology, Bio21 Institute, The University of Melbourne, Melbourne, VIC, Australia
- Mercedes Pascual
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, United States
  - Santa Fe Institute, Santa Fe, NM, United States
  - Correspondence: Mercedes Pascual
16
Pamfil AR, Howison SD, Porter MA. Inference of edge correlations in multilayer networks. Phys Rev E 2021; 102:062307. [PMID: 33466038 DOI: 10.1103/physreve.102.062307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2019] [Accepted: 09/14/2020] [Indexed: 11/07/2022]
Abstract
Many recent developments in network analysis have focused on multilayer networks, which one can use to encode time-dependent interactions, multiple types of interactions, and other complications that arise in complex systems. Like their monolayer counterparts, multilayer networks in applications often have mesoscale features, such as community structure. A prominent approach for inferring such structures is the employment of multilayer stochastic block models (SBMs). A common (but potentially inadequate) assumption of these models is the sampling of edges in different layers independently, conditioned on the community labels of the nodes. In this paper, we relax this assumption of independence by incorporating edge correlations into an SBM-like model. We derive maximum-likelihood estimates of the key parameters of our model, and we propose a measure of layer correlation that reflects the similarity between the connectivity patterns in different layers. Finally, we explain how to use correlated models for edge "prediction" (i.e., inference) in multilayer networks. By incorporating edge correlations, we find that prediction accuracy improves both in synthetic networks and in a temporal network of shoppers who are connected to previously purchased grocery products.
Affiliation(s)
- A Roxana Pamfil
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Sam D Howison
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Mason A Porter
  - Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095, USA and Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
17
|
Childs LM, Larremore DB. Network Models for Malaria: Antigens, Dynamics, and Evolution Over Space and Time. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11512-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
18
|
Yen TC, Larremore DB. Community detection in bipartite networks with stochastic block models. Phys Rev E 2020; 102:032309. [PMID: 33075933 DOI: 10.1103/physreve.102.032309] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/22/2020] [Accepted: 07/23/2020] [Indexed: 11/07/2022]
Abstract
In bipartite networks, community structures are restricted to being disassortative, in that nodes of one type are grouped according to common patterns of connection with nodes of the other type. This makes the stochastic block model (SBM), a highly flexible generative model for networks with block structure, an intuitive choice for bipartite community detection. However, typical formulations of the SBM do not make use of the special structure of bipartite networks. Here we introduce a Bayesian nonparametric formulation of the SBM and a corresponding algorithm to efficiently find communities in bipartite networks which parsimoniously chooses the number of communities. The biSBM improves community detection results over general SBMs when data are noisy, improves the model resolution limit by a factor of √2, and expands our understanding of the complicated optimization landscape associated with community detection tasks. A direct comparison of certain terms of the prior distributions in the biSBM and a related high-resolution hierarchical SBM also reveals a counterintuitive regime of community detection problems, populated by smaller and sparser networks, where nonhierarchical models outperform their more flexible counterpart.
Affiliation(s)
- Tzu-Chi Yen
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Daniel B Larremore
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
|
19
|
Peixoto TP. Merge-split Markov chain Monte Carlo for community detection. Phys Rev E 2020; 102:012305. [PMID: 32794904 DOI: 10.1103/physreve.102.012305] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/24/2020] [Accepted: 06/19/2020] [Indexed: 11/07/2022]
Abstract
We present a Markov chain Monte Carlo scheme based on merges and splits of groups that is capable of efficiently sampling from the posterior distribution of network partitions, defined according to the stochastic block model (SBM). We demonstrate how schemes based on the move of single nodes between groups systematically fail at correctly sampling from the posterior distribution even on small networks, and how our merge-split approach behaves significantly better, and improves the mixing time of the Markov chain by several orders of magnitude in typical cases. We also show how the scheme can be straightforwardly extended to nested versions of the SBM, yielding asymptotically exact samples of hierarchical network partitions.
Affiliation(s)
- Tiago P Peixoto
  - Department of Network and Data Science, Central European University, H-1051 Budapest, Hungary
  - ISI Foundation, Via Chisola 5, 10126 Torino, Italy
  - Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
|
20
|
Funke T, Becker T. Stochastic block models: A comparison of variants and inference methods. PLoS One 2019; 14:e0215296. [PMID: 31013290 PMCID: PMC6478296 DOI: 10.1371/journal.pone.0215296] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/28/2018] [Accepted: 03/30/2019] [Indexed: 11/19/2022] Open
Abstract
Finding communities in complex networks is a challenging task and one promising approach is the Stochastic Block Model (SBM). But the influences from various fields led to a diversity of variants and inference methods. Therefore, a comparison of the existing techniques and an independent analysis of their capabilities and weaknesses is needed. As a first step, we review the development of different SBM variants such as the degree-corrected SBM of Karrer and Newman or Peixoto's hierarchical SBM. Besides stating all these variants in a uniform notation, we show the reasons for their development. Knowing the variants, we discuss a variety of approaches to infer the optimal partition like the Metropolis-Hastings algorithm. We perform our analysis based on our extension of the Girvan-Newman test and the Lancichinetti-Fortunato-Radicchi benchmark as well as a selection of some real-world networks. Using these results, we give some guidance to the challenging task of selecting an inference method and SBM variant. In addition, we give a simple heuristic to determine the number of steps for the Metropolis-Hastings algorithms that lack a usual stop criterion. With our comparison, we hope to guide researchers in the field of SBM and highlight the problem of existing techniques to focus future research. Finally, by making our code freely available, we want to promote a faster development, integration and exchange of new ideas.
Affiliation(s)
- Thorben Funke
  - Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
  - Faculty of Production Engineering, University of Bremen, Bremen, Bremen, Germany
- Till Becker
  - Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
  - Faculty of Business Studies, University of Applied Sciences Emden/Leer, Emden, Lower Saxony, Germany
|
21
|
Corel E, Méheust R, Watson AK, McInerney JO, Lopez P, Bapteste E. Bipartite Network Analysis of Gene Sharings in the Microbial World. Mol Biol Evol 2019; 35:899-913. [PMID: 29346651 PMCID: PMC5888944 DOI: 10.1093/molbev/msy001] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/19/2022] Open
Abstract
Extensive microbial gene flows affect how we understand virology, microbiology, medical sciences, genetic modification, and evolutionary biology. Phylogenies only provide a narrow view of these gene flows: plasmids and viruses, lacking core genes, cannot be attached to cellular life on phylogenetic trees. Yet viruses and plasmids have a major impact on cellular evolution, affecting both the gene content and the dynamics of microbial communities. Using bipartite graphs that connect up to 149,000 clusters of homologous genes with 8,217 related and unrelated genomes, we can in particular show patterns of gene sharing that do not map neatly with the organismal phylogeny. Homologous genes are recycled by lateral gene transfer, and multiple copies of homologous genes are carried by otherwise completely unrelated (and possibly nested) genomes, that is, viruses, plasmids and prokaryotes. When a homologous gene is present on at least one plasmid or virus and at least one chromosome, a process of "gene externalization," affected by a postprocessed selected functional bias, takes place, especially in Bacteria. Bipartite graphs give us a view of vertical and horizontal gene flow beyond classic taxonomy on a single very large, analytically tractable, graph that goes beyond the cellular Web of Life.
Affiliation(s)
- Eduardo Corel
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- Raphaël Méheust
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- Andrew K Watson
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- James O McInerney
  - Chair in Evolutionary Biology, The University of Manchester, United Kingdom
- Philippe Lopez
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
- Eric Bapteste
  - Unité Mixte de Recherche 7138 Evolution Paris-Seine, Centre National de la Recherche Scientifique, Institut de Biologie Paris-Seine, Sorbonne Université, Université Pierre et Marie Curie, Paris, France
|
22
|
Aslak U, Rosvall M, Lehmann S. Constrained information flows in temporal networks reveal intermittent communities. Phys Rev E 2018; 97:062312. [PMID: 30011557 DOI: 10.1103/physreve.97.062312] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 11/25/2017] [Indexed: 11/07/2022]
Abstract
Many real-world networks represent dynamic systems with interactions that change over time, often in uncoordinated ways and at irregular intervals. For example, university students connect in intermittent groups that repeatedly form and dissolve based on multiple factors, including their lectures, interests, and friends. Such dynamic systems can be represented as multilayer networks where each layer represents a snapshot of the temporal network. In this representation, it is crucial that the links between layers accurately capture real dependencies between those layers. Often, however, these dependencies are unknown. Therefore, current methods connect layers based on simplistic assumptions that do not capture node-level layer dependencies. For example, connecting every node to itself in other layers with the same weight can wipe out dependencies between intermittent groups, making it difficult or even impossible to identify them. In this paper, we present a principled approach to estimating node-level layer dependencies based on the network structure within each layer. We implement our node-level coupling method in the community detection framework Infomap and demonstrate its performance compared to current methods on synthetic and real temporal networks. We show that our approach more effectively constrains information inside multilayer communities so that Infomap can better recover planted groups in multilayer benchmark networks that represent multiple modes with different groups and better identify intermittent communities in real temporal contact networks. These results suggest that node-level layer coupling can improve the modeling of information spreading in temporal networks and better capture intermittent community structure.
Affiliation(s)
- Ulf Aslak
  - Centre for Social Data Science, University of Copenhagen, DK-1353 København K, Denmark
  - DTU Compute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Martin Rosvall
  - Integrated Science Lab, Department of Physics, Umeå University, SE-901 87 Umeå, Sweden
- Sune Lehmann
  - DTU Compute, Technical University of Denmark, DK-2800 Kgs. Lyngby
  - Niels Bohr Institute, University of Copenhagen, DK-2100 København Ø, Denmark
  - Department of Sociology, University of Copenhagen, DK-1353 København K, Denmark
|
23
|
Rorick MM, Baskerville EB, Rask TS, Day KP, Pascual M. Identifying functional groups among the diverse, recombining antigenic var genes of the malaria parasite Plasmodium falciparum from a local community in Ghana. PLoS Comput Biol 2018; 14:e1006174. [PMID: 29897905 PMCID: PMC6016947 DOI: 10.1371/journal.pcbi.1006174] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/28/2017] [Revised: 06/25/2018] [Accepted: 05/03/2018] [Indexed: 11/18/2022] Open
Abstract
A challenge in studying diverse multi-copy gene families is deciphering distinct functional types within immense sequence variation. Functional changes can in some cases be tracked through the evolutionary history of a gene family; however phylogenetic approaches are not possible in cases where gene families diversify primarily by recombination. We take a network theoretical approach to functionally classify the highly recombining var antigenic gene family of the malaria parasite Plasmodium falciparum. We sample var DBLα sequence types from a local population in Ghana, and classify 9,276 of these variants into just 48 functional types. Our approach is to first decompose each sequence type into its constituent, recombining parts; we then use a stochastic block model to identify functional groups among the parts; finally, we classify the sequence types based on which functional groups they contain. This method for functional classification does not rely on an inferred phylogenetic history, nor does it rely on inferring function based on conserved sequence features. Instead, it infers functional similarity among recombining parts based on the sharing of similar co-occurrence interactions with other parts. This method can therefore group sequences that have undetectable sequence homology or even distinct origination. Describing these 48 var functional types allows us to simplify the antigenic diversity within our dataset by over two orders of magnitude. We consider how the var functional types are distributed in isolates, and find a nonrandom pattern reflecting that common var functional types are non-randomly distinct from one another in terms of their functional composition. The coarse-graining of var gene diversity into biologically meaningful functional groups has important implications for understanding the disease ecology and evolution of this system, as well as for designing effective epidemiological monitoring and intervention.
Affiliation(s)
- Mary M. Rorick
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, United States of America
  - Department of Biology, University of Utah, Salt Lake City, UT, United States of America
- Edward B. Baskerville
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, United States of America
- Thomas S. Rask
  - School of Biosciences, Bio21 Institute, The University of Melbourne, Melbourne, AU
  - Department of Microbiology, New York University, New York, NY, United States of America
- Karen P. Day
  - School of Biosciences, Bio21 Institute, The University of Melbourne, Melbourne, AU
  - Department of Microbiology, New York University, New York, NY, United States of America
- Mercedes Pascual
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, United States of America
  - The Santa Fe Institute, Santa Fe, NM, United States of America
|
24
|
The Plasmodium falciparum transcriptome in severe malaria reveals altered expression of genes involved in important processes including surface antigen-encoding var genes. PLoS Biol 2018; 16:e2004328. [PMID: 29529020 PMCID: PMC5864071 DOI: 10.1371/journal.pbio.2004328] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 09/25/2017] [Revised: 03/22/2018] [Accepted: 02/16/2018] [Indexed: 01/13/2023] Open
Abstract
Within the human host, the malaria parasite Plasmodium falciparum is exposed to multiple selection pressures. The host environment changes dramatically in severe malaria, but the extent to which the parasite responds to, or is selected by, this environment remains unclear. From previous studies, the parasites that cause severe malaria appear to increase expression of a restricted but poorly defined subset of the PfEMP1 variant surface antigens. PfEMP1s are major targets of protective immunity. Here, we used RNA sequencing (RNAseq) to analyse gene expression in 44 parasite isolates that caused severe and uncomplicated malaria in Papuan patients. The transcriptomes of 19 parasite isolates associated with severe malaria indicated that these parasites had decreased glycolysis without activation of compensatory pathways; altered chromatin structure and probably transcriptional regulation through decreased histone methylation; reduced surface expression of PfEMP1; and down-regulated expression of multiple chaperone proteins. Our RNAseq also identified novel associations between disease severity and PfEMP1 transcripts, domains, and smaller sequence segments and also confirmed all previously reported associations between expressed PfEMP1 sequences and severe disease. These findings will inform efforts to identify vaccine targets for severe malaria and also indicate how parasites adapt to, or are selected by, the host environment in severe malaria.
|
25
|
Wang W, Chen X, Jiao P, Jin D. Similarity-based Regularized Latent Feature Model for Link Prediction in Bipartite Networks. Sci Rep 2017; 7:16996. [PMID: 29208988 PMCID: PMC5717264 DOI: 10.1038/s41598-017-17157-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/21/2017] [Accepted: 11/21/2017] [Indexed: 01/05/2023] Open
Abstract
Link prediction is an attractive research topic in the field of data mining and has significant applications in improving the performance of recommendation systems and exploring the evolving mechanisms of complex networks. A variety of complex systems in the real world should be abstractly represented as bipartite networks, in which there are two types of nodes and no links connect nodes of the same type. In this paper, we propose a framework for link prediction in bipartite networks by combining the similarity-based structure and the latent feature model from a new perspective. The framework is called Similarity Regularized Nonnegative Matrix Factorization (SRNMF), which explicitly takes the local characteristics into consideration and encodes the geometrical information of the networks by constructing a similarity-based matrix. We also develop an iterative scheme to solve the objective function based on gradient descent. Extensive experiments on a variety of real-world bipartite networks show that the proposed link prediction framework performs more competitively and stably than state-of-the-art methods.
Affiliation(s)
- Wenjun Wang
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China
  - Tianjin Engineering Center of SmartSafety and Bigdata Technology, Tianjin University, Tianjin, 300354, China
  - Tianjin Key Laboratory of Advanced Networking (TANK), Tianjin Key Laboratory, Tianjin, 300354, China
- Xue Chen
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China
- Pengfei Jiao
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China
- Di Jin
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300354, China
|
26
|
Peel L, Larremore DB, Clauset A. The ground truth about metadata and community detection in networks. Science Advances 2017; 3:e1602548. [PMID: 28508065 PMCID: PMC5415338 DOI: 10.1126/sciadv.1602548] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Received: 10/17/2016] [Accepted: 03/08/2017] [Indexed: 05/30/2023]
Abstract
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
Affiliation(s)
- Leto Peel
  - Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
  - naXys, Université de Namur, Namur, Belgium
- Aaron Clauset
  - Santa Fe Institute, Santa Fe, NM 87501, USA
  - Department of Computer Science, University of Colorado, Boulder, CO 80309, USA
  - BioFrontiers Institute, University of Colorado, Boulder, CO 80309, USA
|
27
|
De Bacco C, Power EA, Larremore DB, Moore C. Community detection, link prediction, and layer interdependence in multilayer networks. Phys Rev E 2017; 95:042317. [PMID: 28505768 DOI: 10.1103/physreve.95.042317] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 01/18/2017] [Indexed: 11/07/2022]
Abstract
Complex systems are often characterized by distinct types of interactions between the same entities. These can be described as a multilayer network where each layer represents one type of interaction. These layers may be interdependent in complicated ways, revealing different kinds of structure in the network. In this work we present a generative model, and an efficient expectation-maximization algorithm, which allows us to perform inference tasks such as community detection and link prediction in this setting. Our model assumes overlapping communities that are common between the layers, while allowing these communities to affect each layer in a different way, including arbitrary mixtures of assortative, disassortative, or directed structure. It also gives us a mathematically principled way to define the interdependence between layers, by measuring how much information about one layer helps us predict links in another layer. In particular, this allows us to bundle layers together to compress redundant information and identify small groups of layers which suffice to predict the remaining layers accurately. We illustrate these findings by analyzing synthetic data and two real multilayer networks, one representing social support relationships among villagers in South India and the other representing shared genetic substring material between genes of the malaria parasite.
Affiliation(s)
- Caterina De Bacco
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
- Eleanor A Power
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
- Daniel B Larremore
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
- Cristopher Moore
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
|
28
|
Peixoto TP. Nonparametric Bayesian inference of the microcanonical stochastic block model. Phys Rev E 2017; 95:012317. [PMID: 28208453 DOI: 10.1103/physreve.95.012317] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 10/20/2016] [Indexed: 11/07/2022]
Abstract
A principled approach to characterize the hidden structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Affiliation(s)
- Tiago P Peixoto
  - Department of Mathematical Sciences and Centre for Networks and Collective Behaviour, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom
  - ISI Foundation, Via Alassio 11/c, 10126 Torino, Italy
|
29
|
Li Z, Wang RS, Zhang S, Zhang XS. Quantitative function and algorithm for community detection in bipartite networks. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.024] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/27/2022]
|
30
|
Newman MEJ, Clauset A. Structure and inference in annotated networks. Nat Commun 2016; 7:11863. [PMID: 27306566 PMCID: PMC4912639 DOI: 10.1038/ncomms11863] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 02/12/2016] [Accepted: 05/05/2016] [Indexed: 02/02/2023] Open
Abstract
For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains.
Affiliation(s)
- M. E. J. Newman
  - Department of Physics, University of Michigan, 450 Church Street, Ann Arbor, Michigan 48109, USA
  - Center for the Study of Complex Systems, University of Michigan, 450 Church Street, Ann Arbor, Michigan 48109, USA
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
- Aaron Clauset
  - Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
  - Department of Computer Science, University of Colorado, 430 UCB, Boulder, Colorado 80309, USA
  - BioFrontiers Institute, University of Colorado, 596 UCB, Boulder, Colorado 80309, USA
|
31
|
Larremore DB, Sundararaman SA, Liu W, Proto WR, Clauset A, Loy DE, Speede S, Plenderleith LJ, Sharp PM, Hahn BH, Rayner JC, Buckee CO. Ape parasite origins of human malaria virulence genes. Nat Commun 2015; 6:8368. [PMID: 26456841 PMCID: PMC4633637 DOI: 10.1038/ncomms9368] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 02/06/2015] [Accepted: 08/14/2015] [Indexed: 12/22/2022] Open
Abstract
Antigens encoded by the var gene family are major virulence factors of the human malaria parasite Plasmodium falciparum, exhibiting enormous intra- and interstrain diversity. Here we use network analysis to show that var architecture and mosaicism are conserved at multiple levels across the Laverania subgenus, based on var-like sequences from eight single-species and three multi-species Plasmodium infections of wild-living or sanctuary African apes. Using select whole-genome amplification, we also find evidence of multi-domain var structure and synteny in Plasmodium gaboni, one of the ape Laverania species most distantly related to P. falciparum, as well as a new class of Duffy-binding-like domains. These findings indicate that the modular genetic architecture and sequence diversity underlying var-mediated host-parasite interactions evolved before the radiation of the Laverania subgenus, long before the emergence of P. falciparum.

Antigens encoded by var genes are major virulence factors of the human malaria parasite Plasmodium falciparum. Here, Larremore et al. identify var-like genes in distantly related Plasmodium species infecting African apes, indicating that these genes already existed in an ancestral ape parasite many millions of years ago.
Affiliation(s)
- Daniel B Larremore
  - Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, Massachusetts 02115, USA
  - Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Sesh A Sundararaman
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
  - Department of Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Weimin Liu
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- William R Proto
  - Sanger Institute Malaria Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Aaron Clauset
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
  - Santa Fe Institute, Santa Fe, New Mexico 87501, USA
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
- Dorothy E Loy
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
  - Department of Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Sheri Speede
  - Sanaga-Yong Chimpanzee Rescue Center, IDA-Africa, Portland, Oregon 97204, USA
- Lindsey J Plenderleith
  - Institute of Evolutionary Biology and Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh EH9 3JT, UK
- Paul M Sharp
  - Institute of Evolutionary Biology and Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh EH9 3JT, UK
- Beatrice H Hahn
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
  - Department of Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Julian C Rayner
  - Sanger Institute Malaria Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Caroline O Buckee
  - Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, Massachusetts 02115, USA
  - Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA
|
32
|
Tessema SK, Monk SL, Schultz MB, Tavul L, Reeder JC, Siba PM, Mueller I, Barry AE. Phylogeography of var gene repertoires reveals fine-scale geospatial clustering of Plasmodium falciparum populations in a highly endemic area. Mol Ecol 2015; 24:484-97. [PMID: 25482097 DOI: 10.1111/mec.13033] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 06/02/2014] [Revised: 11/14/2014] [Accepted: 11/17/2014] [Indexed: 11/28/2022]
Abstract
Plasmodium falciparum malaria is a major global health problem that is being targeted for progressive elimination. Knowledge of local disease transmission patterns in endemic countries is critical to these elimination efforts. To investigate fine-scale patterns of malaria transmission, we have compared repertoires of rapidly evolving var genes in a highly endemic area. A total of 3680 high-quality DBLα-sequences were obtained from 68 P. falciparum isolates from ten villages spread over two distinct catchment areas on the north coast of Papua New Guinea (PNG). Modelling of the extent of var gene diversity in the two parasite populations predicts more than twice as many var gene alleles circulating within each catchment (Mugil = 906; Wosera = 1094) than previously recognized in PNG (Amele = 369). In addition, there were limited levels of var gene sharing between populations, consistent with local parasite population structure. Phylogeographic analyses demonstrate that while neutrally evolving microsatellite markers identified population structure only at the catchment level, var gene repertoires reveal further fine-scale geospatial clustering of parasite isolates. The clustering of parasite isolates by village in Mugil, but not in Wosera was consistent with the physical and cultural isolation of the human populations in the two catchments. The study highlights the microheterogeneity of P. falciparum transmission in highly endemic areas and demonstrates the potential of var genes as markers of local patterns of parasite population structure.
Affiliation(s)
- Sofonias K Tessema
- Division of Infection and Immunity, Walter and Eliza Hall Institute, 3052, Melbourne, Vic., Australia; Department of Medical Biology, University of Melbourne, 3052, Melbourne, Vic., Australia
|
33
|
Claessens A, Hamilton WL, Kekre M, Otto TD, Faizullabhoy A, Rayner JC, Kwiatkowski D. Generation of antigenic diversity in Plasmodium falciparum by structured rearrangement of Var genes during mitosis. PLoS Genet 2014; 10:e1004812. [PMID: 25521112 PMCID: PMC4270465 DOI: 10.1371/journal.pgen.1004812] [Citation(s) in RCA: 128] [Impact Index Per Article: 12.8] [Received: 06/12/2014] [Accepted: 10/08/2014] [Indexed: 11/25/2022] Open
Abstract
The most polymorphic gene family in P. falciparum is the ∼60 var genes distributed across parasite chromosomes, both in the subtelomeres and in internal regions. They encode hypervariable surface proteins known as P. falciparum erythrocyte membrane protein 1 (PfEMP1) that are critical for pathogenesis and immune evasion in Plasmodium falciparum. How var gene sequence diversity is generated is not currently completely understood. To address this, we constructed large clone trees and performed whole genome sequence analysis to study the generation of novel var gene sequences in asexually replicating parasites. While single nucleotide polymorphisms (SNPs) were scattered across the genome, structural variants (deletions, duplications, translocations) were focused in and around var genes, with considerable variation in frequency between strains. Analysis of more than 100 recombination events involving var exon 1 revealed that the average nucleotide sequence identity of two recombining exons was only 63% (range: 52.7–72.4%) yet the crossovers were error-free and occurred in such a way that the resulting sequence was in frame and domain architecture was preserved. Var exon 1, which encodes the immunologically exposed part of the protein, recombined in up to 0.2% of infected erythrocytes in vitro per life cycle. The high rate of var exon 1 recombination indicates that millions of new antigenic structures could potentially be generated each day in a single infected individual. We propose a model whereby var gene sequence polymorphism is mainly generated during the asexual part of the life cycle.

Malaria kills >600,000 people each year, with most deaths caused by Plasmodium falciparum. A family of proteins known as P. falciparum erythrocyte membrane protein 1, PfEMP1, is expressed on the surface of infected erythrocytes and plays an important role in pathogenesis. Each P. falciparum genome contains approximately 60 highly polymorphic var genes encoding the PfEMP1 proteins, and monoallelic expression with periodic switching results in immune evasion. Var gene polymorphism is thus critical to this survival strategy. We investigated how var gene diversity is generated by performing an in vitro evolution experiment, tracking var gene mutation in ‘real-time’ with whole genome sequencing. We found that genome structural variation is focused in and around var genes. These genetic rearrangements created new ‘chimeric’ var gene sequences during the mitotic part of the life cycle, and were consistent with processes of mitotic non-allelic homologous recombination. The recombinant var genes were always in frame and with conserved overall var gene architecture, and the recombination rate implies that many millions of rearranged var gene sequences are produced every 48-hour life cycle within infected individuals. In conclusion, we provide a detailed description of how new var gene sequences are continuously generated in the parasite genome, helping to explain long-term parasite survival within infected human hosts.
Affiliation(s)
- Antoine Claessens
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Mihir Kekre
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Thomas D. Otto
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Adnan Faizullabhoy
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Julian C. Rayner
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Dominic Kwiatkowski
- Malaria Programme, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- MRC Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
|
34
|
Immune characterization of Plasmodium falciparum parasites with a shared genetic signature in a region of decreasing transmission. Infect Immun 2014; 83:276-85. [PMID: 25368109 DOI: 10.1128/iai.01979-14] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
As the intensity of malaria transmission has declined, Plasmodium falciparum parasite populations have displayed decreased clonal diversity resulting from the emergence of many parasites with common genetic signatures (CGS). We have monitored such CGS parasite clusters from 2006 to 2013 in Thiès, Senegal, using the molecular barcode. The first, and one of the largest observed clusters of CGS parasites, was present in 24% of clinical isolates in 2008, declined to 3.4% of clinical isolates in 2009, and then disappeared. To begin to explore the relationship between the immune responses of the population and the emergence and decline of specific parasite genotypes, we have determined whether antibodies to CGS parasites correlate with their prevalence. We measured (i) antibodies capable of inhibiting parasite growth in culture and (ii) antibodies recognizing the surfaces of infected erythrocytes (RBCs). IgG obtained from volunteers in 2009 showed increased reactivity to the surfaces of CGS-parasitized erythrocytes over IgG from 2008. Since P. falciparum EMP-1 (PfEMP-1) is a major variant surface antigen, we used var Ups quantitative reverse transcription-PCR (qRT-PCR) and sequencing with degenerate DBL1α domain primers to characterize the var genes expressed by CGS parasites after short-term in vitro culture. CGS parasites show upregulation of UpsA var genes and 2-cysteine-containing PfEMP-1 molecules and express the same dominant var transcript. Our work indicates that the CGS parasites in this cluster express similar var genes, more than would be expected by chance in the population, and that there is year-to-year variation in immune recognition of surface antigens on CGS parasite-infected erythrocytes. This study lays the groundwork for detailed investigations of the mechanisms driving the expansion or contraction of specific parasite clones in the population.
|
35
|
Cluster analysis of weighted bipartite networks: a new copula-based approach. PLoS One 2014; 9:e109507. [PMID: 25303095 PMCID: PMC4193785 DOI: 10.1371/journal.pone.0109507] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/10/2014] [Accepted: 09/03/2014] [Indexed: 11/30/2022] Open
Abstract
In this work we are interested in identifying clusters of “positional equivalent” actors, i.e. actors who play a similar role in a system. In particular, we analyze weighted bipartite networks that describe the relationships between actors on one side and features or traits on the other, together with the intensity level to which actors show their features. We develop a methodological approach that takes into account the underlying multivariate dependence among groups of actors. The idea is that positions in a network could be defined on the basis of the similar intensity levels that the actors exhibit in expressing some features, instead of just considering relationships that actors hold with each other. Moreover, we propose a new clustering procedure that exploits the potential of copula functions, a mathematical instrument for modelling the stochastic dependence structure. Our clustering algorithm can be applied both to binary and real-valued matrices. We validate it with simulations and applications to real-world data.
|
36
|
Larremore DB, Clauset A, Jacobs AZ. Efficiently inferring community structure in bipartite networks. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 90:012805. [PMID: 25122340 PMCID: PMC4137326 DOI: 10.1103/physreve.90.012805] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 03/12/2014] [Indexed: 05/23/2023]
Abstract
Bipartite networks are a common type of network data in which there are two types of vertices, and only vertices of different types can be connected. While bipartite networks exhibit community structure like their unipartite counterparts, existing approaches to bipartite community detection have drawbacks, including implicit parameter choices, loss of information through one-mode projections, and lack of interpretability. Here we solve the community detection problem for bipartite networks by formulating a bipartite stochastic block model, which explicitly includes vertex type information and may be trivially extended to k-partite networks. This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results. We demonstrate this model's ability to efficiently and accurately find community structure in synthetic bipartite networks with known structure and in real-world bipartite networks with unknown structure, and we characterize its performance in practical contexts.
Affiliation(s)
- Daniel B Larremore
- Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, Massachusetts 02115, USA; Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Aaron Clauset
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA; Santa Fe Institute, Santa Fe, New Mexico 87501, USA; BioFrontiers Institute, University of Colorado, Boulder, Colorado 80303, USA
- Abigail Z Jacobs
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
|