1
Pražnikar J. Using graphlet degree vectors to predict atomic displacement parameters in protein structures. Acta Crystallogr D Struct Biol 2023; 79:1109-1119. [PMID: 37987168 PMCID: PMC10833351 DOI: 10.1107/s2059798323009142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 10/17/2023] [Indexed: 11/22/2023]
Abstract
In structural biology, atomic displacement parameters, commonly used in the form of B values, describe uncertainties in atomic positions. Their distribution over the structure can provide hints on local structural reliability and mobility. A spatial macromolecular model can be represented by a graph whose nodes are atoms and whose edges correspond to all interatomic contacts within a certain distance. Small connected subgraphs, called graphlets, provide information about the wiring of a particular atom. The multiple linear regression approach based on this information aims to predict a distribution of values of isotropic atomic displacement parameters (B values) within a protein structure, given the atomic coordinates and molecular packing. By modeling the dynamic component of atomic uncertainties, this method allows the B values obtained from experimental crystallographic or cryo-electron microscopy studies to be reproduced relatively well.
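The graph construction described in this abstract can be illustrated with a minimal sketch. It assumes a plain distance cutoff between atom coordinates and computes only the node degree, which is the simplest graphlet-based count (the number of edges touching an atom). The coordinates, the cutoff value, and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def contact_graph(coords, cutoff):
    """Link atoms i and j when their distance is at most `cutoff`."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def degrees(adjacency):
    """Number of contacts per atom, in atom-index order."""
    return [len(adjacency[i]) for i in sorted(adjacency)]


# Toy coordinates in angstroms (hypothetical values for illustration only).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
graph = contact_graph(atoms, cutoff=2.0)
print(degrees(graph))
```

A full graphlet degree vector would count many more local patterns per atom; the degree is shown here only because it needs no extra machinery.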
Affiliation(s)
- Jure Pražnikar
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, Koper, Slovenia
- Department of Biochemistry, Molecular and Structural Biology, Institute Jožef Stefan, Jamova 39, Ljubljana, Slovenia
2
Piccardi C. Metrics for network comparison using egonet feature distributions. Sci Rep 2023; 13:14657. [PMID: 37669967 PMCID: PMC10480166 DOI: 10.1038/s41598-023-40938-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 08/18/2023] [Indexed: 09/07/2023]
Abstract
Identifying networks with similar characteristics in a given ensemble and detecting pattern discontinuities in a temporal sequence of networks are two examples of tasks that require an effective metric capable of quantifying network (dis)similarity. Here we propose a method based on a global portrait of graph properties built by processing local node features. More precisely, a set of dissimilarity measures is defined from the distributions, over the network, of a few egonet features, namely the degree, the clustering coefficient, and the egonet persistence. The method, which does not require the alignment of the two networks being compared, exploits the statistics of the three features to define one- or multi-dimensional distribution functions, which are then compared to define a distance between the networks. The effectiveness of the method is evaluated using a standard classification test, i.e., recognizing the graphs originating from the same synthetic model. Overall, the proposed distances perform comparably to the best state-of-the-art techniques (graphlet-based methods) with similar computational requirements. Given its simplicity and flexibility, the method is proposed as a viable approach for network comparison tasks.
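Two of the three egonet features named in this abstract, the degree and the clustering coefficient, can be computed per node as in the sketch below. The adjacency dictionary and function names are hypothetical, and egonet persistence is left out because the abstract does not define it.

```python
def clustering_coefficient(adjacency, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as zero
    links = sum(
        1 for u in neighbors for v in neighbors if u < v and v in adjacency[u]
    )
    return 2.0 * links / (k * (k - 1))


def egonet_features(adjacency):
    """Return (degree, clustering coefficient) for every node."""
    return {
        node: (len(adjacency[node]), clustering_coefficient(adjacency, node))
        for node in adjacency
    }


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(egonet_features(toy))
```

The per-node values would then be pooled into distributions over the whole network before two networks are compared.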
Affiliation(s)
- Carlo Piccardi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milan, Italy.
3
Wang Z, Zhan XX, Liu C, Zhang ZK. Quantification of network structural dissimilarities based on network embedding. iScience 2022; 25:104446. [PMID: 35677641 PMCID: PMC9168171 DOI: 10.1016/j.isci.2022.104446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Revised: 05/01/2022] [Accepted: 05/17/2022] [Indexed: 11/26/2022]
Abstract
Quantifying structural dissimilarities between networks is a fundamental and challenging problem in network science. Previous network comparison methods are based on the structural features, such as the length of shortest path and degree, which only contain part of the topological information. Therefore, we propose an efficient network comparison method based on network embedding, which considers the global structural information. In detail, we first construct a distance matrix for each network based on the distances between node embedding vectors derived from DeepWalk. Then, we define the dissimilarity between two networks based on Jensen-Shannon divergence of the distance distributions. Experiments on both synthetic and empirical networks show that our method outperforms the baseline methods and can distinguish networks well. In addition, we show that our method can capture network properties, e.g., average shortest path length and link density. Moreover, the experiment of modularity further implies the functionality of our method.
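The Jensen-Shannon divergence mentioned in this abstract can be sketched for two discrete distributions as follows. The sketch assumes both inputs are histograms over the same bins that each sum to one, and uses base-2 logarithms so the result lies between 0 and 1. How the paper bins the embedding distances is not stated in the abstract, so the example inputs are purely illustrative.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between two distributions over the same bins."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical distance histograms for two networks.
hist_a = [0.5, 0.3, 0.2]
hist_b = [0.2, 0.3, 0.5]
print(jensen_shannon_divergence(hist_a, hist_b))
```

Because the mixture is positive wherever either input is positive, the divergence stays finite even when the two histograms have disjoint support.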
Affiliation(s)
- Zhipeng Wang
- Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China
- Xiu-Xiu Zhan
- Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China
- Chuang Liu
- Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China
- Zi-Ke Zhang
- College of Media and International Culture, Zhejiang University, Hangzhou 310058, PR China
4
Das B, Mitra P. ProMoCell and ProModb: Web services for analyzing interaction-based functionally localized protein modules in a cell. J Mol Model 2022; 28:167. [PMID: 35612652 DOI: 10.1007/s00894-022-05133-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Accepted: 04/30/2022] [Indexed: 11/30/2022]
Abstract
The modular organization of a cell which can be determined by its interaction network allows us to understand a mesh of cooperation among the functional modules. Therefore, cellular-level identification of functional modules aids in understanding the functional and structural characteristics of the biological network of a cell and also assists in determining or comprehending the evolutionary signal. We develop ProMoCell that performs real-time Web scraping for generating clusters of the cellular level functional units of an organism. ProMoCell constructs the Protein Locality Graphs and clusters the cellular level functional units of an organism by utilizing experimentally verified data from various online sources. Also, we develop ProModb, a database service that houses precomputed whole-cell protein-protein interaction network-based functional modules of an organism using ProMoCell. Our Web service is entirely synchronized with the KEGG pathway database and allows users to generate spatially localized protein modules for any organism belonging to the KEGG genome using its real-time Web scraping characteristics. Hence, the server will host as many organisms as is maintained by the KEGG database. Our Web services provide the users a comprehensive and integrated tool for an efficient browsing and extraction of the spatial locality-based protein locality graph and the functional modules constructed by gathering experimental data from several interaction databases and pathway maps. We believe that our Web services will be beneficial in pharmacological research, where a novel research domain called modular pharmacology has initiated the study on the diagnosis, prevention, and treatment of deadly diseases using functional modules.
Affiliation(s)
- Barnali Das
- Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India.
5
Linking protein structural and functional change to mutation using amino acid networks. PLoS One 2022; 17:e0261829. [PMID: 35061689 PMCID: PMC8782487 DOI: 10.1371/journal.pone.0261829] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/07/2021] [Accepted: 12/11/2021] [Indexed: 11/30/2022]
Abstract
The function of a protein is strongly dependent on its structure. During evolution, proteins acquire new functions through mutations in the amino-acid sequence. With advances in deep mutational scanning, recent studies have found functional change to be position dependent, notwithstanding the chemical properties of mutant and mutated amino acids. This could indicate that structural properties of a given position are potentially responsible for the functional relevance of a mutation. Here, we looked at the relation between structure and function of positions using five proteins for which experimental data on functional change are available. In order to measure structural change, we modeled mutated proteins via amino-acid networks and quantified the perturbation of each mutation. We found that structural change is position dependent and strongly related to functional change. Strong changes in protein structure correlate with functional loss, and positions with functional gain due to mutations tend to be structurally robust. Finally, we constructed a computational method that uses structural change to predict positions that are functionally sensitive to mutations; it performs well on all five proteins, with a mean precision of 74.7% and a recall of 69.3% of all functional positions.
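For readers unfamiliar with the two scores reported at the end of this abstract, the sketch below shows how precision and recall are computed from a set of predicted positions and a set of known functional positions. The position numbers are hypothetical and unrelated to the five proteins in the study.

```python
def precision_recall(predicted, actual):
    """Precision = TP / predicted count, recall = TP / actual count."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical residue positions for illustration only.
predicted_sensitive = {3, 7, 12, 20}
known_functional = {3, 7, 15}
print(precision_recall(predicted_sensitive, known_functional))
```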
6
Schefzik R, Boland L, Hahn B, Kirschning T, Lindner HA, Thiel M, Schneider-Lindner V. Differential Network Testing Reveals Diverging Dynamics of Organ System Interactions for Survivors and Non-survivors in Intensive Care Medicine. Front Physiol 2022; 12:801622. [PMID: 35082693 PMCID: PMC8784681 DOI: 10.3389/fphys.2021.801622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Accepted: 12/08/2021] [Indexed: 01/08/2023]
Abstract
Statistical network analyses have become popular in many scientific disciplines, where an important task is to test for differences between two networks. We describe an overall framework for differential network testing procedures that vary regarding (1) the network estimation method, typically based on specific concepts of association, and (2) the network characteristic employed to measure the difference. Using permutation-based tests, our approach is general and applicable to various overall, node-specific or edge-specific network difference characteristics. The methods are implemented in our freely available R software package DNT, along with an R Shiny application. In a study in intensive care medicine, we compare networks based on parameters representing main organ systems to evaluate the prognosis of critically ill patients in the intensive care unit (ICU), using data from the surgical ICU of the University Medical Centre Mannheim, Germany. We specifically consider both cross-sectional comparisons between a non-survivor and a survivor group and longitudinal comparisons at two clinically relevant time points during the ICU stay: first, at admission, and second, at an event stage prior to death in non-survivors or a matching time point in survivors. The non-survivor and the survivor networks do not significantly differ at the admission stage. However, the organ system interactions of the survivors then stabilize at the event stage, revealing significantly more network edges, whereas those of the non-survivors do not. In particular, the liver appears to play a central role for the observed increased connectivity in the survivor network at the event stage.
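The idea behind a permutation-based two-group test can be sketched with a deliberately simple stand-in statistic. The code below is not the DNT package and does not estimate any network: it assumes each subject contributes a single scalar and uses the absolute difference in group means, whereas in differential network testing the statistic would be a network difference characteristic recomputed after every relabelling. All names and values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Exact two-sided p-value for the absolute difference in group means."""
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabelling len(group_a) subjects as group A.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mean(relabelled_a) - mean(relabelled_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements for two small groups.
print(exact_permutation_p_value([1, 2, 3], [10, 11, 12]))
```

Full enumeration is only feasible for small groups; with realistic sample sizes a random subset of relabellings is usually drawn instead.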
Affiliation(s)
- Roman Schefzik
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Leonie Boland
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Bianka Hahn
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Thomas Kirschning
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Holger A. Lindner
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Mannheim Institute of Innate Immunoscience (MI3), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Manfred Thiel
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Mannheim Institute of Innate Immunoscience (MI3), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Verena Schneider-Lindner
- Department of Anesthesiology and Surgical Intensive Care Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Department of Community Health Sciences, Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
7
Ovens K, Eames BF, McQuillan I. Comparative Analyses of Gene Co-expression Networks: Implementations and Applications in the Study of Evolution. Front Genet 2021; 12:695399. [PMID: 34484293 PMCID: PMC8414652 DOI: 10.3389/fgene.2021.695399] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/14/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
Similarities and differences in the associations of biological entities among species can provide us with a better understanding of evolutionary relationships. Often the evolution of new phenotypes results from changes to interactions in pre-existing biological networks and comparing networks across species can identify evidence of conservation or adaptation. Gene co-expression networks (GCNs), constructed from high-throughput gene expression data, can be used to understand evolution and the rise of new phenotypes. The increasing abundance of gene expression data makes GCNs a valuable tool for the study of evolution in non-model organisms. In this paper, we cover motivations for why comparing these networks across species can be valuable for the study of evolution. We also review techniques for comparing GCNs in the context of evolution, including local and global methods of graph alignment. While some protein-protein interaction (PPI) bioinformatic methods can be used to compare co-expression networks, they often disregard highly relevant properties, including the existence of continuous and negative values for edge weights. Also, the lack of comparative datasets in non-model organisms has hindered the study of evolution using PPI networks. We also discuss limitations and challenges associated with cross-species comparison using GCNs, and provide suggestions for utilizing co-expression network alignments as an indispensable tool for evolutionary studies going forward.
Affiliation(s)
- Katie Ovens
- Augmented Intelligence & Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- B. Frank Eames
- Department of Anatomy, Physiology, & Pharmacology, University of Saskatchewan, Saskatoon, SK, Canada
- Ian McQuillan
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
8
Mahapatra A, Mukherjee J. Taxonomy classification using genomic footprint of mitochondrial sequences. Comb Chem High Throughput Screen 2021; 25:401-413. [PMID: 34382517 DOI: 10.2174/1386207324666210811102109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2020] [Revised: 07/07/2021] [Accepted: 07/12/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Advances in sequencing technology yield a huge number of genomes from a multitude of organisms on our planet. One of the fundamental tasks in processing and analyzing these sequences is to organize them in the existing taxonomic orders. METHOD Recently we proposed a novel approach, GenFooT, for taxonomy classification using the concept of genomic footprint (GFP). The technique is further refined and enhanced in this work, leading to improved accuracy in taxonomic classification on various benchmark datasets. GenFooT maps a genome sequence into a 2D coordinate space and extracts features from that representation. It uses two hyper-parameters, namely the block size and the number of fragments of the genomic sequence, when computing the features. In this work, we propose an analysis for choosing the values of those parameters adaptively from the sequences. The enhanced version of GenFooT is named GenFooT2. RESULTS AND CONCLUSION We tested GenFooT2 on ten different biological datasets of genomic sequences of various organisms belonging to different taxonomy ranks. Our experimental results indicate that the proposed features, used with a logistic regression classifier, improve classification performance by more than 3% over GenFooT. We also performed a statistical test to compare the performance of GenFooT2 with state-of-the-art methods, including our previous method GenFooT. The experimental results as well as the statistical test show that the performance of the proposed GenFooT2 is significantly better.
Affiliation(s)
- Aritra Mahapatra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India
- Jayanta Mukherjee
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India
9
Peschel S, Müller CL, von Mutius E, Boulesteix AL, Depner M. NetCoMi: network construction and comparison for microbiome data in R. Brief Bioinform 2021; 22:bbaa290. [PMID: 33264391 PMCID: PMC8293835 DOI: 10.1093/bib/bbaa290] [Citation(s) in RCA: 143] [Impact Index Per Article: 47.7] [Received: 07/17/2020] [Revised: 09/24/2020] [Accepted: 10/07/2020] [Indexed: 12/25/2022]
Abstract
MOTIVATION Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data. RESULTS Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, thus allowing to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi's wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children's rooms between samples from two study centers (Ulm and Munich). 
AVAILABILITY R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi. CONTACT Tel:+49 89 3187 43258; stefanie.peschel@mail.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
- Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Christian L Müller
- Department of Statistics, LMU München, Munich, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, USA
- Erika von Mutius
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
- Dr von Hauner Children’s Hospital, LMU München, Munich, Germany
- Comprehensive Pneumology Center Munich (CPC-M), Member of the German Center for Lung Research, Munich, Germany
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU München, Munich, Germany
- Martin Depner
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
10
Ovens K, Maleki F, Eames BF, McQuillan I. Juxtapose: a gene-embedding approach for comparing co-expression networks. BMC Bioinformatics 2021; 22:125. [PMID: 33726666 PMCID: PMC7968242 DOI: 10.1186/s12859-021-04055-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/02/2020] [Accepted: 03/01/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Gene co-expression networks (GCNs) are not easily comparable due to their complex structure. In this paper, we propose a tool, Juxtapose, together with similarity measures that can be utilized for comparative transcriptomics between a set of organisms. While we focus on its application to comparing co-expression networks across species in evolutionary studies, Juxtapose is also generalizable to co-expression network comparisons across tissues or conditions within the same species. METHODS A word embedding strategy commonly used in natural language processing was utilized in order to generate gene embeddings based on walks made throughout the GCNs. Juxtapose was evaluated based on its ability to embed the nodes of synthetic structures in the networks consistently while also generating biologically informative results. Evaluation of the techniques proposed in this research utilized RNA-seq datasets from GTEx, a multi-species experiment of prefrontal cortex samples from the Gene Expression Omnibus, as well as synthesized datasets. Biological evaluation was performed using gene set enrichment analysis and known gene relationships in literature. RESULTS We show that Juxtapose is capable of globally aligning synthesized networks as well as identifying areas that are conserved in real gene co-expression networks without reliance on external biological information. Furthermore, output from a matching algorithm that uses cosine distance between GCN embeddings is shown to be an informative measure of similarity that reflects the amount of topological similarity between networks. CONCLUSIONS Juxtapose can be used to align GCNs without relying on known biological similarities and enables post-hoc analyses using biological parameters, such as orthology of genes, or conserved or variable pathways. AVAILABILITY A development version of the software used in this paper is available at https://github.com/klovens/juxtapose.
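The cosine distance named in the RESULTS part of this abstract can be sketched for two embedding vectors as below. This is not code from the Juxtapose repository; the vectors and the function name are hypothetical, and the matching algorithm built on top of the distance is not shown.

```python
from math import sqrt


def cosine_distance(u, v):
    """One minus the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical two-dimensional gene embeddings.
gene_x = (1.0, 2.0)
gene_y = (2.0, 4.0)   # same direction as gene_x
gene_z = (-2.0, 1.0)  # orthogonal to gene_x
print(cosine_distance(gene_x, gene_y), cosine_distance(gene_x, gene_z))
```

Because the distance depends only on direction, two embeddings that differ by a positive scale factor are treated as identical.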
Affiliation(s)
- Katie Ovens
- Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9 Canada
- Farhad Maleki
- Augmented Intelligence & Precision Health Laboratory (AIPHL), Research Institute of the McGill University Health Centre, Montreal, H4A 3S5 Canada
- B. Frank Eames
- Department of Anatomy, Physiology, and Pharmacology, University of Saskatchewan, Saskatoon, S7N 5E5 Canada
- Ian McQuillan
- Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9 Canada
11
Jiang Y, Li M, Fan Y, Di Z. Characterizing dissimilarity of weighted networks. Sci Rep 2021; 11:5768. [PMID: 33707620 PMCID: PMC7952696 DOI: 10.1038/s41598-021-85175-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2020] [Accepted: 02/22/2021] [Indexed: 11/09/2022]
Abstract
Measuring the dissimilarities between networks is a basic problem that is widely encountered in many fields. Building on the D-measure, which was proposed for unweighted networks, we propose a quantitative dissimilarity metric for weighted networks (WD-metric). Crucially, we construct a distance probability matrix of a weighted network, which can capture comprehensive information about the network. Moreover, we define the complementary graph and alpha centrality of a weighted network. Several synthetic and real-world networks are used to verify the effectiveness of the WD-metric. Experimental results show that the WD-metric can effectively capture the influence of weight on the network structure and quantitatively measure the dissimilarity of weighted networks. It can also be used as a criterion for backbone extraction algorithms of complex networks.
Affiliation(s)
- Yuanxiang Jiang
- School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Meng Li
- School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Ying Fan
- School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Zengru Di
- School of Systems Science, Beijing Normal University, Beijing, 100875, China.
12
Ospina-Forero L, Castañeda G, Guerrero OA. Estimating networks of sustainable development goals. Information & Management 2020. [DOI: 10.1016/j.im.2020.103342] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023]
13
Maugis PAG, Olhede SC, Priebe CE, Wolfe PJ. Testing for Equivalence of Network Distribution Using Subgraph Counts. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2020.1736085] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
Affiliation(s)
- P.-A. G. Maugis
- Department of Statistical Science, University College London, and Pivitar, London, UK
- S. C. Olhede
- School of Basic Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- C. E. Priebe
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD
- P. J. Wolfe
- Department of Statistics, Purdue University, West Lafayette, IN
14
System Network Complexity: Network Evolution Subgraphs of System State Series. IEEE Transactions on Emerging Topics in Computational Intelligence 2020. [DOI: 10.1109/tetci.2018.2848293] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
15
Tantardini M, Ieva F, Tajoli L, Piccardi C. Comparing methods for comparing networks. Sci Rep 2019; 9:17557. [PMID: 31772246 PMCID: PMC6879644 DOI: 10.1038/s41598-019-53708-y] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 07/16/2019] [Accepted: 10/25/2019] [Indexed: 11/17/2022]
Abstract
With the impressive growth of available data and the flexibility of network modelling, the problem of devising effective quantitative methods for the comparison of networks arises. Plenty of such methods have been designed to accomplish this task: most of them deal with undirected and unweighted networks only, but a few are capable of handling directed and/or weighted networks too, thus properly exploiting richer information. In this work, we contribute to the effort of comparing the different methods for comparing networks and providing a guide for the selection of an appropriate one. First, we review and classify a collection of network comparison methods, highlighting the criteria they are based on and their advantages and drawbacks. The set includes methods requiring known node-correspondence, such as DeltaCon and Cut Distance, as well as methods not requiring a priori known node-correspondence, such as alignment-based, graphlet-based, and spectral methods, and the recently proposed Portrait Divergence and NetLSD. We test the above methods on synthetic networks and we assess their usability and the meaningfulness of the results they provide. Finally, we apply the methods to two real-world datasets, the European Air Transportation Network and the FAO Trade Network, in order to discuss the results that can be drawn from this type of analysis.
Affiliation(s)
- Francesca Ieva
- MOX - Modelling and Scientific Computing Lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milano, Italy
- CADS - Center for Analysis, Decisions and Society, Human Technopole, 20157, Milano, Italy
- Lucia Tajoli
- Department of Management, Economics and Industrial Engineering, Politecnico di Milano, Via Lambruschini 4/b, 20156, Milano, Italy
- Carlo Piccardi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milano, Italy.
16
Reinert G, Ross N. Approximating stationary distributions of fast mixing Glauber dynamics, with applications to exponential random graphs. Ann Appl Probab 2019. [DOI: 10.1214/19-aap1478] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
17
Jing F, Zhang SW, Zhang S. Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations. Curr Bioinform 2018. [DOI: 10.2174/1574893612666171020103747] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background: Biological network alignment has been widely studied in bioinformatics in the context of protein-protein interaction (PPI) networks, metabolic networks and others. Existing methods generally use the topological structure of networks and genomic sequence to achieve this task. Objective and Method: Here we briefly survey the methods generally used for this task and introduce a variant that incorporates functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information helps to provide insights into precise biological network alignment. Results and Conclusion: We analyze the effect of incorporating GO information on network alignment. Finally, we give a brief summary and discuss future directions for this topic.
Affiliation(s)
- Fang Jing
- Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
|
18
|
Bernard G, Greenfield P, Ragan MA, Chan CX. k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank. mSystems 2018; 3:e00257-18. [PMID: 30505941 PMCID: PMC6247013 DOI: 10.1128/msystems.00257-18] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/12/2018] [Accepted: 11/02/2018] [Indexed: 01/27/2023] Open
Abstract
Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on k-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly Proteobacteria. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean k-mer similarity can correlate with taxonomic rank. We also link the implicated k-mers to genome annotation (thus, functions) and define core k-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among Spirochaetes, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal Tenericutes. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that k-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale.
IMPORTANCE: Genome evolution of microbes involves parent-to-offspring descent and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
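As a rough illustration of what an alignment-free, k-mer-based comparison looks like, the sketch below scores two sequences by the overlap of their k-mer sets. The Jaccard index, the function names `kmers` and `jaccard_kmer_similarity`, and the default `k=3` are assumptions chosen for brevity; the abstract does not state which k-mer statistic the authors used.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard index of the two k-mer sets (an illustrative choice of
    statistic, not necessarily the one used in the paper)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: no k-mers to compare.
        return 0.0
    return len(ka & kb) / len(union)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real genomic data.
    print(jaccard_kmer_similarity("ACGTACGT", "ACGTTCGT", k=3))
```

No alignment is computed at any point: each sequence is reduced to a bag of short substrings, which is what makes this family of methods scale to thousands of genomes.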
Affiliation(s)
- Guillaume Bernard
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Paul Greenfield
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW, Australia
- Mark A. Ragan
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Cheong Xin Chan
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
|
19
|
Kuang J, Cadotte MW, Chen Y, Shu H, Liu J, Chen L, Hua Z, Shu W, Zhou J, Huang L. Conservation of Species- and Trait-Based Modeling Network Interactions in Extremely Acidic Microbial Community Assembly. Front Microbiol 2017; 8:1486. [PMID: 28848508 PMCID: PMC5554326 DOI: 10.3389/fmicb.2017.01486] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/15/2017] [Accepted: 07/24/2017] [Indexed: 11/29/2022] Open
Abstract
Understanding microbial interactions is essential to decipher the mechanisms of community assembly and their effects on ecosystem functioning; however, the conservation of species- and trait-based network interactions along environmental gradients remains largely unknown. Here, using network-based analyses of three parallel data sets derived from 16S rRNA gene pyrosequencing, functional microarray, and predicted metagenome, we test our hypothesis that the network interactions of traits are more conserved than those of taxonomic measures, with significantly lower variation of network characteristics along the environmental gradient in acid mine drainage. The results showed that although the overall network characteristics remained similar, the structural variation was significantly lower at the trait level. The topological properties of individual nodes were more conserved at the trait level than at the species level, indicating that the responses of diverse traits remained relatively consistent even though different species played key roles under different environmental conditions. Additionally, the randomization tests could not reject the null hypothesis that species-based correlations were random, whereas they suggested that correlation patterns of traits were non-random. Furthermore, relationships between trait-based network characteristics and environmental properties implied that trait-based networks might be more useful in reflecting the variation of ecosystem function. Taken together, our results suggest that deterministic trait-based community assembly results in greater conservation of network interaction, which may ensure ecosystem function across environmental regimes, emphasizing the potential importance of measuring the complexity and conservation of network interaction in evaluating ecosystem stability and functioning.
Affiliation(s)
- Jialiang Kuang
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China; Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, OK, United States
- Marc W Cadotte
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China; Department of Biological Sciences, University of Toronto-Scarborough, Toronto, ON, Canada; Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Yongjian Chen
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
- Haoyue Shu
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
- Jun Liu
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
- Linxing Chen
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
- Zhengshuang Hua
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
- Wensheng Shu
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
- Jizhong Zhou
- Department of Microbiology and Plant Biology, Institute for Environmental Genomics, University of Oklahoma, Norman, OK, United States; Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States; State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, China
- Linan Huang
- State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources and Conservation of Guangdong Higher Education Institutes, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China
|
20
|
Yaveroglu ÖN, Malod-Dognin N, Milenkovic T, Pržulj N. Rebuttal to the Letter to the Editor in response to the paper: proper evaluation of alignment-free network comparison methods. Bioinformatics 2017; 33:1107-1109. [PMID: 28073757 DOI: 10.1093/bioinformatics/btw388] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/15/2016] [Accepted: 06/14/2016] [Indexed: 11/13/2022] Open
Affiliation(s)
- Noël Malod-Dognin
- Department of Computer Science, University College London, London, UK
- Tijana Milenkovic
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
- Nataša Pržulj
- Department of Computer Science, University College London, London, UK
|
21
|
Ali W, Wegner AE, Gaunt RE, Deane CM, Reinert G. Comparison of large networks with sub-sampling strategies. Sci Rep 2016; 6:28955. [PMID: 27380992 PMCID: PMC4933923 DOI: 10.1038/srep28955] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/23/2015] [Accepted: 06/07/2016] [Indexed: 11/17/2022] Open
Abstract
Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets.
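The neighbourhood sub-sampling idea can be sketched in a few lines. This is not the Netdis statistic itself: as simplifying assumptions, the sketch counts only triangles (rather than a family of small sub-graphs), uses a neighbourhood radius of 2, and omits any normalisation against expected counts. The function names and the adjacency-dictionary representation are hypothetical.

```python
import random
from collections import deque


def ego_nodes(adj: dict[int, set[int]], root: int, radius: int = 2) -> set[int]:
    """Nodes within `radius` hops of `root`, found by breadth-first search."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the neighbourhood boundary
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def count_triangles(adj: dict[int, set[int]], nodes: set[int]) -> int:
    """Number of triangles in the sub-graph induced by `nodes`."""
    count = 0
    for u in nodes:
        for v in adj[u]:
            if v in nodes and u < v:
                for w in adj[v]:
                    # u < v < w ensures each triangle is counted once.
                    if w in nodes and v < w and w in adj[u]:
                        count += 1
    return count


def sampled_triangle_counts(adj, fraction=0.1, radius=2, seed=0):
    """Triangle counts for the neighbourhoods of a random sample of nodes."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    size = max(1, round(fraction * len(nodes)))
    return {
        root: count_triangles(adj, ego_nodes(adj, root, radius))
        for root in rng.sample(nodes, size)
    }


if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(sampled_triangle_counts(toy, fraction=0.5))
```

Because each neighbourhood is processed independently, the cost scales with the number of sampled neighbourhoods, which is where a 10% sample would save work.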
Affiliation(s)
- Waqar Ali
- Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK
- Anatol E. Wegner
- Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK
- Robert E. Gaunt
- Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK
- Charlotte M. Deane
- Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK
- Gesine Reinert
- Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK
|
22
|
Emmert-Streib F, Dehmer M, Shi Y. Fifty years of graph matching, network alignment and network comparison. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.01.074] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 01/13/2023]
|
23
|
Coulson M, Gaunt RE, Reinert G. Poisson approximation of subgraph counts in stochastic block models and a graphon model. ESAIM Probab Stat 2016. [DOI: 10.1051/ps/2016006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
|
24
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP J Bioinform Syst Biol 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
- Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
|
25
|
Yaveroğlu ÖN, Milenković T, Pržulj N. Proper evaluation of alignment-free network comparison methods. Bioinformatics 2015; 31:2697-704. [PMID: 25810431 PMCID: PMC4528624 DOI: 10.1093/bioinformatics/btv170] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 02/03/2015] [Accepted: 03/18/2015] [Indexed: 11/25/2022] Open
Abstract
Motivation: Network comparison is a computationally intractable problem with important applications in systems biology and other domains. A key challenge is to properly quantify similarity between wiring patterns of two networks in an alignment-free fashion. Also, alignment-based methods exist that aim to identify an actual node mapping between networks and as such serve a different purpose. Various alignment-free methods that use different global network properties (e.g. degree distribution) have been proposed. Methods based on small local subgraphs called graphlets perform the best in the alignment-free network comparison task, due to the high level of topological detail that graphlets can capture. Among different graphlet-based methods, Graphlet Correlation Distance (GCD) was shown to be the most accurate for comparing networks. Recently, a new graphlet-based method called NetDis was proposed, which was claimed to be superior. We argue against this, as the performance of NetDis was not properly evaluated to position it correctly among the other alignment-free methods.
Results: We evaluate the performance of available alignment-free network comparison methods, including GCD and NetDis. We do this by measuring accuracy of each method (in a systematic precision-recall framework) in terms of how well the method can group (cluster) topologically similar networks. By testing this on both synthetic and real-world networks from different domains, we show that GCD remains the most accurate, noise-tolerant and computationally efficient alignment-free method. That is, we show that NetDis does not outperform the other methods, as originally claimed, while it is also computationally more expensive. Furthermore, since NetDis is dependent on the choice of a network null model (unlike the other graphlet-based methods), we show that its performance is highly sensitive to the choice of this parameter. Finally, we find that its performance is not independent of network sizes and densities, as originally claimed.
Contact: natasha@imperial.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
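One way to read the precision-recall evaluation described in this abstract is as a pair-classification task: two networks are predicted to belong to the same group when their pairwise distance falls below a threshold, and the prediction is scored against known group labels. The sketch below follows that reading. The thresholding rule, the function name `pair_precision_recall`, and the toy distance matrix are assumptions for illustration, not details taken from the paper.

```python
from itertools import combinations


def pair_precision_recall(dist, labels, threshold):
    """Precision and recall of the rule 'distance <= threshold means same group'.

    dist   -- symmetric matrix (list of lists) of pairwise network distances
    labels -- known group label of each network
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        predicted_same = dist[i][j] <= threshold
        actually_same = labels[i] == labels[j]
        if predicted_same and actually_same:
            tp += 1
        elif predicted_same:
            fp += 1
        elif actually_same:
            fn += 1
    # Convention assumed here: report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical distances between four networks from two groups.
    labels = ["a", "a", "b", "b"]
    dist = [
        [0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0],
    ]
    for eps in (0.15, 0.5, 1.0):
        print(eps, pair_precision_recall(dist, labels, eps))
```

Sweeping the threshold over the range of observed distances traces out a precision-recall curve for one distance measure, so different measures can be compared on the same set of labelled networks.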
Affiliation(s)
- Ömer Nebil Yaveroğlu
- California Institute for Telecommunications and Information Technology (Calit2), University of California, Irvine, CA 92697, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, IN 46556, USA
- Nataša Pržulj
- Department of Computing, Imperial College London, London SW7 2AZ, UK
|