1
|
Soibam B. ChromNetMotif: a Python tool to extract chromatin-sate marked motifs in a chromatin interaction network. BIOINFORMATICS ADVANCES 2023; 3:vbad126. [PMID: 37745003 PMCID: PMC10517636 DOI: 10.1093/bioadv/vbad126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/01/2023] [Revised: 08/08/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023]
Abstract
Motivation Analysis of network motifs is crucial to studying the robustness, stability, and functions of complex networks. Genome organization can be viewed as a biological network that consists of interactions between different chromatin regions. These interacting regions are also marked by epigenetic or chromatin states which can contribute to the overall organization of the chromatin and proper genome function. Therefore, it is crucial to integrate the chromatin states of the nodes when performing motif analysis in chromatin interaction networks. Even though there has been increasing production of chromatin interaction and genome-wide epigenetic modification data, there is a lack of publicly available tools to extract chromatin state-marked motifs from genome organization data. Results We develop a Python tool, ChromNetMotif, offering an easy-to-use command line interface to extract chromatin-state-marked motifs from a chromatin interaction network. The tool can extract occurrences, frequencies, and statistical enrichment of the chromatin state-marked motifs. Visualization files are also generated which allow the user to interpret the motifs easily. ChromNetMotif also allows the user to leverage the features of a multicore processor environment to reduce computation time for larger networks. The output files generated can be used to perform further downstream analysis. ChromNetMotif aims to serve as an important tool to comprehend the interplay between epigenetics and genome organization. Availability and implementation ChromNetMotif is available at https://github.com/lncRNAAddict/ChromNetworkMotif.
Collapse
Affiliation(s)
- Benjamin Soibam
- Department of Computer Science and Engineering Technology, University of Houston-Downtown, Houston, TX 77002, United States
| |
Collapse
|
2
|
Matejek B, Wei D, Chen T, Tsourakakis CE, Mitzenmacher M, Pfister H. Edge-colored directed subgraph enumeration on the connectome. Sci Rep 2022; 12:11349. [PMID: 35790766 PMCID: PMC9256670 DOI: 10.1038/s41598-022-15027-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2022] [Accepted: 06/16/2022] [Indexed: 11/24/2022] Open
Abstract
Following significant advances in image acquisition, synapse detection, and neuronal segmentation in connectomics, researchers have extracted an increasingly diverse set of wiring diagrams from brain tissue. Neuroscientists frequently represent these wiring diagrams as graphs with nodes corresponding to a single neuron and edges indicating synaptic connectivity. The edges can contain "colors" or "labels", indicating excitatory versus inhibitory connections, among other things. By representing the wiring diagram as a graph, we can begin to identify motifs, the frequently occurring subgraphs that correspond to specific biological functions. Most analyses on these wiring diagrams have focused on hypothesized motifs-those we expect to find. However, one of the goals of connectomics is to identify biologically-significant motifs that we did not previously hypothesize. To identify these structures, we need large-scale subgraph enumeration to find the frequencies of all unique motifs. Exact subgraph enumeration is a computationally expensive task, particularly in the edge-dense wiring diagrams. Furthermore, most existing methods do not differentiate between types of edges which can significantly affect the function of a motif. We propose a parallel, general-purpose subgraph enumeration strategy to count motifs in the connectome. Next, we introduce a divide-and-conquer community-based subgraph enumeration strategy that allows for enumeration per brain region. Lastly, we allow for differentiation of edges by types to better reflect the underlying biological properties of the graph. We demonstrate our results on eleven connectomes and publish for future analyses extensive overviews for the 26 trillion subgraphs enumerated that required approximately 9.25 years of computation time.
Collapse
Affiliation(s)
- Brian Matejek
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA.
- Computer Science Laboratory, SRI International, Washington, DC, USA.
| | - Donglai Wei
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Department of Computer Science, Boston College, Chestnut Hill, MA, USA
| | - Tianyi Chen
- Department of Computer Science, Boston University, Boston, MA, USA
| | - Charalampos E Tsourakakis
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Department of Computer Science, Boston University, Boston, MA, USA
- ISI Foundation, Turin, Italy
| | - Michael Mitzenmacher
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Hanspeter Pfister
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
| |
Collapse
|
3
|
Yadav AK, Shukla R, Singh TR. Topological parameters, patterns, and motifs in biological networks. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00012-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022] Open
|
4
|
le Gorrec L, Knight PA, Caen A. Learning network embeddings using small graphlets. SOCIAL NETWORK ANALYSIS AND MINING 2021. [DOI: 10.1007/s13278-021-00846-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
AbstractTechniques for learning vectorial representations of graphs (graph embeddings) have recently emerged as an effective approach to facilitate machine learning on graphs. Some of the most popular methods involve sophisticated features such as graph kernels or convolutional networks. In this work, we introduce two straightforward supervised learning algorithms based on small-size graphlet counts, combined with a dimension reduction step. The first relies on a classic feature extraction method powered by principal component analysis (PCA). The second is a feature selection procedure also based on PCA. Despite their conceptual simplicity, these embeddings are arguably more meaningful than some popular alternatives and at the same time are competitive with state-of-the-art methods. We illustrate this second point on a downstream classification task. We then use our algorithms in a novel setting, namely to conduct an analysis of author relationships in Wikipedia articles, for which we present an original dataset. Finally, we provide empirical evidence suggesting that our methods could also be adapted to unsupervised learning algorithms.
Collapse
|
5
|
Matelsky JK, Reilly EP, Johnson EC, Stiso J, Bassett DS, Wester BA, Gray-Roncal W. DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries. Sci Rep 2021; 11:13045. [PMID: 34158519 PMCID: PMC8219732 DOI: 10.1038/s41598-021-91025-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Accepted: 04/29/2021] [Indexed: 01/02/2023] Open
Abstract
Recent advances in neuroscience have enabled the exploration of brain structure at the level of individual synaptic connections. These connectomics datasets continue to grow in size and complexity; methods to search for and identify interesting graph patterns offer a promising approach to quickly reduce data dimensionality and enable discovery. These graphs are often too large to be analyzed manually, presenting significant barriers to searching for structure and testing hypotheses. We combine graph database and analysis libraries with an easy-to-use neuroscience grammar suitable for rapidly constructing queries and searching for subgraphs and patterns of interest. Our approach abstracts many of the computer science and graph theory challenges associated with nanoscale brain network analysis and allows scientists to quickly conduct research at scale. We demonstrate the utility of these tools by searching for motifs on simulated data and real public connectomics datasets, and we share simple and complex structures relevant to the neuroscience community. We contextualize our findings and provide case studies and software to motivate future neuroscience exploration.
Collapse
Affiliation(s)
- Jordan K. Matelsky
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723 USA
| | - Elizabeth P. Reilly
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723 USA
| | - Erik C. Johnson
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723 USA
| | - Jennifer Stiso
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Neuroscience Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
| | - Danielle S. Bassett
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Electrical and Systems Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Physics and Astronomy, College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104 USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Santa Fe Institute, Santa Fe, NM 87501 USA
| | - Brock A. Wester
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723 USA
| | - William Gray-Roncal
- The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723 USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
| |
Collapse
|
6
|
Biswas S, Ray S, Bandyopadhyay S. Colored Network Motif Analysis by Dynamic Programming Approach: An Application in Host Pathogen Interaction Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:550-561. [PMID: 31217126 DOI: 10.1109/tcbb.2019.2923173] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Network motifs are subgraphs of a network which are found with significantly higher frequency than that expected in similar random networks. Motifs are small building blocks of a network and they have emerged as a way to uncover topological properties of complex networks. A special yet not much explored type of motif is the 'colored motif' where color (type) of each node, and hence the edges, in the motif is distinguishable from each other. A traditional motif is defined as a recurring structure in a network, whereas colored motif introduces detailed information about the color of the nodes. G-trie is a data structure to efficiently store a given set of subgraphs by exploiting the topological overlaps within them. In this article we have implemented a modified g-trie to store colored subgraphs and developed a method to discover colored motifs. Our method uses an approximate enumeration for counting the subgraphs to reduce the runtime. We have applied our method to find colored motifs of size three in a host pathogen protein-protein interaction network having two types of proteins namely HIV-1 and human proteins, and four types of edges. Here, we have discovered eight motifs, six of which contain both HIV-1 and human proteins, while the remaining two contain only human proteins.
Collapse
|
7
|
Tantardini M, Ieva F, Tajoli L, Piccardi C. Comparing methods for comparing networks. Sci Rep 2019; 9:17557. [PMID: 31772246 PMCID: PMC6879644 DOI: 10.1038/s41598-019-53708-y] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Accepted: 10/25/2019] [Indexed: 11/17/2022] Open
Abstract
With the impressive growth of available data and the flexibility of network modelling, the problem of devising effective quantitative methods for the comparison of networks arises. Plenty of such methods have been designed to accomplish this task: most of them deal with undirected and unweighted networks only, but a few are capable of handling directed and/or weighted networks too, thus properly exploiting richer information. In this work, we contribute to the effort of comparing the different methods for comparing networks and providing a guide for the selection of an appropriate one. First, we review and classify a collection of network comparison methods, highlighting the criteria they are based on and their advantages and drawbacks. The set includes methods requiring known node-correspondence, such as DeltaCon and Cut Distance, as well as methods not requiring a priori known node-correspondence, such as alignment-based, graphlet-based, and spectral methods, and the recently proposed Portrait Divergence and NetLSD. We test the above methods on synthetic networks and we assess their usability and the meaningfulness of the results they provide. Finally, we apply the methods to two real-world datasets, the European Air Transportation Network and the FAO Trade Network, in order to discuss the results that can be drawn from this type of analysis.
Collapse
Affiliation(s)
| | - Francesca Ieva
- MOX - Modelling and Scientific Computing Lab, Department of Mathematics, Politecnico di Milano, Via Bonardi 9, 20133, Milano, Italy.,CADS - Center for Analysis, Decisions and Society, Human Technopole, 20157, Milano, Italy
| | - Lucia Tajoli
- Department of Management, Economics and Industrial Engineering, Politecnico di Milano, Via Lambruschini 4/b, 20156, Milano, Italy
| | - Carlo Piccardi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milano, Italy.
| |
Collapse
|
8
|
|
9
|
Graphlet-orbit Transitions (GoT): A fingerprint for temporal network comparison. PLoS One 2018; 13:e0205497. [PMID: 30335791 PMCID: PMC6193656 DOI: 10.1371/journal.pone.0205497] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2018] [Accepted: 09/26/2018] [Indexed: 11/19/2022] Open
Abstract
Given a set of temporal networks, from different domains and with different sizes, how can we compare them? Can we identify evolutionary patterns that are both (i) characteristic and (ii) meaningful? We address these challenges by introducing a novel temporal and topological network fingerprint named Graphlet-orbit Transitions (GoT). We demonstrate that GoT provides very rich and interpretable network characterizations. Our work puts forward an extension of graphlets and uses the notion of orbits to encapsulate the roles of nodes in each subgraph. We build a transition matrix that keeps track of the temporal trajectory of nodes in terms of their orbits, therefore describing their evolution. We also introduce a metric (OTA) to compare two networks when considering these matrices. Our experiments show that networks representing similar systems have characteristic orbit transitions. GoT correctly groups synthetic networks pertaining to well-known graph models more accurately than competing static and dynamic state-of-the-art approaches by over 30%. Furthermore, our tests on real-world networks show that GoT produces highly interpretable results, which we use to provide insight into characteristic orbit transitions.
Collapse
|
10
|
Zhang J, Kwong S, Liu G, Lin Q, Wong KC. PathEmb: Random Walk Based Document Embedding for Global Pathway Similarity Search. IEEE J Biomed Health Inform 2018; 23:1329-1335. [PMID: 29993756 DOI: 10.1109/jbhi.2018.2830806] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Pathway analysis is a cornerstone of system biology. In particular, pathway similarity search plays a key role in establishing structural, functional, and evolutionary relationships between different biological entities. Given a query pathway as well as a database, a pathway similarity search aims to identify novel pathways that are homologous to the query pathway. Unfortunately, the pathway similarity search is computationally inefficient due to the NP-complete graph isomorphism problem. In this study, we introduce a novel algorithmic framework for pathway similarity search, named PathEmb (Pathway Embedding), which is analogous to the Skip-gram model where each pathway is represented as a "document." PathEmb exploits a second order random walk strategy to explore diverse pathway patterns. All signaling paths traversed from random walks are regarded as "sentences," which are constituted as a "document" afterwards. Then, the "document" pattern for the individual pathway is mapped into a low-dimensional feature space for downstream tasks. Furthermore, PathEmb is a topology-free pathway similarity search algorithm, which is feasible to handle any pathway with arbitrary structure. We have extensively evaluated PathEmb and other cutting-edge methods on three pathway datasets. The experimental results demonstrate that PathEmb outperforms the existing methods in terms of computational efficiency and search accuracy. The source code of PathEmb are freely available online https://github.com/zhangjiaobxy/PathEmb.
Collapse
|
11
|
|
12
|
Aparicio D, Ribeiro P, Silva F. Extending the Applicability of Graphlets to Directed Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1302-1315. [PMID: 27362986 DOI: 10.1109/tcbb.2016.2586046] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
With recent advances in high-throughput cell biology, the amount of cellular biological data has grown drastically. Such data is often modeled as graphs (also called networks) and studying them can lead to new insights into molecule-level organization. A possible way to understand their structure is by analyzing the smaller components that constitute them, namely network motifs and graphlets. Graphlets are particularly well suited to compare networks and to assess their level of similarity due to the rich topological information that they offer but are almost always used as small undirected graphs of up to five nodes, thus limiting their applicability in directed networks. However, a large set of interesting biological networks such as metabolic, cell signaling, or transcriptional regulatory networks are intrinsically directional, and using metrics that ignore edge direction may gravely hinder information extraction. Our main purpose in this work is to extend the applicability of graphlets to directed networks by considering their edge direction, thus providing a powerful basis for the analysis of directed biological networks. We tested our approach on two network sets, one composed of synthetic graphs and another of real directed biological networks, and verified that they were more accurately grouped using directed graphlets than undirected graphlets. It is also evident that directed graphlets offer substantially more topological information than simple graph metrics such as degree distribution or reciprocity. However, enumerating graphlets in large networks is a computationally demanding task. Our implementation addresses this concern by using a state-of-the-art data structure, the g-trie, which is able to greatly reduce the necessary computation. We compared our tool to other state-of-the art methods and verified that it is the fastest general tool for graphlet counting.
Collapse
|
13
|
Rinnone F, Micale G, Bonnici V, Bader GD, Shasha D, Ferro A, Pulvirenti A, Giugno R. NetMatchStar: an enhanced Cytoscape network querying app. F1000Res 2015; 4:479. [PMID: 26594341 DOI: 10.12688/f1000research.6656.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/20/2015] [Indexed: 02/03/2023] Open
Abstract
We present NetMatchStar, a Cytoscape app to find all the occurrences of a query graph in a network and check for its significance as a motif with respect to seven different random models. The query can be uploaded or built from scratch using Cytoscape facilities. The app significantly enhances the previous NetMatch in style, performance and functionality. Notably NetMatchStar allows queries with wildcards.
Collapse
Affiliation(s)
- Fabio Rinnone
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Giovanni Micale
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Vincenzo Bonnici
- Department of Computer Science, University of Verona, Verona, 37134, Italy
| | - Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Dennis Shasha
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012, USA
| | - Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Rosalba Giugno
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| |
Collapse
|
14
|
Rinnone F, Micale G, Bonnici V, Bader GD, Shasha D, Ferro A, Pulvirenti A, Giugno R. NetMatchStar: an enhanced Cytoscape network querying app. F1000Res 2015; 4:479. [PMID: 26594341 PMCID: PMC4642848 DOI: 10.12688/f1000research.6656.2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 09/25/2015] [Indexed: 02/03/2023] Open
Abstract
We present NetMatchStar, a Cytoscape app to find all the occurrences of a query graph in a network and check for its significance as a motif with respect to seven different random models. The query can be uploaded or built from scratch using Cytoscape facilities. The app significantly enhances the previous NetMatch in style, performance and functionality. Notably NetMatchStar allows queries with wildcards.
Collapse
Affiliation(s)
- Fabio Rinnone
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Giovanni Micale
- Department of Math and Computer Science, University of Catania, Catania, 95125, Italy
| | - Vincenzo Bonnici
- Department of Computer Science, University of Verona, Verona, 37134, Italy
| | - Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Dennis Shasha
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012, USA
| | - Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| | - Rosalba Giugno
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
| |
Collapse
|
15
|
|