1
Bénichou A, Masson JB, Vestergaard CL. Compression-based inference of network motif sets. PLoS Comput Biol 2024; 20:e1012460. [PMID: 39388477 PMCID: PMC11495616 DOI: 10.1371/journal.pcbi.1012460] [Received: 11/29/2023] [Revised: 10/22/2024] [Accepted: 09/04/2024] [Indexed: 10/12/2024] Open
Abstract
Physical and functional constraints on biological networks lead to complex topological patterns across multiple scales in their organization. A particular type of higher-order network feature that has received considerable interest is network motifs, defined as statistically regular subgraphs. These may implement fundamental logical and computational circuits and are referred to as "building blocks of complex networks". Their well-defined structures and small sizes also enable the testing of their functions in synthetic and natural biological experiments. Here, we develop a framework for motif mining based on lossless network compression using subgraph contractions. This provides an alternative definition of motif significance which allows us to compare different motifs and select the collectively most significant set of motifs as well as other prominent network features in terms of their combined compression of the network. Our approach inherently accounts for multiple testing and correlations between subgraphs and does not rely on a priori specification of an appropriate null model. It thus overcomes common problems in hypothesis testing-based motif analysis and guarantees robust statistical inference. We validate our methodology on numerical data and then apply it to synaptic-resolution biological neural networks, as a medium for comparative connectomics, by evaluating their respective compressibility and characterizing their inferred circuit motifs.
Affiliation(s)
- Alexis Bénichou
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France
  - Epiméthée, Inria, Paris, France
- Jean-Baptiste Masson
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France
  - Epiméthée, Inria, Paris, France
- Christian L. Vestergaard
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3751, Decision and Bayesian Computation, Paris, France
  - Epiméthée, Inria, Paris, France
2
Mosar: Efficiently Characterizing Both Frequent and Rare Motifs in Large Graphs. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12147210] [Indexed: 02/05/2023]
Abstract
Exploring the motif statistics (such as motif frequencies) of a large graph is useful for understanding complex networks such as social and biological networks, but it can be challenging due to high computational costs. To address this challenge, many methods explore approximate algorithms using edge/path sampling techniques. However, state-of-the-art methods usually over-sample frequent motifs and under-sample rare motifs, and thus they fail in many real applications such as anomaly detection (i.e., finding rare patterns). Furthermore, it is not feasible to apply existing weighted sampling methods such as stratified sampling to solve this problem, because it is difficult to sample subgraphs from a large graph in a direct manner. In this paper, we observe that rare motifs of most real-world networks have “more edges” than frequent motifs, and that motifs with more edges are sampled by random edge sampling with higher probabilities. Based on these two observations, we propose a novel motif sampling method, Mosar, to estimate motif frequencies. In particular, our Mosar method samples frequent and rare motifs with different probabilities, and tends to sample motifs with low frequencies. As a result, the new method greatly reduces the estimation errors of these rare motifs. Finally, we conducted extensive experiments on a variety of real-world datasets with different sizes, and our experimental results show that the Mosar method is two orders of magnitude more accurate than state-of-the-art methods.
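The second observation in this abstract (subgraphs with more edges are hit more often by uniform edge sampling) can be illustrated with a toy calculation. The sketch below is not the Mosar estimator; it only computes, for a hypothetical five-edge undirected graph, the exact probability that one uniformly sampled edge falls inside a given subgraph instance. The function name `edge_hit_probability` and the example graph are assumptions made for illustration.

```python
from fractions import Fraction


def edge_hit_probability(instance_edges, graph_edges):
    """Probability that one uniformly sampled graph edge lies in the instance.

    Edges are undirected pairs. This is a toy illustration only, not the
    Mosar sampling scheme described in the abstract.
    """
    graph = {frozenset(e) for e in graph_edges}
    instance = {frozenset(e) for e in instance_edges}
    if not instance <= graph:
        raise ValueError("instance contains edges that are not in the graph")
    return Fraction(len(instance), len(graph))


# Hypothetical graph: a triangle (0, 1, 2) with a two-edge tail 2-3-4.
graph_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
triangle = [(0, 1), (1, 2), (0, 2)]  # 3-edge instance
wedge = [(2, 3), (3, 4)]             # 2-edge instance

p_triangle = edge_hit_probability(triangle, graph_edges)
p_wedge = edge_hit_probability(wedge, graph_edges)
```

In this toy graph the three-edge instance is reached with probability 3/5 and the two-edge instance with probability 2/5, so the hit probability of a single instance scales with its edge count.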
3
Matejek B, Wei D, Chen T, Tsourakakis CE, Mitzenmacher M, Pfister H. Edge-colored directed subgraph enumeration on the connectome. Sci Rep 2022; 12:11349. [PMID: 35790766 PMCID: PMC9256670 DOI: 10.1038/s41598-022-15027-7] [Received: 03/04/2022] [Accepted: 06/16/2022] [Indexed: 11/24/2022] Open
Abstract
Following significant advances in image acquisition, synapse detection, and neuronal segmentation in connectomics, researchers have extracted an increasingly diverse set of wiring diagrams from brain tissue. Neuroscientists frequently represent these wiring diagrams as graphs, with each node corresponding to a single neuron and edges indicating synaptic connectivity. The edges can contain "colors" or "labels", indicating excitatory versus inhibitory connections, among other things. By representing the wiring diagram as a graph, we can begin to identify motifs, the frequently occurring subgraphs that correspond to specific biological functions. Most analyses on these wiring diagrams have focused on hypothesized motifs, that is, those we expect to find. However, one of the goals of connectomics is to identify biologically-significant motifs that we did not previously hypothesize. To identify these structures, we need large-scale subgraph enumeration to find the frequencies of all unique motifs. Exact subgraph enumeration is a computationally expensive task, particularly in the edge-dense wiring diagrams. Furthermore, most existing methods do not differentiate between types of edges, which can significantly affect the function of a motif. We propose a parallel, general-purpose subgraph enumeration strategy to count motifs in the connectome. Next, we introduce a divide-and-conquer community-based subgraph enumeration strategy that allows for enumeration per brain region. Lastly, we allow for differentiation of edges by types to better reflect the underlying biological properties of the graph. We demonstrate our results on eleven connectomes and publish, for future analyses, extensive overviews of the 26 trillion subgraphs enumerated, which required approximately 9.25 years of computation time.
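To make "edge-colored directed subgraph enumeration" concrete, here is a minimal brute-force sketch for three-node subgraphs. It is not the parallel or community-based strategy proposed in the paper; it simply visits every node triple, keeps the connected ones, and groups them by a canonical form that respects edge direction and color. The function name `count_colored_triads`, the edge dictionary layout, and the color labels `"exc"` and `"inh"` are assumptions made for illustration, and the input is assumed to contain no self-loops.

```python
from collections import Counter
from itertools import combinations, permutations


def count_colored_triads(edges):
    """Count connected 3-node induced subgraphs up to isomorphism.

    `edges` maps (source, target) -> color label. Two subgraphs fall in the
    same class only if direction and color of every edge match under some
    relabeling of the three nodes. Brute force, for small graphs only.
    """
    nodes = sorted({v for edge in edges for v in edge})
    counts = Counter()
    for triple in combinations(nodes, 3):
        sub = {(u, v): c for (u, v), c in edges.items()
               if u in triple and v in triple and u != v}
        # Three nodes are weakly connected iff at least two distinct
        # unordered node pairs are linked.
        if len({frozenset(e) for e in sub}) < 2:
            continue
        # Canonical form: smallest sorted edge list over all 6 relabelings.
        forms = []
        for perm in permutations(range(3)):
            relabel = dict(zip(triple, perm))
            forms.append(tuple(sorted(
                (relabel[u], relabel[v], c) for (u, v), c in sub.items())))
        counts[min(forms)] += 1
    return counts


# Hypothetical toy graph: a directed chain with identical edge colors.
chain_counts = count_colored_triads(
    {(1, 2): "exc", (2, 3): "exc", (3, 4): "exc"})

# Same chain with one inhibitory edge: the two triads now differ by color.
mixed_counts = count_colored_triads(
    {(1, 2): "exc", (2, 3): "exc", (3, 4): "inh"})
```

With uniform colors the two chain triads collapse into one motif class; recoloring a single edge splits them into two classes, which is the distinction the abstract argues uncolored enumeration misses.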
Affiliation(s)
- Brian Matejek
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
  - Computer Science Laboratory, SRI International, Washington, DC, USA
- Donglai Wei
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
  - Department of Computer Science, Boston College, Chestnut Hill, MA, USA
- Tianyi Chen
  - Department of Computer Science, Boston University, Boston, MA, USA
- Charalampos E Tsourakakis
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
  - Department of Computer Science, Boston University, Boston, MA, USA
  - ISI Foundation, Turin, Italy
- Michael Mitzenmacher
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
4
Abstract
We introduce a new method for finding network motifs. Subgraphs are motifs when their frequency in the data is high compared to the expected frequency under a null model. To compute this expectation, a full or approximate count of the occurrences of a motif is normally repeated on as many as 1000 random graphs sampled from the null model, a prohibitively expensive step. We use ideas from the minimum description length literature to define a new measure of motif relevance. With our method, samples from the null model are not required. Instead, we compute the probability of the data under the null model and compare this to the probability under a specially designed alternative model. With this new relevance test, we can search for motifs by random sampling, rather than requiring an accurate count of all instances of a motif. This allows motif analysis to scale to networks with billions of links.
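The comparison of probabilities described in this abstract can be phrased in bits: a model that assigns probability p to the data encodes it in -log2(p) bits, and the relevance of a motif is the number of bits the alternative model saves over the null. The sketch below assumes a uniform null over simple directed graphs with a fixed number of nodes and edges, and takes the alternative model's code length as a caller-supplied number; neither choice is taken from the paper. The bound in `hypercompression_bound` is the standard no-hypercompression inequality from the minimum description length literature. All function names are hypothetical.

```python
import math


def gnm_code_length_bits(n_nodes, n_edges):
    """Bits needed under a uniform null over simple directed graphs.

    Assumed null model (not taken from the paper): every graph with the
    given node and edge counts is equally likely, so the code length is
    log2 of the number of such graphs.
    """
    possible_edges = n_nodes * (n_nodes - 1)
    return math.log2(math.comb(possible_edges, n_edges))


def compression_gain_bits(bits_null, bits_alt):
    """Bits saved by the alternative model relative to the null model."""
    return bits_null - bits_alt


def hypercompression_bound(gain_bits):
    """Upper bound on the null probability of saving at least `gain_bits`.

    No-hypercompression inequality: under the null, any other code beats
    the null's own code by k or more bits with probability at most 2**-k.
    """
    return min(1.0, 2.0 ** (-gain_bits))


bits_null = gnm_code_length_bits(n_nodes=3, n_edges=1)  # log2(6) bits
gain = compression_gain_bits(bits_null=100.0, bits_alt=90.0)
bound = hypercompression_bound(gain)
```

A gain of 10 bits bounds the null probability of such compression by 2^-10, which plays the role of a significance level without sampling any random graphs.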