1
|
Shirzadkhani R, Huang S, Leung A, Rabbany R. Static graph approximations of dynamic contact networks for epidemic forecasting. Sci Rep 2024; 14:11696. [PMID: 38777814 PMCID: PMC11111697 DOI: 10.1038/s41598-024-62271-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
Epidemic modeling is essential in understanding the spread of infectious diseases like COVID-19 and devising effective intervention strategies to control them. Recently, network-based disease models have integrated traditional compartment-based modeling with real-world contact graphs and shown promising results. However, in an ongoing epidemic, future contact network patterns are not observed yet. To address this, we use aggregated static networks to approximate future contacts for disease modeling. The standard method in the literature concatenates all edges from a dynamic graph into one collapsed graph, called the full static graph. However, the full static graph often leads to severe overestimation of key epidemic characteristics. Therefore, we propose two novel static network approximation methods, DegMST and EdgeMST, designed to preserve the sparsity of real world contact network while remaining connected. DegMST and EdgeMST use the frequency of temporal edges and the node degrees respectively to preserve sparsity. Our analysis show that our models more closely resemble the network characteristics of the dynamic graph compared to the full static ones. Moreover, our analysis on seven real-world contact networks suggests EdgeMST yield more accurate estimations of disease dynamics for epidemic forecasting when compared to the standard full static method.
Collapse
Affiliation(s)
- Razieh Shirzadkhani
- Mila, Quebec Artificial Intelligence Institute, Montreal, Canada
- Department of Bioresource Engineering, McGill University, Montreal, Canada
| | - Shenyang Huang
- Mila, Quebec Artificial Intelligence Institute, Montreal, Canada.
- School of Computer Science, McGill University, Montreal, Canada.
| | - Abby Leung
- School of Computer Science, McGill University, Montreal, Canada
| | - Reihaneh Rabbany
- Mila, Quebec Artificial Intelligence Institute, Montreal, Canada
- School of Computer Science, McGill University, Montreal, Canada
- CIFAR AI Chair, Montreal, Canada
| |
Collapse
|
2
|
Hosseinzadeh MM, Cannataro M, Guzzi PH, Dondi R. Temporal networks in biology and medicine: a survey on models, algorithms, and tools. NETWORK MODELING AND ANALYSIS IN HEALTH INFORMATICS AND BIOINFORMATICS 2022; 12:10. [PMID: 36618274 PMCID: PMC9803903 DOI: 10.1007/s13721-022-00406-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Revised: 12/16/2022] [Accepted: 12/17/2022] [Indexed: 01/01/2023]
Abstract
The use of static graphs for modelling and analysis of biological and biomedical data plays a key role in biomedical research. However, many real-world scenarios present dynamic behaviours resulting in both node and edges modification as well as feature evolution. Consequently, ad-hoc models for capturing these evolutions along the time have been introduced, also referred to as dynamic, temporal, time-varying graphs. Here, we focus on temporal graphs, i.e., graphs whose evolution is represented by a sequence of time-ordered snapshots. Each snapshot represents a graph active in a particular timestamp. We survey temporal graph models and related algorithms, presenting fundamentals aspects and the recent advances. We formally define temporal graphs, focusing on the problem setting and we present their main applications in biology and medicine. We also present temporal graph embedding and the application to recent problems such as epidemic modelling. Finally, we further state some promising research directions in the area. Main results of this study include a systematic review of fundamental temporal network problems and their algorithmic solutions considered in the literature, in particular those having application in computational biology and medicine. We also include the main software developed in this context.
Collapse
Affiliation(s)
| | - Mario Cannataro
- Department of Surgical and Medical Sciences and Data Analytics Research Center, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences and Data Analytics Research Center, University Magna Graecia of Catanzaro, Catanzaro, Italy
| | - Riccardo Dondi
- Department of Literature, Philosophy, Communication Studies, University of Bergamo, Bergamo, Italy
| |
Collapse
|
3
|
Carletti V, Foggia P, Percannella G, Ritrovato P, Vento M. Two parallel versions of VF3: Performance analysis on a wide database of graphs. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.03.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
4
|
Desmet S, Brouckaert M, Boerjan W, Morreel K. Seeing the forest for the trees: Retrieving plant secondary biochemical pathways from metabolome networks. Comput Struct Biotechnol J 2020; 19:72-85. [PMID: 33384856 PMCID: PMC7753198 DOI: 10.1016/j.csbj.2020.11.050] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2020] [Revised: 11/26/2020] [Accepted: 11/28/2020] [Indexed: 02/06/2023] Open
Abstract
Over the last decade, a giant leap forward has been made in resolving the main bottleneck in metabolomics, i.e., the structural characterization of the many unknowns. This has led to the next challenge in this research field: retrieving biochemical pathway information from the various types of networks that can be constructed from metabolome data. Searching putative biochemical pathways, referred to as biotransformation paths, is complicated because several flaws occur during the construction of metabolome networks. Multiple network analysis tools have been developed to deal with these flaws, while in silico retrosynthesis is appearing as an alternative approach. In this review, the different types of metabolome networks, their flaws, and the various tools to trace these biotransformation paths are discussed.
Collapse
Affiliation(s)
- Sandrien Desmet
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Marlies Brouckaert
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Wout Boerjan
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Kris Morreel
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| |
Collapse
|
5
|
Bitcoin and Cybersecurity: Temporal Dissection of Blockchain Data to Unveil Changes in Entity Behavioral Patterns. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9235003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The Bitcoin network not only is vulnerable to cyber-attacks but currently represents the most frequently used cryptocurrency for concealing illicit activities. Typically, Bitcoin activity is monitored by decreasing anonymity of its entities using machine learning-based techniques, which consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis requiring higher efforts and, second, it may hide network micro-dynamics important for detecting short-term changes in entity behavioral patterns. The aim of this paper is to address both issues by performing a “temporal dissection” of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches to achieve entity classification. The idea is that a machine learning model trained on a certain time-interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine learning principles—a type of ensemble learning applying stacking techniques—introducing a “k-fold cross-testing” concept across batches of varying size. Results show that blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet) as classification rates did not vary significantly with batch size; suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that “typical” Mining Pool behavior may be represented better by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data.
Collapse
|
6
|
Huang Y, Zhong C. Detecting list-colored graph motifs in biological networks using branch-and-bound strategy. Comput Biol Med 2019; 107:1-9. [PMID: 30738296 DOI: 10.1016/j.compbiomed.2019.01.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2018] [Revised: 01/27/2019] [Accepted: 01/27/2019] [Indexed: 01/30/2023]
Abstract
In this work, we study the list-colored graph motif problem, which was introduced to detect functional motifs in biological networks. Given a multi-set M of colors as the query motif and a list-colored graph G where each vertex in G is associated with a set of colors, the aim of this problem is to find a sub-graph of G whose vertex set is colored exactly as motif M. To solve this problem, we present a heuristic method to efficiently and accurately detect list-colored graph motifs in biological networks using branch-and-bound strategy. We transform the detection of list-colored graph motif to the search of connected induced sub-graphs in list-colored graph, where the vertices in the sub-graph are assigned to distinctive colors of query motif. This transformation enables our method to accurately discover the occurrences of query motif without enumerating and verifying all sub-graphs. Furthermore, a new initial vertex selection strategy based on the colors of vertices is proposed to accurately determine the search scope of motifs. Experiments conducted on metabolic networks and protein-interaction networks demonstrate that our method can achieve better performance in accuracy and efficiency in comparison to other existing methods.
Collapse
Affiliation(s)
- Yiran Huang
- School of Computer and Electronics and Information, Guangxi Key Laboratory of Multimedia Communications Network Technology, Guangxi University, Nanning, 530004, China
| | - Cheng Zhong
- School of Computer and Electronics and Information, Guangxi Key Laboratory of Multimedia Communications Network Technology, Guangxi University, Nanning, 530004, China.
| |
Collapse
|
7
|
Metabolic pathways synthesis based on ant colony optimization. Sci Rep 2018; 8:16398. [PMID: 30401873 PMCID: PMC6219534 DOI: 10.1038/s41598-018-34454-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2018] [Accepted: 10/17/2018] [Indexed: 02/02/2023] Open
Abstract
One of the current challenges in bioinformatics is to discover new ways to transform a set of compounds into specific products. The usual approach is finding the reactions to synthesize a particular product, from a given substrate, by means of classical searching algorithms. However, they have three main limitations: difficulty in handling large amounts of reactions and compounds; absence of a step that verifies the availability of substrates; and inability to find branched pathways. We present here a novel bio-inspired algorithm for synthesizing linear and branched metabolic pathways. It allows relating several compounds simultaneously, ensuring the availability of substrates for every reaction in the solution. Comparisons with classical searching algorithms and other recent metaheuristic approaches show clear advantages of this proposal, fully recovering well-known pathways. Furthermore, solutions found can be analyzed in a simple way through graphical representations on the web.
Collapse
|
8
|
Latouche P, Robin S, Ouadah S. Goodness of Fit of Logistic Regression Models for Random Graphs. J Comput Graph Stat 2018. [DOI: 10.1080/10618600.2017.1349663] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Affiliation(s)
- Pierre Latouche
- Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, France
| | - Stéphane Robin
- AgroParisTech, UMR 518, MIA, Paris, France
- INRA, UMR 518, MIA, Paris, France
| | - Sarah Ouadah
- AgroParisTech, UMR 518, MIA, Paris, France
- INRA, UMR 518, MIA, Paris, France
| |
Collapse
|
9
|
van Ee M. Some notes on bounded starwidth graphs. INFORM PROCESS LETT 2017. [DOI: 10.1016/j.ipl.2017.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
10
|
CeFunMO: A centrality based method for discovering functional motifs with application in biological networks. Comput Biol Med 2016; 76:154-9. [DOI: 10.1016/j.compbiomed.2016.07.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2016] [Revised: 07/12/2016] [Accepted: 07/17/2016] [Indexed: 11/23/2022]
|
11
|
Smoly I, Carmel A, Shemer-Avni Y, Yeger-Lotem E, Ziv-Ukelson M. Algorithms for Regular Tree Grammar Network Search and Their Application to Mining Human-viral Infection Patterns. J Comput Biol 2016; 23:165-79. [PMID: 26953875 DOI: 10.1089/cmb.2015.0168] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Network querying is a powerful approach to mine molecular interaction networks. Most state-of-the-art network querying tools either confine the search to a prespecified topology in the form of some template subnetwork, or do not specify any topological constraints at all. Another approach is grammar-based queries, which are more flexible and expressive as they allow for expressing the topology of the sought pattern according to some grammar-based logic. Previous grammar-based network querying tools were confined to the identification of paths. In this article, we extend the patterns identified by grammar-based query approaches from paths to trees. For this, we adopt a higher order query descriptor in the form of a regular tree grammar (RTG). We introduce a novel problem and propose an algorithm to search a given graph for the k highest scoring subgraphs matching a tree accepted by an RTG. Our algorithm is based on the combination of dynamic programming with color coding, and includes an extension of previous k-best parsing optimization approaches to avoid isomorphic trees in the output. We implement the new algorithm and exemplify its application to mining viral infection patterns within molecular interaction networks. Our code is available online.
Collapse
Affiliation(s)
- Ilan Smoly
- 1 Department of Computer Science, Ben-Gurion University of the Negev , Beer-Sheva, Israel
| | - Amir Carmel
- 1 Department of Computer Science, Ben-Gurion University of the Negev , Beer-Sheva, Israel
| | - Yonat Shemer-Avni
- 2 Department of Virology, Ben-Gurion University of the Negev , Beer-Sheva, Israel
| | - Esti Yeger-Lotem
- 3 Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev , Beer-Sheva, Israel
| | - Michal Ziv-Ukelson
- 1 Department of Computer Science, Ben-Gurion University of the Negev , Beer-Sheva, Israel
| |
Collapse
|
12
|
Mullen J, Cockell SJ, Tipney H, Woollard PM, Wipat A. Mining integrated semantic networks for drug repositioning opportunities. PeerJ 2016; 4:e1558. [PMID: 26844016 PMCID: PMC4736989 DOI: 10.7717/peerj.1558] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2015] [Accepted: 12/11/2015] [Indexed: 11/20/2022] Open
Abstract
Current research and development approaches to drug discovery have become less fruitful and more costly. One alternative paradigm is that of drug repositioning. Many marketed examples of repositioned drugs have been identified through serendipitous or rational observations, highlighting the need for more systematic methodologies to tackle the problem. Systems level approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but requires an integrative approach to biological data. Integrated networks can facilitate systems level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person is able to identify portions of the graph (semantic subgraphs) that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated approaches are required to systematically mine integrated networks for these subgraphs and bring them to the attention of the user. We introduce a formal framework for the definition of integrated networks and their associated semantic subgraphs for drug interaction analysis and describe DReSMin, an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. We demonstrate the utility of our approach by mining an integrated drug interaction network built from 11 sources. This work identified and ranked 9,643,061 putative drug-target interactions, showing a strong correlation between highly scored associations and those supported by literature. We discuss the 20 top ranked associations in more detail, of which 14 are novel and 6 are supported by the literature. We also show that our approach better prioritizes known drug-target interactions, than other state-of-the art approaches for predicting such interactions.
Collapse
Affiliation(s)
- Joseph Mullen
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science, University of Newcastle-upon-Tyne , Newcastle upon Tyne , United Kingdom
| | - Simon J Cockell
- Bioinformatics Support Unit, University of Newcastle-upon-Tyne , United Kingdom
| | - Hannah Tipney
- Computational Biology, Target Sciences, GSK R&D, GlaxoSmithKline , Stevenage, Hertfordshire , United Kingdom
| | - Peter M Woollard
- Computational Biology, Target Sciences, GSK R&D, GlaxoSmithKline , Stevenage, Hertfordshire , United Kingdom
| | - Anil Wipat
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science, University of Newcastle-upon-Tyne , Newcastle upon Tyne , United Kingdom
| |
Collapse
|
13
|
Sorokina M, Medigue C, Vallenet D. A new network representation of the metabolism to detect chemical transformation modules. BMC Bioinformatics 2015; 16:385. [PMID: 26573681 PMCID: PMC4647279 DOI: 10.1186/s12859-015-0809-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2015] [Accepted: 10/29/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Metabolism is generally modeled by directed networks where nodes represent reactions and/or metabolites. In order to explore metabolic pathway conservation and divergence among organisms, previous studies were based on graph alignment to find similar pathways. Few years ago, the concept of chemical transformation modules, also called reaction modules, was introduced and correspond to sequences of chemical transformations which are conserved in metabolism. We propose here a novel graph representation of the metabolic network where reactions sharing a same chemical transformation type are grouped in Reaction Molecular Signatures (RMS). RESULTS RMS were automatically computed for all reactions and encode changes in atoms and bonds. A reaction network containing all available metabolic knowledge was then reduced by an aggregation of reaction nodes and edges to obtain a RMS network. Paths in this network were explored and a substantial number of conserved chemical transformation modules was detected. Furthermore, this graph-based formalism allows us to define several path scores reflecting different biological conservation meanings. These scores are significantly higher for paths corresponding to known metabolic pathways and were used conjointly to build association rules that should predict metabolic pathway types like biosynthesis or degradation. CONCLUSIONS This representation of metabolism in a RMS network offers new insights to capture relevant metabolic contexts. Furthermore, along with genomic context methods, it should improve the detection of gene clusters corresponding to new metabolic pathways.
Collapse
Affiliation(s)
- Maria Sorokina
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| | - Claudine Medigue
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| | - David Vallenet
- Direction des Sciences du Vivant, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Institut de Génomique, Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, 2 rue Gaston Crémieux, Evry, 91057, France.
- CNRS-UMR8030, 2 rue Gaston Crémieux, Evry, 91057, France.
- UEVE, Université d'Evry Val d'Essonne, Boulevard François Mitterrand, Evry, 91057, France.
| |
Collapse
|
14
|
Ganter M, Kaltenbach HM, Stelling J. Predicting network functions with nested patterns. Nat Commun 2015; 5:3006. [PMID: 24398547 DOI: 10.1038/ncomms4006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2013] [Accepted: 11/24/2013] [Indexed: 12/20/2022] Open
Abstract
Identifying suitable patterns in complex biological interaction networks helps understanding network functions and allows for predictions at the pattern level: by recognizing a known pattern, one can assign its previously established function. However, current approaches fail for previously unseen patterns, when patterns overlap and when they are embedded into a new network context. Here we show how to conceptually extend pattern-based approaches. We define metabolite patterns in metabolic networks that formalize co-occurrences of metabolites. Our probabilistic framework decodes the implicit information in the networks' metabolite patterns to predict metabolic functions. We demonstrate the predictive power by identifying 'indicator patterns', for instance, for enzyme classification, by predicting directions of novel reactions and of known reactions in new network contexts, and by ranking candidate network extensions for gap filling. Beyond their use in improving genome annotations and metabolic network models, we expect that the concepts transfer to other network types.
Collapse
Affiliation(s)
- Mathias Ganter
- 1] Department of Biosystems Science & Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland [2]
| | - Hans-Michael Kaltenbach
- 1] Department of Biosystems Science & Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland [2]
| | - Jörg Stelling
- Department of Biosystems Science & Engineering and Swiss Institute of Bioinformatics, ETH Zurich, Mattenstr. 26, 4058 Basel, Switzerland
| |
Collapse
|
15
|
De la Fuente IM. Elements of the cellular metabolic structure. Front Mol Biosci 2015; 2:16. [PMID: 25988183 PMCID: PMC4428431 DOI: 10.3389/fmolb.2015.00016] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2014] [Accepted: 04/12/2015] [Indexed: 12/19/2022] Open
Abstract
A large number of studies have demonstrated the existence of metabolic covalent modifications in different molecular structures, which are able to store biochemical information that is not encoded by DNA. Some of these covalent mark patterns can be transmitted across generations (epigenetic changes). Recently, the emergence of Hopfield-like attractor dynamics has been observed in self-organized enzymatic networks, which have the capacity to store functional catalytic patterns that can be correctly recovered by specific input stimuli. Hopfield-like metabolic dynamics are stable and can be maintained as a long-term biochemical memory. In addition, specific molecular information can be transferred from the functional dynamics of the metabolic networks to the enzymatic activity involved in covalent post-translational modulation, so that determined functional memory can be embedded in multiple stable molecular marks. The metabolic dynamics governed by Hopfield-type attractors (functional processes), as well as the enzymatic covalent modifications of specific molecules (structural dynamic processes) seem to represent the two stages of the dynamical memory of cellular metabolism (metabolic memory). Epigenetic processes appear to be the structural manifestation of this cellular metabolic memory. Here, a new framework for molecular information storage in the cell is presented, which is characterized by two functionally and molecularly interrelated systems: a dynamic, flexible and adaptive system (metabolic memory) and an essentially conservative system (genetic memory). The molecular information of both systems seems to coordinate the physiological development of the whole cell.
Collapse
Affiliation(s)
- Ildefonso M. De la Fuente
- Department of Cell Biology and Immunology, Institute of Parasitology and Biomedicine “López-Neyra,” Consejo Superior de Investigaciones CientíficasGranada, Spain
- Department of Mathematics, University of the Basque Country, UPV/Euskal Herriko UnibertsitateaLeioa, Spain
| |
Collapse
|
16
|
Côme E, Latouche P. Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. STAT MODEL 2015. [DOI: 10.1177/1471082x15577017] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
The stochastic block model (SBM) is a mixture model for the clustering of nodes in networks. The SBM has now been employed for more than a decade to analyze very different types of networks in many scientific fields, including biology and the social sciences. Recently, an analytical expression based on the collapsing of the SBM parameters has been proposed, in combination with a sampling procedure that allows the clustering of the vertices and the estimation of the number of clusters to be performed simultaneously. Although the corresponding algorithm can technically accommodate up to 10 000 nodes and millions of edges, the Markov chain, however, tends to exhibit poor mixing properties, that is, low acceptance rates, for large networks. Therefore, the number of clusters tends to be highly overestimated, even for a very large number of samples. In this article, we rely on a similar expression, which we call the integrated complete data log likelihood, and propose a greedy inference algorithm that focuses on maximizing this exact quantity. This algorithm incurs a smaller computational cost than existing inference techniques for the SBM and can be employed to analyze large networks (several tens of thousands of nodes and millions of edges) with no convergence problems. Using toy datasets, the algorithm exhibits improvements over existing strategies, both in terms of clustering and model selection. An application to a network of blogs related to illustrations and comics is also provided.
Collapse
Affiliation(s)
- Etienne Côme
- Université Paris-Est, IFSTTAR, GRETTIA, Noisy-Le-Grand, France
| | - Pierre Latouche
- Laboratoire SAMM, Université Paris 1 Panthéon-Sorbonne, France
| |
Collapse
|
17
|
Panni S, Rombo SE. Searching for repetitions in biological networks: methods, resources and tools. Brief Bioinform 2013; 16:118-36. [PMID: 24300112 DOI: 10.1093/bib/bbt084] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
We present here a compact overview of the data, models and methods proposed for the analysis of biological networks based on the search for significant repetitions. In particular, we concentrate on three problems widely studied in the literature: 'network alignment', 'network querying' and 'network motif extraction'. We provide (i) details of the experimental techniques used to obtain the main types of interaction data, (ii) descriptions of the models and approaches introduced to solve such problems and (iii) pointers to both the available databases and software tools. The intent is to lay out a useful roadmap for identifying suitable strategies to analyse cellular data, possibly based on the joint use of different interaction data types or analysis techniques.
Collapse
|
18
|
Giugno R, Bonnici V, Bombieri N, Pulvirenti A, Ferro A, Shasha D. GRAPES: a software for parallel searching on biological graphs targeting multi-core architectures. PLoS One 2013; 8:e76911. [PMID: 24167551 PMCID: PMC3805575 DOI: 10.1371/journal.pone.0076911] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2013] [Accepted: 08/26/2013] [Indexed: 11/19/2022] Open
Abstract
Biological applications, from genomics to ecology, deal with graphs that represents the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) they do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interactions networks.
Collapse
Affiliation(s)
- Rosalba Giugno
- Department Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- * E-mail:
| | - Vincenzo Bonnici
- Department Computer Science, University of Verona, Verona, Italy
| | - Nicola Bombieri
- Department Computer Science, University of Verona, Verona, Italy
| | - Alfredo Pulvirenti
- Department Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| | - Alfredo Ferro
- Department Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
| | - Dennis Shasha
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
| |
Collapse
|
19
|
Rudi AG, Shahrivari S, Jalili S, Moghadam Kashani ZR. RANGI: a fast list-colored graph motif finding algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:504-513. [PMID: 23929873 DOI: 10.1109/tcbb.2012.167] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Given a multiset of colors as the query and a list-colored graph, i.e., an undirected graph with a set of colors assigned to each of its vertices, in the NP-hard list-colored graph motif problem the goal is to find the largest connected subgraph such that one can select a color from the set of colors assigned to each of its vertices to obtain a subset of the query. This problem was introduced to find functional motifs in biological networks. We present a branch-and-bound algorithm named RANGI for finding and enumerating list-colored graph motifs. As our experimental results show, RANGI's pruning methods and heuristics make it quite fast in practice compared to the algorithms presented in the literature. We also present a parallel version of RANGI that achieves acceptable scalability.
Collapse
Affiliation(s)
- Ali Gholami Rudi
- Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran, PO Box 14115-194, Tehran, Iran.
| | | | | | | |
Collapse
|
20
|
|
21
|
|
22
|
|
23
|
Klein CC, Cottret L, Kielbassa J, Charles H, Gautier C, Ribeiro de Vasconcelos AT, Lacroix V, Sagot MF. Exploration of the core metabolism of symbiotic bacteria. BMC Genomics 2012; 13:438. [PMID: 22938206 PMCID: PMC3543179 DOI: 10.1186/1471-2164-13-438] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2012] [Accepted: 08/18/2012] [Indexed: 12/01/2022] Open
Abstract
Background A large number of genome-scale metabolic networks is now available for many organisms, mostly bacteria. Previous works on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. Results We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still do not find a reaction core, however we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. Conclusion Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts, however, host-dependence alone is not an explanation for such absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments. On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria continues to diminish, converging to approximately 60 reactions.
Collapse
|
24
|
Latouche P, Birmelé E, Ambroise C. Variational Bayesian inference and complexity control for stochastic block models. STAT MODEL 2012. [DOI: 10.1177/1471082x1001200105] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
It is now widely accepted that knowledge can be acquired from networks by clustering their vertices according to the connection profiles. Many methods have been proposed and in this paper we concentrate on the Stochastic Block Model (SBM). The clustering of vertices and the estimation of SBM model parameters have been subject to previous work, and numerous inference strategies such as variational expectation maximization (EM) and classification EM have been proposed. However, SBM still suffers from a lack of criteria to estimate the number of components in the mixture. To our knowledge, only one model-based criterion, Integrated Complete-data Likelihood (ICL), has been derived for SBM in the literature. It relies on an asymptotic approximation of the integrated complete-data likelihood and recent studies have shown that it tends to be too conservative in the case of small networks. To tackle this issue, we propose a new criterion that we call Integrated Likelihood Variational Bayes (ILvb), based on a non-asymptotic approximation of the marginal likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm.
Collapse
Affiliation(s)
- P Latouche
- Laboratoire Statistique et Génome, UMR CNRS 8071, UEVE
| | - E Birmelé
- Laboratoire Statistique et Génome, UMR CNRS 8071, UEVE
| | - C Ambroise
- Laboratoire Statistique et Génome, UMR CNRS 8071, UEVE
| |
Collapse
|
25
|
Betzler N, van Bevern R, Fellows MR, Komusiewicz C, Niedermeier R. Parameterized algorithmics for finding connected motifs in biological networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1296-1308. [PMID: 21282862 DOI: 10.1109/tcbb.2011.19] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
We study the NP-hard LIST-COLORED GRAPH MOTIF problem which, given an undirected list-colored graph G = (V, E) and a multiset M of colors, asks for maximum-cardinality sets S ⊆ V and M' ⊆ M such that G[S] is connected and contains exactly (with respect to multiplicity) the colors in M'. LIST-COLORED GRAPH MOTIF has applications in the analysis of biological networks. We study LIST-COLORED GRAPH MOTIF with respect to three different parameterizations. For the parameters motif size |M| and solution size |S|, we present fixed-parameter algorithms, whereas for the parameter |V| - |M|, we show W[1]-hardness for general instances and achieve fixed-parameter tractability for a special case of LIST-COLORED GRAPH MOTIF. We implemented the fixed-parameter algorithms for parameters |M| and |S|, developed further speed-up heuristics for these algorithms, and applied them in the context of querying protein-interaction networks, demonstrating their usefulness for realistic instances. Furthermore, we show that extending the request for motif connectedness to stronger demands, such as biconnectedness or bridge-connectedness leads to W[1]-hard problems when the parameter is the motif size |M|.
Collapse
Affiliation(s)
- Nadja Betzler
- Institut für Softwaretechnik und Theoretische Informatik, Technische Universtität Berlin, 10587 Berlin, Germany.
| | | | | | | | | |
Collapse
|
26
|
|
27
|
Latouche P, Birmelé E, Ambroise C. Overlapping stochastic block models with application to the French political blogosphere. Ann Appl Stat 2011. [DOI: 10.1214/10-aoas382] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
28
|
|
29
|
Konkoli Z. Multiparticle diffusion limited [Formula: see text] reaction in small volumes. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2010; 22:495102. [PMID: 21406782 DOI: 10.1088/0953-8984/22/49/495102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
A multiparticle reaction model in which particles A annihilate in clusters of size k as [Formula: see text] is investigated analytically. The system is studied for arbitrary reaction order k > 2, dimension d, and size L. Particles diffuse with diffusion constant D, and annihilate with rate σ which depends on the positions of kA particles in the cluster prior to a reaction. The particles are assumed to be spatially extended objects with radius a. Exclusion effects are not taken into account since A particles are allowed to overlap. The master equation is rephrased in the language of a field theory which, in turn, is used to derive the equations of motion for many-point densities. An approximate form of the equations of motion was solved analytically in the diffusion-controlled limit (infinite reaction rate). An explicit expression for the effective reaction rate has been found in the form of the Laplace transform. It was shown that the number of particles saturates to a constant value for large times. The value is approached through an exponential decay. The exponential decay constant is the non-algebraic function of particle size a and system size L.
Collapse
Affiliation(s)
- Zoran Konkoli
- Department of Microtechnology and Nanoscience-MC2, Bionano Systems Laboratory, Chalmers University of Technology, Sweden
| |
Collapse
|
30
|
Blin G, Sikora F, Vialette S. Querying graphs in protein-protein interactions networks using feedback vertex set. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:628-635. [PMID: 20498512 DOI: 10.1109/tcbb.2010.53] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Recent techniques increase rapidly the amount of our knowledge on interactions between proteins. The interpretation of these new information depends on our ability to retrieve known substructures in the data, the Protein-Protein Interactions (PPIs) networks. In an algorithmic point of view, it is an hard task since it often leads to NP-hard problems. To overcome this difficulty, many authors have provided tools for querying patterns with a restricted topology, i.e., paths or trees in PPI networks. Such restriction leads to the development of fixed parameter tractable (FPT) algorithms, which can be practicable for restricted sizes of queries. Unfortunately, Graph Homomorphism is a W[1]-hard problem, and hence, no FPT algorithm can be found when patterns are in the shape of general graphs. However, Dost et al. gave an algorithm (which is not implemented) to query graphs with a bounded treewidth in PPI networks (the treewidth of the query being involved in the time complexity). In this paper, we propose another algorithm for querying pattern in the shape of graphs, also based on dynamic programming and the color-coding technique. To transform graphs queries into trees without loss of informations, we use feedback vertex set coupled to a node duplication mechanism. Hence, our algorithm is FPT for querying graphs with a bounded size of their feedback vertex set. It gives an alternative to the treewidth parameter, which can be better or worst for a given query. We provide a python implementation which allows us to validate our implementation on real data. Especially, we retrieve some human queries in the shape of graphs into the fly PPI network.
Collapse
Affiliation(s)
- Guillaume Blin
- Université Paris-Est, LIGM-UMR CNRS 8049, 5 Bd Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée cedex 2, France.
| | | | | |
Collapse
|
31
|
Bruckner S, Hüffner F, Karp RM, Shamir R, Sharan R. Topology-free querying of protein interaction networks. J Comput Biol 2010; 17:237-52. [PMID: 20377443 DOI: 10.1089/cmb.2009.0170] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In the network querying problem, one is given a protein complex or pathway of species A and a protein-protein interaction network of species B; the goal is to identify subnetworks of B that are similar to the query in terms of sequence, topology, or both. Existing approaches mostly depend on knowledge of the interaction topology of the query in the network of species A; however, in practice, this topology is often not known. To address this problem, we develop a topology-free querying algorithm, which we call Torque. Given a query, represented as a set of proteins, Torque seeks a matching set of proteins that are sequence-similar to the query proteins and span a connected region of the network, while allowing both insertions and deletions. The algorithm uses alternatively dynamic programming and integer linear programming for the search task. We test Torque with queries from yeast, fly, and human, where we compare it to the QNet topology-based approach, and with queries from less studied species, where only topology-free algorithms apply. Torque detects many more matches than QNet, while giving results that are highly functionally coherent.
Collapse
Affiliation(s)
- Sharon Bruckner
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
| | | | | | | | | |
Collapse
|
32
|
|
33
|
Ambalath AM, Balasundaram R, Rao H. C, Koppula V, Misra N, Philip G, Ramanujan MS. On the Kernelization Complexity of Colorful Motifs. PARAMETERIZED AND EXACT COMPUTATION 2010. [DOI: 10.1007/978-3-642-17493-3_4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
|
34
|
Bruckner S, Hüffner F, Karp RM, Shamir R, Sharan R. TORQUE: topology-free querying of protein interaction networks. Nucleic Acids Res 2009; 37:W106-8. [PMID: 19491310 PMCID: PMC2703961 DOI: 10.1093/nar/gkp474] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
Torque is a tool for cross-species querying of protein–protein interaction networks. It aims to answer the following question: given a set of proteins constituting a known complex or a pathway in one species, can a similar complex or pathway be found in the protein network of another species? To this end, Torque seeks a matching set of proteins that are sequence similar to the query proteins and span a connected region of the target network, while allowing for both insertions and deletions. Unlike existing approaches, Torque does not require knowledge of the interconnections among the query proteins. It can handle large queries of up to 25 proteins. The Torque web server is freely available for use at http://www.cs.tau.ac.il/∼bnet/torque.html.
Collapse
Affiliation(s)
- Sharon Bruckner
- The Blavatnik School of Computer Science, Tel Aviv University, 69978 Tel Aviv, Israel.
| | | | | | | | | |
Collapse
|
35
|
Bachman P, Liu Y. Structure discovery in PPI networks using pattern-based network decomposition. Bioinformatics 2009; 25:1814-21. [PMID: 19447784 DOI: 10.1093/bioinformatics/btp297] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The large, complex networks of interactions between proteins provide a lens through which one can examine the structure and function of biological systems. Previous analyses of these continually growing networks have primarily followed either of two approaches: large-scale statistical analysis of holistic network properties, or small-scale analysis of local topological features. Meanwhile, investigation of meso-scale network structure (above that of individual functional modules, while maintaining the significance of individual proteins) has been hindered by the computational complexity of structural search in networks. Examining protein-protein interaction (PPI) networks at the meso-scale may provide insights into the presence and form of relationships between individual protein complexes and functional modules. RESULTS In this article, we present an efficient algorithm for performing sub-graph isomorphism queries on a network and show its computational advantage over previous methods. We also present a novel application of this form of topological search which permits analysis of a network's structure at a scale between that of individual functional modules and that of network-wide properties. This analysis provides support for the presence of hierarchical modularity in the PPI network of Saccharomyces cerevisiae.
Collapse
Affiliation(s)
- Philip Bachman
- Department of Computer Science and Department of Molecular Biology, University of Texas at Dallas, Richardson, TX 75083-0688, USA
| | | |
Collapse
|
36
|
Schbath S, Lacroix V, Sagot MF. Assessing the exceptionality of coloured motifs in networks. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2009. [PMID: 19190777 DOI: 10.1186/1687-4153-2009-616234] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Various methods have been recently employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been put in the last thirty years to derive P-values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdös-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdös-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better approximates the distribution of the motif count compared to Gaussian or Poisson distributions. The Pólya-Aeppli distribution, and more generally the compound Poisson distributions, are indeed well designed to model counts of clumping events. Altogether, these results enable to derive a P-value for a coloured motif, without spending time on simulations.
Collapse
Affiliation(s)
- Sophie Schbath
- Institut National de la Recherche Agronomique, Unité Mathématique, Informatique et Génome, Jouy-en-Josas, France
| | | | | |
Collapse
|
37
|
Assessing the exceptionality of coloured motifs in networks. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2009:616234. [PMID: 19190777 DOI: 10.1155/2009/616234] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2008] [Revised: 08/29/2008] [Accepted: 10/11/2008] [Indexed: 11/17/2022]
Abstract
Various methods have been recently employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been put in the last thirty years to derive P-values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdös-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdös-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better approximates the distribution of the motif count compared to Gaussian or Poisson distributions. The Pólya-Aeppli distribution, and more generally the compound Poisson distributions, are indeed well designed to model counts of clumping events. Altogether, these results enable to derive a P-value for a coloured motif, without spending time on simulations.
Collapse
|
38
|
Acuña V, Chierichetti F, Lacroix V, Marchetti-Spaccamela A, Sagot MF, Stougie L. Modes and cuts in metabolic networks: Complexity and algorithms. Biosystems 2009; 95:51-60. [DOI: 10.1016/j.biosystems.2008.06.015] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2007] [Revised: 06/25/2008] [Accepted: 06/25/2008] [Indexed: 11/16/2022]
|
39
|
Dondi R, Fertin G, Vialette S. Maximum Motif Problem in Vertex-Colored Graphs. COMBINATORIAL PATTERN MATCHING 2009. [DOI: 10.1007/978-3-642-02441-2_20] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
|
40
|
|
41
|
Lacroix V, Cottret L, Thébault P, Sagot MF. An introduction to metabolic networks and their structural analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2008; 5:594-617. [PMID: 18989046 DOI: 10.1109/tcbb.2008.79] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
There has been a renewed interest for metabolism in the computational biology community, leading to an avalanche of papers coming from methodological network analysis as well as experimental and theoretical biology. This paper is meant to serve as an initial guide for both the biologists interested in formal approaches and the mathematicians or computer scientists wishing to inject more realism into their models. The paper is focused on the structural aspects of metabolism only. The literature is vast enough already, and the thread through it difficult to follow even for the more experienced worker in the field. We explain methods for acquiring data and reconstructing metabolic networks, and review the various models that have been used for their structural analysis. Several concepts such as modularity are introduced, as are the controversies that have beset the field these past few years, for instance, on whether metabolic networks are small-world or scale-free, and on which model better explains the evolution of metabolism. Clarifying the work that has been done also helps in identifying open questions and in proposing relevant future directions in the field, which we do along the paper and in the conclusion.
Collapse
Affiliation(s)
- Vincent Lacroix
- Genome Bioinformatics Research Group, Centre de Regulacio Genomica (CRG), PRBB, Aiguader 88, 08003 Barcelona, Spain.
| | | | | | | |
Collapse
|
42
|
Banks E, Nabieva E, Chazelle B, Singh M. Organization of physical interactomes as uncovered by network schemas. PLoS Comput Biol 2008; 4:e1000203. [PMID: 18949022 PMCID: PMC2561054 DOI: 10.1371/journal.pcbi.1000203] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2008] [Accepted: 09/09/2008] [Indexed: 11/18/2022] Open
Abstract
Large-scale protein-protein interaction networks provide new opportunities for understanding cellular organization and functioning. We introduce network schemas to elucidate shared mechanisms within interactomes. Network schemas specify descriptions of proteins and the topology of interactions among them. We develop algorithms for systematically uncovering recurring, over-represented schemas in physical interaction networks. We apply our methods to the S. cerevisiae interactome, focusing on schemas consisting of proteins described via sequence motifs and molecular function annotations and interacting with one another in one of four basic network topologies. We identify hundreds of recurring and over-represented network schemas of various complexity, and demonstrate via graph-theoretic representations how more complex schemas are organized in terms of their lower-order constituents. The uncovered schemas span a wide range of cellular activities, with many signaling and transport related higher-order schemas. We establish the functional importance of the schemas by showing that they correspond to functionally cohesive sets of proteins, are enriched in the frequency with which they have instances in the H. sapiens interactome, and are useful for predicting protein function. Our findings suggest that network schemas are a powerful paradigm for organizing, interrogating, and annotating cellular networks.
Collapse
Affiliation(s)
- Eric Banks
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Elena Nabieva
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Bernard Chazelle
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Mona Singh
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| |
Collapse
|
43
|
Banks E, Nabieva E, Peterson R, Singh M. NetGrep: fast network schema searches in interactomes. Genome Biol 2008; 9:R138. [PMID: 18801179 PMCID: PMC2592716 DOI: 10.1186/gb-2008-9-9-r138] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2008] [Revised: 08/22/2008] [Accepted: 09/18/2008] [Indexed: 11/10/2022] Open
Abstract
NetGrep (http://genomics.princeton.edu/singhlab/netgrep/) is a system for searching protein interaction networks for matches to user-supplied 'network schemas'. Each schema consists of descriptions of proteins (for example, their molecular functions or putative domains) along with the desired topology and types of interactions among them. Schemas can thus describe domain-domain interactions, signaling and regulatory pathways, or more complex network patterns. NetGrep provides an advanced graphical interface for specifying schemas and fast algorithms for extracting their matches.
Collapse
Affiliation(s)
- Eric Banks
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Lab, Princeton, NJ 08544, USA
| | - Elena Nabieva
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Lab, Princeton, NJ 08544, USA
| | - Ryan Peterson
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Current address: Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853, USA
| | - Mona Singh
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Lab, Princeton, NJ 08544, USA
| |
Collapse
|
44
|
Fellows MR, Fertin G, Hermelin D, Vialette S. Sharp Tractability Borderlines for Finding Connected Motifs in Vertex-Colored Graphs. AUTOMATA, LANGUAGES AND PROGRAMMING 2007. [DOI: 10.1007/978-3-540-73420-8_31] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
|
45
|
Bourqui R, Cottret L, Lacroix V, Auber D, Mary P, Sagot MF, Jourdan F. Metabolic network visualization eliminating node redundance and preserving metabolic pathways. BMC SYSTEMS BIOLOGY 2007; 1:29. [PMID: 17608928 PMCID: PMC1934383 DOI: 10.1186/1752-0509-1-29] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/17/2007] [Accepted: 07/03/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND The tools that are available to draw and to manipulate the representations of metabolism are usually restricted to metabolic pathways. This limitation becomes problematic when studying processes that span several pathways. The various attempts that have been made to draw genome-scale metabolic networks are confronted with two shortcomings: 1- they do not use contextual information which leads to dense, hard to interpret drawings, 2- they impose to fit to very constrained standards, which implies, in particular, duplicating nodes making topological analysis considerably more difficult. RESULTS We propose a method, called MetaViz, which enables to draw a genome-scale metabolic network and that also takes into account its structuration into pathways. This method consists in two steps: a clustering step which addresses the pathway overlapping problem and a drawing step which consists in drawing the clustered graph and each cluster. CONCLUSION The method we propose is original and addresses new drawing issues arising from the no-duplication constraint. We do not propose a single drawing but rather several alternative ways of presenting metabolism depending on the pathway on which one wishes to focus. We believe that this provides a valuable tool to explore the pathway structure of metabolism.
Collapse
Affiliation(s)
- Romain Bourqui
- LaBRI, Université Bordeaux I, 351 Cours de la libération, 33405 Talence CEDEX, France
| | - Ludovic Cottret
- BAOBAB Team, Inria Rhône-Alpes, Projet HELIX, Université de Lyon ; université Lyon 1 ; CNRS ; UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France
| | - Vincent Lacroix
- BAOBAB Team, Inria Rhône-Alpes, Projet HELIX, Université de Lyon ; université Lyon 1 ; CNRS ; UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France
| | - David Auber
- LaBRI, Université Bordeaux I, 351 Cours de la libération, 33405 Talence CEDEX, France
| | - Patrick Mary
- LaBRI, Université Bordeaux I, 351 Cours de la libération, 33405 Talence CEDEX, France
| | - Marie-France Sagot
- BAOBAB Team, Inria Rhône-Alpes, Projet HELIX, Université de Lyon ; université Lyon 1 ; CNRS ; UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France
| | - Fabien Jourdan
- UMR1089 Xénobiotiques INRA-ENVT, 180 chemin de Tournefeuille – St-Martin-du-Touch, BP 3, 31931 Toulouse CEDEX, France
| |
Collapse
|