1
Nartallo-Kaluarachchi R, Asllani M, Deco G, Kringelbach ML, Goriely A, Lambiotte R. Broken detailed balance and entropy production in directed networks. Phys Rev E 2024; 110:034313. [PMID: 39425339 DOI: 10.1103/physreve.110.034313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 09/06/2024] [Indexed: 10/21/2024]
Abstract
The structure of a complex network plays a crucial role in determining its dynamical properties. In this paper, we show that the degree to which a network is directed and hierarchically organized is closely associated with the degree to which its dynamics break detailed balance and produce entropy. We consider a range of dynamical processes and show how different directed network features affect their entropy production rate. We begin with an analytical treatment of a two-node network followed by numerical simulations of synthetic networks using the preferential attachment and Erdős-Rényi algorithms. Next, we analyze a collection of 97 empirical networks to determine the effect of complex real-world topologies. Finally, we present a simple method for inferring broken detailed balance and directed network structure from multivariate time series and apply our method to identify non-equilibrium dynamics and hierarchical organisation in both human neuroimaging and financial time series. Overall, our results shed light on the consequences of directed network structure on non-equilibrium dynamics and highlight the importance and ubiquity of hierarchical organisation and non-equilibrium dynamics in real-world systems.
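As a rough illustration of the link between directedness and entropy production described in this abstract, the sketch below computes the entropy production rate of a discrete-time random walk, EPR = Σ_ij π_i P_ij log(π_i P_ij / (π_j P_ji)), where π is the stationary distribution. The choice of a random walk, the three-node ring, and all function names are assumptions made for this example; the paper itself considers a range of dynamical processes, and this is not the authors' code.

```python
import math

def stationary_distribution(P, iterations=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_production_rate(P):
    """Entropy production rate in nats per step for a stationary Markov chain."""
    n = len(P)
    pi = stationary_distribution(P)
    epr = 0.0
    for i in range(n):
        for j in range(n):
            forward = pi[i] * P[i][j]
            backward = pi[j] * P[j][i]
            # A one-way edge (backward == 0) would make the sum diverge;
            # this sketch skips such terms and so assumes reciprocated edges.
            if forward > 0 and backward > 0:
                epr += forward * math.log(forward / backward)
    return epr

def biased_ring(p):
    """Hypothetical 3-node ring: step clockwise with probability p, anticlockwise with 1 - p."""
    return [[0.0, p, 1 - p],
            [1 - p, 0.0, p],
            [p, 1 - p, 0.0]]

symmetric_epr = entropy_production_rate(biased_ring(0.5))  # undirected limit
directed_epr = entropy_production_rate(biased_ring(0.9))   # strongly directed
```

With p = 0.5 the walk is symmetric, detailed balance holds, and the rate vanishes; as p moves away from 0.5 the ring becomes effectively directed and the rate grows.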
Affiliation(s)
- Morten L Kringelbach
- Centre for Eudaimonia and Human Flourishing, University of Oxford, 7 Stoke Pl, Oxford OX3 9BX, United Kingdom
- Center for Music in the Brain, Aarhus University, & The Royal Academy of Music, Aarhus/Aalborg, Denmark
- Department of Psychiatry, University of Oxford, Oxford OX3 7JX United Kingdom
2
Stock M, Popp N, Fiorentino J, Scialdone A. Topological benchmarking of algorithms to infer gene regulatory networks from single-cell RNA-seq data. Bioinformatics 2024; 40:btae267. [PMID: 38627250 PMCID: PMC11096270 DOI: 10.1093/bioinformatics/btae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 02/28/2024] [Accepted: 04/16/2024] [Indexed: 05/18/2024] Open
Abstract
MOTIVATION: In recent years, many algorithms for inferring gene regulatory networks from single-cell transcriptomic data have been published. Several studies have evaluated their accuracy in estimating the presence of an interaction between pairs of genes. However, these benchmarking analyses do not quantify the algorithms' ability to capture structural properties of networks, which are fundamental, e.g., for studying the robustness of a gene network to external perturbations. Here, we devise a three-step benchmarking pipeline called STREAMLINE that quantifies the ability of algorithms to capture topological properties of networks and identify hubs. RESULTS: To this aim, we use data simulated from different types of networks as well as experimental data from three different organisms. We apply our benchmarking pipeline to four inference algorithms and provide guidance on which algorithm should be used depending on the global network property of interest. AVAILABILITY AND IMPLEMENTATION: STREAMLINE is available at https://github.com/ScialdoneLab/STREAMLINE. The data generated in this study are available at https://doi.org/10.5281/zenodo.10710444.
Affiliation(s)
- Marco Stock
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich 85354, Germany
- Niclas Popp
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Jonathan Fiorentino
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 81377, Germany
- Institute of Functional Epigenetics, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
- Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Munich 85764, Germany
3
Hasman M, Mayr M, Theofilatos K. Uncovering Protein Networks in Cardiovascular Proteomics. Mol Cell Proteomics 2023; 22:100607. [PMID: 37356494 PMCID: PMC10460687 DOI: 10.1016/j.mcpro.2023.100607] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Revised: 05/01/2023] [Accepted: 06/20/2023] [Indexed: 06/27/2023] Open
Abstract
Biological networks have been widely used in many different diseases to identify potential biomarkers and design drug targets. In the present review, we describe the main computational techniques for reconstructing and analyzing different types of protein networks and summarize the previous applications of such techniques in cardiovascular diseases. Existing tools are critically compared, with a discussion of when each method is preferred, such as the use of co-expression networks for functional annotation of protein clusters and the use of directed networks for inferring regulatory associations. We also present examples of reconstructing protein networks of different types (regulatory, co-expression, and protein-protein interaction networks). We demonstrate the necessity to reconstruct networks separately for each cardiovascular tissue type and disease entity and provide illustrative examples of the importance of taking into consideration relevant post-translational modifications. Finally, we demonstrate and discuss how the findings of protein networks could be interpreted using single-cell RNA-sequencing data.
Affiliation(s)
- Maria Hasman
- King's British Heart Foundation Centre, King's College London, London, United Kingdom
- Manuel Mayr
- King's British Heart Foundation Centre, King's College London, London, United Kingdom
4
Olivença DV, Davis JD, Voit EO. Inference of dynamic interaction networks: A comparison between Lotka-Volterra and multivariate autoregressive models. Frontiers in Bioinformatics 2022; 2:1021838. [PMID: 36619477 PMCID: PMC9815445 DOI: 10.3389/fbinf.2022.1021838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/09/2022] [Indexed: 12/24/2022] Open
Abstract
Networks are ubiquitous throughout biology, spanning the entire range from molecules to food webs and global environmental systems. Yet, despite substantial efforts by the scientific community, the inference of these networks from data still presents a problem that is unsolved in general. One frequent strategy for addressing the structure of networks is the assumption that the interactions among molecular or organismal populations are static and correlative. While often successful, these static methods are no panacea. They usually ignore the asymmetry of relationships between two species, and inferences become more challenging if the network nodes represent dynamically changing quantities. To overcome these challenges, two very different network inference approaches have been proposed in the literature: Lotka-Volterra (LV) models and Multivariate Autoregressive (MAR) models. These models are computational frameworks with different mathematical structures which, nevertheless, have both been proposed for the same purpose of inferring the interactions within coexisting population networks from observed time-series data. Here, we assess these dynamic network inference methods for the first time in a side-by-side comparison, using both synthetically generated and ecological datasets. Multivariate Autoregressive and Lotka-Volterra models are mathematically equivalent at the steady state, but the results of our comparison suggest that Lotka-Volterra models are generally superior in capturing the dynamics of networks with non-linear dynamics, whereas Multivariate Autoregressive models are better suited for analyses of networks of populations with process noise and close-to-linear behavior. To the best of our knowledge, this is the first study comparing LV and MAR approaches. Both frameworks are valuable tools that address slightly different aspects of dynamic networks.
Affiliation(s)
- Daniel V. Olivença
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
5
Zuhud DAZ, Musa MH, Ismail M, Bahaludin H, Razak FA. The Causality and Uncertainty of the COVID-19 Pandemic to Bursa Malaysia Financial Services Index's Constituents. Entropy (Basel, Switzerland) 2022; 24:1100. [PMID: 36010764 PMCID: PMC9407104 DOI: 10.3390/e24081100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/18/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 05/28/2023]
Abstract
Valued in hundreds of billions of Malaysian ringgit, the Bursa Malaysia Financial Services Index's constituents comprise several of the strongest performing financial constituents in Bursa Malaysia's Main Market. Although these constituents persistently reside mostly within the large market capitalization (cap), the existence of the individual constituent's causal influence or intensity relative to each other's performance during uncertain or even certain times is unknown. Thus, the key purpose of this paper is to identify and analyze the individual constituent's causal intensity, from early 2018 (pre-COVID-19) to the end of the year 2021 (post-COVID-19) using Granger causality and Schreiber transfer entropy. Furthermore, network science is used to measure and visualize the fluctuating causal degree of the source and the affected constituents. The results show that both the Granger causality and Schreiber transfer entropy networks detected patterns of increasing causality from pre- to post-COVID-19 but with differing causal intensities. Unexpectedly, both networks showed that the small- and mid-caps had high causal intensity during and after COVID-19. Using Bursa Malaysia's sub-sector for further analysis, the Insurance sub-sector rapidly increased in causality as the year progressed, making it one of the index's largest sources of causality. Even after removing large amounts of weak causal intensities, Schreiber transfer entropy was still able to detect higher amounts of causal sources from the Insurance sub-sector, whilst Granger causal sources declined rapidly post-COVID-19. The method of using directed temporal networks for the visualization of temporal causal sources is demonstrated to be a powerful approach that can aid in investment decision making.
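To make the transfer entropy measure named in this abstract more concrete, here is a minimal plug-in estimator for discrete series with history length 1: TE(X→Y) = Σ p(y_{t+1}, y_t, x_t) log2[p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t)]. The binary symbols (for instance the sign of a return), the history length, and the synthetic series are assumptions for illustration; the abstract does not state how the authors discretised or parameterised their data.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target in bits, history length 1."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (y_next, y_now, x_now)
    pairs_ts = Counter(zip(target[:-1], source[:-1]))             # (y_now, x_now)
    pairs_tt = Counter(zip(target[1:], target[:-1]))              # (y_next, y_now)
    singles = Counter(target[:-1])                                # y_now
    n = len(target) - 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_full = count / pairs_ts[(y_now, x_now)]            # p(y_next | y_now, x_now)
        p_self = pairs_tt[(y_next, y_now)] / singles[y_now]  # p(y_next | y_now)
        te += p_joint * math.log2(p_full / p_self)
    return te

# Synthetic example: y copies x with a one-step delay, so information flows x -> y only.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y)
te_y_to_x = transfer_entropy(y, x)
```

The asymmetry between the two directions is what allows such pairwise estimates to be assembled into a directed network; real price series would also call for significance testing, which this sketch omits.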
Affiliation(s)
- Daeng Ahmad Zuhri Zuhud
- Department of Mathematical Sciences, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Muhammad Hasannudin Musa
- Department of Mathematical Sciences, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Munira Ismail
- Department of Mathematical Sciences, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Hafizah Bahaludin
- Department of Computational and Theoretical Sciences, Kulliyyah of Science, International Islamic University Malaysia, Kuantan 25200, Pahang, Malaysia
- Fatimah Abdul Razak
- Department of Mathematical Sciences, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
6
Liu W, Sun X, Yang L, Li K, Yang Y, Fu X. NSCGRN: a network structure control method for gene regulatory network inference. Brief Bioinform 2022; 23:6585392. [PMID: 35554485 DOI: 10.1093/bib/bbac156] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/31/2022] [Revised: 03/27/2022] [Accepted: 04/06/2022] [Indexed: 01/18/2023] Open
Abstract
Accurate inference of gene regulatory networks (GRNs) is an essential premise for understanding pathogenesis and curing diseases. Various computational methods have been developed for GRN inference, but the identification of redundant regulation remains a challenge faced by researchers. Although combining global and local topology can identify and reduce redundant regulations, the topologies' specific forms and cooperation modes are unclear and real regulations may be sacrificed. Here, we propose a network structure control method [network-structure-controlling-based GRN inference method (NSCGRN)] that stipulates the global and local topology's specific forms and cooperation mode. The method is carried out in a cooperative mode of 'global topology dominates and local topology refines'. Global topology requires layering and sparseness of the network, and local topology requires consistency of the subgraph association pattern with the network motifs (fan-in, fan-out, cascade and feedforward loop). Specifically, an ordered gene list is obtained by network topology centrality sorting. A Bernaola-Galvan mutation detection algorithm applied to the list gives the hierarchy of GRNs to control the upstream and downstream regulations within the global scope. Finally, four network motifs are integrated into the hierarchy to optimize local complex regulations and form a cooperative mode where global and local topologies play the dominant and refined roles, respectively. NSCGRN is compared with state-of-the-art methods on three different datasets (six networks in total), and it achieves the highest F1 and Matthews correlation coefficient. Experimental results show its unique advantages in GRN inference.
Affiliation(s)
- Wei Liu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China; School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Li Yang
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Kaiwen Li
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
- Yu Yang
- School of Computer Science, Xiangtan University, Xiangtan, 411105, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, 411105, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
7
Karkowska R, Urjasz S. Linear and Nonlinear Effects in Connectedness Structure: Comparison between European Stock Markets. Entropy (Basel, Switzerland) 2022; 24:303. [PMID: 35205597 PMCID: PMC8870905 DOI: 10.3390/e24020303] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/10/2022] [Revised: 02/17/2022] [Accepted: 02/18/2022] [Indexed: 12/04/2022]
Abstract
The purpose of this research is to compare the risk transfer structure in Central and Eastern European and Western European stock markets during the 2007-2009 financial crisis and the COVID-19 pandemic. Similar to the global financial crisis (GFC), the spread of coronavirus (COVID-19) created a significant level of risk, causing investors to suffer losses in a very short period of time. We use a variety of methods, including nonstandard ones such as mutual information and transfer entropy. The results that we obtained indicate that there are significant nonlinear correlations in the capital markets that can be practically applied for investment portfolio optimization. From an investor perspective, our findings suggest that in the wake of a global crisis and a pandemic outbreak, the benefits of diversification will be limited by the transfer of funds between developed and developing country markets. Our study provides an insight into the risk transfer theory in developed and emerging markets as well as a cutting-edge methodology designed for analyzing the connectedness of markets. We contribute to the studies which have examined the different stock markets' response to different turbulences. The study confirms that specific market effects can still play a significant role because of the interconnection of different sectors of the global economy.
Affiliation(s)
- Renata Karkowska
- Faculty of Management, University of Warsaw, Szturmowa Street 1/3, 02-678 Warsaw, Poland
- Szczepan Urjasz
- Faculty of Management, University of Warsaw, Szturmowa Street 1/3, 02-678 Warsaw, Poland
8
Sarkar C, Parsad R, Mishra DC, Rai A. A Web Tool for Consensus Gene Regulatory Network Construction. Front Genet 2021; 12:745827. [PMID: 34899837 PMCID: PMC8652126 DOI: 10.3389/fgene.2021.745827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Accepted: 10/19/2021] [Indexed: 11/13/2022] Open
Abstract
Gene regulatory network (GRN) construction involves several complex computational steps. This step-by-step procedure requires prior knowledge of programming languages such as R. A web tool can reduce this complexity and make the analysis steps easily accessible to the user. In this study, a web tool for constructing a consensus GRN by combining the outcomes obtained from four methods, namely, correlation, principal component regression, partial least squares, and ridge regression, has been developed. We have designed the web tool with an interactive and user-friendly web page using the PHP programming language. We have used R scripts for the analysis steps, which run in the background of the user interface. Users can upload gene expression data for constructing a consensus GRN. The output obtained from the analysis will be available in downloadable form in the result window of the web tool.
Affiliation(s)
- Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Dwijesh C Mishra
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
9
Iliopoulos AC, Papasotiriou I. Functional Complex Networks Based on Operational Architectonics: Application on Electroencephalography-Brain-computer Interface for Imagined Speech. Neuroscience 2021; 484:98-118. [PMID: 34871742 DOI: 10.1016/j.neuroscience.2021.11.045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 01/18/2023]
Abstract
A new method for analyzing complex brain dynamics and states is presented. This method constructs functional brain graphs and rests on two pillars: (a) the operational architectonics (OA) concept of brain and mind functioning, and (b) network neuroscience. In particular, the algorithm utilizes the OA framework for a non-parametric segmentation of EEGs, which leads to the identification of change points, namely abrupt jumps in EEG amplitude, called Rapid Transition Processes (RTPs). Subsequently, the time coordinates of RTPs are used for the generation of undirected weighted complex networks fulfilling a scale-free topology criterion, from which various network metrics of brain connectivity are estimated. These metrics form feature vectors, which can be used in machine learning algorithms for classification and/or prediction. The method is tested in classification problems on an EEG-based BCI data set, acquired from individuals during imagery pronunciation tasks of various words/vowels. The classification results, based on a Naïve Bayes classifier, show that the overall accuracies were above chance level in all tested cases. This method was also compared with other state-of-the-art computational approaches commonly used for functional network generation, exhibiting competitive performance. The method can be useful to neuroscientists wishing to enhance their repository of brain research algorithms.
Affiliation(s)
- A C Iliopoulos
- Research Genetic Cancer Centre S.A. Industrial Area of Florina, 53100 Florina, Greece
- I Papasotiriou
- Research Genetic Cancer Centre International GmbH, Zug 6300, Switzerland.
10
Deutschmann IM, Lima-Mendez G, Krabberød AK, Raes J, Vallina SM, Faust K, Logares R. Disentangling environmental effects in microbial association networks. Microbiome 2021; 9:232. [PMID: 34823593 PMCID: PMC8620190 DOI: 10.1186/s40168-021-01141-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/11/2020] [Accepted: 07/20/2021] [Indexed: 05/05/2023]
Abstract
BACKGROUND: Ecological interactions among microorganisms are fundamental for ecosystem function, yet they are mostly unknown or poorly understood. High-throughput-omics can indicate microbial interactions through associations across time and space, which can be represented as association networks. Associations could result from either ecological interactions between microorganisms, or from environmental selection, where the association is environmentally driven. Therefore, before downstream analysis and interpretation, we need to distinguish the nature of the association, particularly if it is due to environmental selection or not. RESULTS: We present EnDED (environmentally driven edge detection), an implementation of four approaches as well as their combination to predict which links between microorganisms in an association network are environmentally driven. The four approaches are sign pattern, overlap, interaction information, and data processing inequality. We tested EnDED on networks from simulated data of 50 microorganisms. The networks contained on average 50 nodes and 1087 edges, of which 60 were true interactions and 1026 were false associations (i.e., environmentally driven or due to chance). Applying each method individually, we detected a moderate to high number of environmentally driven edges: 87% (sign pattern and overlap), 67% (interaction information), and 44% (data processing inequality). Combining these methods in an intersection approach resulted in retaining more interactions, both true and false (32% of environmentally driven associations). After validation with the simulated datasets, we applied EnDED on a marine microbial network inferred from 10 years of monthly observations of microbial-plankton abundance.
The intersection combination predicted that 8.3% of the associations were environmentally driven, while individual methods predicted 24.8% (data processing inequality), 25.7% (interaction information), and up to 84.6% (sign pattern as well as overlap). The fraction of environmentally driven edges among negative microbial associations in the real network increased rapidly with the number of environmental factors. CONCLUSIONS: To reach accurate hypotheses about ecological interactions, it is important to determine, quantify, and remove environmentally driven associations in marine microbial association networks. For that, EnDED offers up to four individual methods as well as their combination. However, especially for the intersection combination, we suggest using EnDED with other strategies to reduce the number of false associations and consequently the number of potential interaction hypotheses.
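One of the four approaches listed in this abstract, the data processing inequality, rests on a general fact: if two taxa A and B are linked only through an environmental factor E (a chain A - E - B), the mutual information between A and B cannot exceed that of either link to E. The sketch below applies that idea to a triplet by flagging the A-B edge when it is the weakest of the three. It is an illustration of the general principle only, not EnDED's implementation; the function names and the mutual information values are hypothetical.

```python
def is_indirect_by_dpi(mi_ab, mi_ae, mi_be):
    """True when the A-B association is the weakest link of the A-E-B triplet."""
    return mi_ab < min(mi_ae, mi_be)

def flag_environmentally_driven(edge_mi, env_mi):
    """Map each (a, b) edge to True if the triplet check marks it as indirect.

    edge_mi: {(a, b): mutual information between taxa a and b}
    env_mi:  {taxon: mutual information between the taxon and the environmental factor}
    """
    return {
        (a, b): is_indirect_by_dpi(mi, env_mi[a], env_mi[b])
        for (a, b), mi in edge_mi.items()
    }

# Hypothetical mutual information values in bits, invented for illustration.
edge_mi = {("taxon1", "taxon2"): 0.10, ("taxon1", "taxon3"): 0.60}
env_mi = {"taxon1": 0.40, "taxon2": 0.35, "taxon3": 0.05}

flags = flag_environmentally_driven(edge_mi, env_mi)
```

Here the taxon1-taxon2 edge is flagged because both taxa are more strongly tied to the environmental factor than to each other, while the taxon1-taxon3 edge is kept.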
Affiliation(s)
- Ina Maria Deutschmann
- Institute of Marine Sciences, CSIC, Passeig Marítim de la Barceloneta, 37-49, 08003 Barcelona, Spain
- Gipsi Lima-Mendez
- Research Unit in Biology of Microorganisms (URBM), University of Namur, 61 Rue de Bruxelles, 5000 Namur, Belgium
- Anders K. Krabberød
- Department of Biosciences/Section for Genetics and Evolutionary Biology (EVOGENE), University of Oslo, p.b. 1066 Blindern, N-0316 Oslo, Norway
- Jeroen Raes
- VIB Center for Microbiology, Herestraat 49-1028, 3000 Leuven, Belgium
- KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Molecular Bacteriology, Herestraat 49, 3000 Leuven, Belgium
- Sergio M. Vallina
- Spanish Institute of Oceanography (IEO - CSIC), Ave Principe de Asturias 70 Bis, 33212 Gijon, Spain
- Karoline Faust
- KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Molecular Bacteriology, Herestraat 49, 3000 Leuven, Belgium
- Ramiro Logares
- Institute of Marine Sciences, CSIC, Passeig Marítim de la Barceloneta, 37-49, 08003 Barcelona, Spain
11
Li J, Convertino M. Temperature increase drives critical slowing down of fish ecosystems. PLoS One 2021; 16:e0246222. [PMID: 34669703 PMCID: PMC8528280 DOI: 10.1371/journal.pone.0246222] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2021] [Accepted: 09/12/2021] [Indexed: 01/13/2023] Open
Abstract
Fish ecosystems perform ecological functions that are critically important for the sustainability of marine ecosystems, such as global food security and carbon stock. During the 21st century, significant global warming caused by climate change has created pressing challenges for fish ecosystems that threaten species existence and global ecosystem health. Here, we study a coastal fish community in Maizuru Bay, Japan, and investigate the relationships between fluctuations of ST, abundance-based species interactions and salient fish biodiversity. Observations show that a local 20% increase in temperature from 2002 to 2014 underpins a long-term reduction in fish diversity (∼25%) played out by some native and invasive species (e.g. Chinese wrasse) becoming exceedingly abundant; this causes a large decay in commercially valuable species (e.g. Japanese anchovy) coupled to an increase in ecological productivity. The fish community is analyzed considering five temperature ranges to understand its atemporal seasonal sensitivity to ST changes, and long-term trends. An optimal information flow model is used to reconstruct species interaction networks that emerge as topologically different for distinct temperature ranges and species dynamics. Networks for low temperatures are more scale-free compared to ones for intermediate (15-20°C) temperatures in which the fish ecosystem experiences a first-order phase transition in interactions from locally stable to metastable and globally unstable for high temperatures states as suggested by abundance-spectrum transitions. The dynamic dominant eigenvalue of species interactions shows increasing instability for competitive species (spiking in summer due to intermediate-season critical transitions) leading to enhanced community variability and critical slowing down despite higher time-point resilience. 
Native competitive species whose abundance is distributed more exponentially have the highest total directed interactions and are keystone species (e.g. Wrasse and Horse mackerel) for the most salient links with cooperative decaying species. Competitive species, with higher eco-climatic memory and synchronization, are the most affected by temperature and play an important role in maintaining fish ecosystem stability via multitrophic cascades (via cooperative-competitive species imbalance), and as bioindicators of change. More climate-fitted species follow temperature increase causing larger divergence between competitive and cooperative species. Decreasing dominant eigenvalues and lower relative network optimality for warmer oceans indicate fishery more attracted toward persistent oscillatory states, yet unpredictable, with lower cooperation, diversity and fish stock despite the increase in community abundance due to non-commercial and venomous species. We emphasize how changes in species interaction organization, primarily affected by temperature fluctuations, are the backbone of biodiversity dynamics and yet for functional diversity in contrast to taxonomic richness. Abundance and richness manifest gradual shifts while interactions show sudden shifts. The work provides data-driven tools for analyzing and monitoring fish ecosystems under the pressure of global warming or other stressors. Abundance and interaction patterns derived by network-based analyses proved useful to assess ecosystem susceptibility and effective change, and formulate predictive dynamic information for science-based fishery policy aimed to maintain marine ecosystems stable and sustainable.
Affiliation(s)
- Jie Li
- Nexus Group, Laboratory of Information Communication Networks, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Matteo Convertino
- Institute of Environment and Ecology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
12
Nakajima N, Hayashi T, Fujiki K, Shirahige K, Akiyama T, Akutsu T, Nakato R. Codependency and mutual exclusivity for gene community detection from sparse single-cell transcriptome data. Nucleic Acids Res 2021; 49:e104. [PMID: 34291282 PMCID: PMC8501962 DOI: 10.1093/nar/gkab601] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/07/2021] [Revised: 05/25/2021] [Accepted: 07/04/2021] [Indexed: 12/04/2022] Open
Abstract
Single-cell RNA-seq (scRNA-seq) can be used to characterize cellular heterogeneity in thousands of cells. The reconstruction of a gene network based on coexpression patterns is a fundamental task in scRNA-seq analyses, and the mutual exclusivity of gene expression can be critical for understanding such heterogeneity. Here, we propose an approach for detecting communities from a genetic network constructed on the basis of coexpression properties. The community-based comparison of multiple coexpression networks enables the identification of functionally related gene clusters that cannot be fully captured through differential gene expression-based analysis. We also developed a novel metric referred to as the exclusively expressed index (EEI) that identifies mutually exclusive gene pairs from sparse scRNA-seq data. EEI quantifies and ranks the exclusive expression levels of all gene pairs from binary expression patterns while maintaining robustness against a low sequencing depth. We applied our methods to glioblastoma scRNA-seq data and found that gene communities were partially conserved after serum stimulation despite a considerable number of differentially expressed genes. We also demonstrate that the identification of mutually exclusive gene sets with EEI can improve the sensitivity of capturing cellular heterogeneity. Our methods complement existing approaches and provide new biological insights, even for a large, sparse dataset, in the single-cell analysis field.
Affiliation(s)
- Natsu Nakajima
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Tomoatsu Hayashi
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Katsunori Fujiki
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Katsuhiko Shirahige
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Tetsu Akiyama
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
| | - Ryuichiro Nakato
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| |
|
13
|
Sharma C, Sahni N. A mutual information based R-vine copula strategy to estimate VaR in high frequency stock market data. PLoS One 2021; 16:e0253307. [PMID: 34138970 PMCID: PMC8211166 DOI: 10.1371/journal.pone.0253307] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/25/2020] [Accepted: 06/03/2021] [Indexed: 11/18/2022] Open
Abstract
In this paper, we explore mutual information based stock networks to build a regular vine copula structure on high frequency log returns of stocks and use it for the estimation of Value at Risk (VaR) of a portfolio of stocks. Our model is a data driven model that learns from high frequency time series data of log returns of the top 50 stocks listed on the National Stock Exchange (NSE) in India for the year 2014. The Ljung-Box test revealed the presence of autocorrelation as well as heteroscedasticity in the underlying time series data. Analysing the goodness of fit of a number of variants of the GARCH model on each working day of the year 2014, that is, 229 days in all, it was observed that ARMA(1,1)-EGARCH(1,1) demonstrated the best fit. The joint probability distribution of the portfolio is computed by constructing an R-Vine copula structure on the data with the mutual information guided minimum spanning tree as the key building block. The joint PDF is then fed into the Monte-Carlo simulation procedure to compute the VaR. When the mutual information is replaced by Kendall's Tau in the construction of the R-Vine copula structure, the resulting VaR estimations were found to be inferior, suggesting the presence of non-linear relationships among stock returns.
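The "mutual information guided minimum spanning tree" mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how mutual information is turned into a distance, so the sketch assumes only that the distance decreases strictly as mutual information grows; under that assumption the minimum spanning tree on distances has the same edges as the maximum spanning tree on mutual information. The function name `max_spanning_tree` and the 4-stock matrix `mi` are hypothetical and are not taken from the paper.

```python
def max_spanning_tree(weights):
    """Kruskal's algorithm on a symmetric weight matrix, keeping the heaviest edges."""
    n = len(weights)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All undirected edges, strongest dependence first.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Hypothetical pairwise mutual information (in nats) for four stocks.
mi = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.5, 0.3],
    [0.2, 0.5, 0.0, 0.7],
    [0.1, 0.3, 0.7, 0.0],
]
tree = max_spanning_tree(mi)
```

For these made-up values the tree keeps the three strongest non-redundant links, (0, 1), (2, 3) and (1, 2). In a vine copula construction such a tree would serve as the first tree of the vine; the later trees and the copula fitting are beyond this sketch.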
Affiliation(s)
- Charu Sharma
- Department of Mathematics, Shiv Nadar University, Uttar Pradesh, India
| | - Niteesh Sahni
- Department of Mathematics, Shiv Nadar University, Uttar Pradesh, India
| |
|
14
|
de Anda-Jáuregui G, Espinal-Enríquez J, Hernández-Lemus E. Highly connected, non-redundant microRNA functional control in breast cancer molecular subtypes. Interface Focus 2021; 11:20200073. [PMID: 34123357 PMCID: PMC8193465 DOI: 10.1098/rsfs.2020.0073] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 04/21/2021] [Indexed: 12/18/2022] Open
Abstract
Breast cancer is a complex, heterogeneous disease at the phenotypic and molecular level. In particular, the transcriptional regulatory programs are known to be significantly affected and such transcriptional alterations are able to capture some of the heterogeneity of the disease, leading to the emergence of breast cancer molecular subtypes. Recently, it has been found that network biology approaches to decipher such abnormal gene regulation programs, for instance by means of gene co-expression networks, have been able to recapitulate the differences between breast cancer subtypes providing elements to further understand their functional origins and consequences. Network biology approaches may be extended to include other co-expression patterns, like those found between genes and non-coding transcripts such as microRNAs (miRs). As is known, miRs play relevant roles in the establishment of normal and anomalous transcription processes. Commodore miRs (cdre-miRs) have been defined as miRs that, based on their connectivity and redundancy in co-expression networks, are potential control elements of biological functions. In this work, we reconstructed miR–gene co-expression networks for each breast cancer molecular subtype, from high throughput data in 424 samples from the Cancer Genome Atlas consortium. We identified cdre-miRs in three out of four molecular subtypes. We found that in each subtype, each cdre-miR was linked to a different set of associated genes, as well as a different set of associated biological functions. We used a systematic literature validation strategy, and identified that the associated biological functions to these cdre-miRs are hallmarks of cancer such as angiogenesis, cell adhesion, cell cycle and regulation of apoptosis. The relevance of such cdre-miRs as actionable molecular targets in breast cancer is still to be determined from functional studies.
Affiliation(s)
- Guillermo de Anda-Jáuregui
- Computational Genomics, Instituto Nacional de Medicina Genómica, Mexico City, Mexico; Cátedras CONACYT for Young Researchers, Consejo Nacional de Ciencia y Tecnología, Mexico City, Mexico; Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Jesús Espinal-Enríquez
- Computational Genomics, Instituto Nacional de Medicina Genómica, Mexico City, Mexico; Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics, Instituto Nacional de Medicina Genómica, Mexico City, Mexico; Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| |
|
15
|
Abstract
The detection of causal interactions is of great importance when inferring complex ecosystem functional and structural networks for basic and applied research. Convergent cross mapping (CCM) based on nonlinear state-space reconstruction made substantial progress about network inference by measuring how well historical values of one variable can reliably estimate states of other variables. Here we investigate the ability of a developed optimal information flow (OIF) ecosystem model to infer bidirectional causality and compare that to CCM. Results from synthetic datasets generated by a simple predator-prey model, data of a real-world sardine-anchovy-temperature system and of a multispecies fish ecosystem highlight that the proposed OIF performs better than CCM to predict population and community patterns. Specifically, OIF provides a larger gradient of inferred interactions, higher point-value accuracy and smaller fluctuations of interactions and α-diversity including their characteristic time delays. We propose an optimal threshold on inferred interactions that maximize accuracy in predicting fluctuations of effective α-diversity, defined as the count of model-inferred interacting species. Overall OIF outperforms all other models in assessing predictive causality (also in terms of computational complexity) due to the explicit consideration of synchronization, divergence and diversity of events that define model sensitivity, uncertainty and complexity. Thus, OIF offers a broad ecological information by extracting predictive causal networks of complex ecosystems from time-series data in the space-time continuum. The accurate inference of species interactions at any biological scale of organization is highly valuable because it allows to predict biodiversity changes, for instance as a function of climate and other anthropogenic stressors. This has practical implications for defining optimal ecosystem management and design, such as fish stock prioritization and delineation of marine protected areas based on derived collective multispecies assembly. OIF can be applied to any complex system and used for model evaluation and design where causality should be considered as non-linear predictability of diverse events of populations or communities.
|
16
|
|
17
|
Aziz F, Acharjee A, Williams JA, Russ D, Bravo-Merodio L, Gkoutos GV. Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference. Int J Mol Sci 2020; 21:E7886. [PMID: 33114263 PMCID: PMC7660606 DOI: 10.3390/ijms21217886] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/30/2020] [Revised: 10/22/2020] [Accepted: 10/22/2020] [Indexed: 12/12/2022] Open
Abstract
Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the different inference algorithms they adopt, rendering their meaningful comparison challenging. In this study, we used two methods, namely the MIDER (Mutual Information Distance and Entropy Reduction) and the PLSNET (Partial least square based feature selection) methods, to infer the structure of a GRN directly from data and computationally validated our results. Both methods were applied to different gene expression datasets resulting from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, for the case of the IBD dataset, the UGT1A family genes were identified as key regulators while upon analysing the PDAC dataset, the SULF1 and THBS2 genes were depicted. We further demonstrate that an ensemble-based approach, that combines the output of the MIDER and PLSNET algorithms, can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of the samples required for potential future validation studies. Here, we presented our proposed analysis framework that caters not only to candidate regulator genes prediction for potential validation experiments but also an estimation of the number of samples required for these experiments.
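The abstract does not state how the MIDER and PLSNET outputs are combined, so the following is only a generic sketch of one common ensemble strategy, rank averaging: each method ranks candidate edges by its own score, and edges are re-ordered by mean rank. The function names, gene labels and scores are hypothetical, and this should not be read as the authors' procedure.

```python
def rank_edges(scores):
    """Map each edge to its rank (1 = highest score) within one method."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def ensemble_ranking(*score_dicts):
    """Average per-method ranks; an edge missing from a method gets the worst rank."""
    edges = set().union(*score_dicts)
    worst = len(edges)
    ranks = [rank_edges(s) for s in score_dicts]
    mean_rank = {
        e: sum(r.get(e, worst) for r in ranks) / len(ranks) for e in edges
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(edges, key=lambda e: (mean_rank[e], e))


# Hypothetical edge confidences from two inference methods.
method_a = {("G1", "G2"): 0.9, ("G1", "G3"): 0.4, ("G2", "G3"): 0.1}
method_b = {("G1", "G2"): 0.7, ("G2", "G3"): 0.6, ("G1", "G3"): 0.2}
consensus = ensemble_ranking(method_a, method_b)
```

Working on ranks rather than raw scores avoids having to put the two methods' confidence values on a common scale, which is one reason rank-based ensembles are often used when algorithms differ as much as the abstract describes.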
Affiliation(s)
- Furqan Aziz
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK
- Institute of Translational Medicine, University of Birmingham, Birmingham B15 2TT, UK
| | - Animesh Acharjee
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK
- Institute of Translational Medicine, University of Birmingham, Birmingham B15 2TT, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham B15 2WB, UK
| | - John A. Williams
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK
- Institute of Translational Medicine, University of Birmingham, Birmingham B15 2TT, UK
- Medical Research Council Harwell Institute, Harwell Campus, Oxfordshire OX11 0RD, UK
| | - Dominic Russ
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK
- Institute of Translational Medicine, University of Birmingham, Birmingham B15 2TT, UK
| | - Laura Bravo-Merodio
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK
- Institute of Translational Medicine, University of Birmingham, Birmingham B15 2TT, UK
| | - Georgios V. Gkoutos
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham B15 2TT, UK
- Institute of Translational Medicine, University of Birmingham, Birmingham B15 2TT, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham B15 2WB, UK
- MRC Health Data Research UK (HDR UK), Midlands B15 2TT, UK
- NIHR Experimental Cancer Medicine Centre, Birmingham B15 2TT, UK
- NIHR Biomedical Research Centre, University Hospital Birmingham, Birmingham B15 2WB, UK
| |
|
18
|
Young J, Dragoi V, Aazhang B. Precise measurement of correlations between frequency coupling and visual task performance. Sci Rep 2020; 10:17372. [PMID: 33060626 PMCID: PMC7566518 DOI: 10.1038/s41598-020-74057-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/19/2020] [Accepted: 09/25/2020] [Indexed: 11/27/2022] Open
Abstract
Functional connectivity analyses focused on frequency-domain relationships, i.e. frequency coupling, powerfully reveal neurophysiology. Coherence is commonly used but neural activity does not follow its Gaussian assumption. The recently introduced mutual information in frequency (MIF) technique makes no model assumptions and measures non-Gaussian and nonlinear relationships. We develop a powerful MIF estimator optimized for correlating frequency coupling with task performance and other relevant task phenomena. In light of variance reduction afforded by multitaper spectral estimation, which is critical to precisely measuring such correlations, we propose a multitaper approach for MIF and compare its performance with coherence in simulations. Additionally, multitaper MIF and coherence are computed between macaque visual cortical recordings and their correlation with task performance is analyzed. Our multitaper MIF estimator produces low variance and performs better than all other estimators in simulated correlation analyses. Simulations further suggest that multitaper MIF captures more information than coherence. For the macaque data set, coherence and our new MIF estimator largely agree. Overall, we provide a new way to precisely estimate frequency coupling that sheds light on task performance and helps neuroscientists accurately capture correlations between coupling and task phenomena in general. Additionally, we make an MIF toolbox available for the first time.
Affiliation(s)
- Joseph Young
- Electrical and Computer Engineering, Rice University, Houston, 77005, USA.
| | - Valentin Dragoi
- Neurobiology and Anatomy, University of Texas John P and Katherine G McGovern Medical School, Houston, 77030, USA
| | - Behnaam Aazhang
- Electrical and Computer Engineering, Rice University, Houston, 77005, USA
| |
|
19
|
Delgado-Chaves FM, Gómez-Vela F, Divina F, García-Torres M, Rodriguez-Baena DS. Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks. Genes (Basel) 2020; 11:E831. [PMID: 32708319 PMCID: PMC7397019 DOI: 10.3390/genes11070831] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/23/2020] [Revised: 06/26/2020] [Accepted: 07/13/2020] [Indexed: 12/21/2022] Open
Abstract
Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6E^ΔHSC compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressively impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6E^ΔHSC mice. On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6E^ΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches.
|
20
|
Development of Stock Networks Using Part Mutual Information and Australian Stock Market Data. Entropy 2020; 22:e22070773. [PMID: 33286545 PMCID: PMC7517323 DOI: 10.3390/e22070773] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/25/2020] [Revised: 07/12/2020] [Accepted: 07/13/2020] [Indexed: 01/07/2023]
Abstract
Complex networks are a powerful tool for discovering important information in various types of big data. Although many studies have addressed the development of stock relation networks, the correlation coefficient is predominantly used to measure the relationship between stock pairs. Information theory is much less discussed for this important topic, though mutual information is able to measure nonlinear pairwise relationships. In this work we propose to use part mutual information for developing stock networks. The path-consistency algorithm is used to filter out redundant relationships. Using the Australian stock market data, we develop four stock relation networks using different orders of part mutual information. Compared with the widely used planar maximally filtered graph (PMFG), we can generate networks with cliques of large size. In addition, the large cliques show consistency with the structure of industrial sectors. We also analyze the connectivity and degree distributions of the generated networks. Analysis results suggest that the proposed method is an effective approach to develop stock relation networks using information theory.
|
21
|
Espinoza JL, Shah N, Singh S, Nelson KE, Dupont CL. Applications of weighted association networks applied to compositional data in biology. Environ Microbiol 2020; 22:3020-3038. [PMID: 32436334 DOI: 10.1111/1462-2920.15091] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/23/2020] [Revised: 05/15/2020] [Accepted: 05/18/2020] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing technologies have generated, and continue to produce, an increasingly large corpus of biological data. The data generated are inherently compositional as they convey only relative information dependent upon the capacity of the instrument, experimental design and technical bias. There is considerable information to be gained through network analysis by studying the interactions between components within a system. Network theory methods using compositional data are powerful approaches for quantifying relationships between biological components and their relevance to phenotype, environmental conditions or other external variables. However, many of the statistical assumptions used for network analysis are not designed for compositional data and can bias downstream results. In this mini-review, we illustrate the utility of network theory in biological systems and investigate modern techniques while introducing researchers to frameworks for implementation. We overview (1) compositional data analysis, (2) data transformations and (3) network theory along with insight on a battery of network types including static-, temporal-, sample-specific- and differential-networks. The intention of this mini-review is not to provide a comprehensive overview of network methods, rather to introduce microbiology researchers to (semi)-unsupervised data-driven approaches for inferring latent structures that may give insight into biological phenomena or abstract mechanics of complex systems.
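As a concrete instance of the "data transformations" for compositional data that this mini-review surveys, the sketch below implements the centered log-ratio (CLR) transform, a standard choice in compositional data analysis. Whether and how the review uses CLR is not stated in the abstract; the function name and the example counts are hypothetical.

```python
import math


def clr(counts, pseudocount=0.0):
    """Centered log-ratio transform of one sample's component counts.

    Each value becomes log(count) minus the mean of the logs, so the result
    depends only on ratios between components, not on sequencing depth.
    """
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("counts must be positive; add a pseudocount for zeros")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# The same hypothetical community sequenced at two depths.
shallow = clr([1.0, 2.0, 4.0])
deep = clr([10.0, 20.0, 40.0])
```

Both samples map to the same CLR vector, which illustrates why relative (compositional) data should be compared through log-ratios rather than raw counts. Zero counts need separate handling, for which the `pseudocount` argument is only the simplest of several options.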
Affiliation(s)
- Josh L Espinoza
- J. Craig Venter Institute, La Jolla, USA; Applied Sciences, Durban University of Technology, Durban, South Africa
| | | | - Suren Singh
- Applied Sciences, Durban University of Technology, Durban, South Africa
| | - Karen E Nelson
- J. Craig Venter Institute, La Jolla, USA; Applied Sciences, Durban University of Technology, Durban, South Africa; J. Craig Venter Institute, Rockville, USA
| | | |
|
22
|
Law J, Ng K, Windram OPF. The Phenotype Paradox: Lessons From Natural Transcriptome Evolution on How to Engineer Plants. Frontiers in Plant Science 2020; 11:75. [PMID: 32133018 PMCID: PMC7040092 DOI: 10.3389/fpls.2020.00075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2019] [Accepted: 01/20/2020] [Indexed: 06/10/2023]
Abstract
Plants have evolved genome complexity through iterative rounds of single gene and whole genome duplication. This has led to substantial expansion in transcription factor numbers following preferential retention and subsequent functional divergence of these regulatory genes. Here we review how this simple evolutionary network rewiring process, regulatory gene duplication followed by functional divergence, can be used to inspire synthetic biology approaches that seek to develop novel phenotypic variation for future trait based breeding programs in plants.
Affiliation(s)
- Justin Law
- Grand Challenges in Ecosystems and the Environment, Imperial College London, Ascot, United Kingdom
| | - Kangbo Ng
- The Francis Crick Institute, London, United Kingdom
- Institute for the Physics of Living Systems, University College London, London, United Kingdom
| | - Oliver P. F. Windram
- Grand Challenges in Ecosystems and the Environment, Imperial College London, Ascot, United Kingdom
| |
|
23
|
Lin C, Ding J, Bar-Joseph Z. Inferring TF activation order in time series scRNA-Seq studies. PLoS Comput Biol 2020; 16:e1007644. [PMID: 32069291 PMCID: PMC7048296 DOI: 10.1371/journal.pcbi.1007644] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/24/2019] [Revised: 02/28/2020] [Accepted: 01/09/2020] [Indexed: 12/11/2022] Open
Abstract
Methods for the analysis of time series single cell expression data (scRNA-Seq) either do not utilize information about transcription factors (TFs) and their targets or only study these as a post-processing step. Using such information can both improve the accuracy of the reconstructed model and cell assignments and provide information on how and when the process is regulated. We developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which integrates probabilistic modeling of scRNA-Seq data with the ability to assign TFs to specific activation points in the model. TFs are assumed to influence the emission probabilities for cells assigned to later time points, allowing us to identify not just the TFs controlling each path but also their order of activation. We tested CSHMM-TF on several mouse and human datasets. As we show, the method was able to identify known and novel TFs for all processes, the assigned times of activation agree with both expression information and prior knowledge, and combinatorial predictions are supported by known interactions. We also show that CSHMM-TF improves upon prior methods that do not utilize TF-gene interaction. An important attribute of time series single cell RNA-Seq (scRNA-Seq) data is the ability to infer continuous trajectories of genes based on orderings of the cells. While several methods have been developed for ordering cells and inferring such trajectories, to date it was not possible to use these to infer the temporal activity of several key TFs. These TFs are only post-transcriptionally regulated and so their expression does not provide complete information on their activity. To address this we developed the Continuous-State Hidden Markov Models TF (CSHMM-TF) method, which assigns continuous activation time to TFs based on both their expression and the expression of their targets. 
Applying our method to several time series scRNA-Seq datasets we show that it correctly identifies the key regulators for the processes being studied. We analyze the temporal assignments for these TFs and show that they provide new insights about combinatorial regulation and the ordering of TF activation. We used several complementary sources to validate some of these predictions and discuss a number of other novel suggestions based on the method. As we show, the method is able to scale to large and noisy datasets and so is appropriate for several studies utilizing time series scRNA-Seq data.
Affiliation(s)
- Chieh Lin
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Jun Ding
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| |
|
24
|
García RA, Martí AC, Cabeza C, Rubido N. Small-worldness favours network inference in synthetic neural networks. Sci Rep 2020; 10:2296. [PMID: 32042036 PMCID: PMC7010800 DOI: 10.1038/s41598-020-59198-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/10/2019] [Accepted: 11/28/2019] [Indexed: 12/15/2022] Open
Abstract
A main goal in the analysis of a complex system is to infer its underlying network structure from time-series observations of its behaviour. The inference process is often done by using bi-variate similarity measures, such as the cross-correlation (CC) or mutual information (MI); however, the main factors favouring or hindering its success remain poorly understood. Here, we use synthetic neuron models in order to reveal the main topological properties that frustrate or facilitate inferring the underlying network from CC measurements. Specifically, we use pulse-coupled Izhikevich neurons connected as in the Caenorhabditis elegans neural networks as well as in networks with similar randomness and small-worldness. We analyse the effectiveness and robustness of the inference process under different observations and collective dynamics, contrasting the results obtained from using membrane potentials and inter-spike interval time-series. We find that overall, small-worldness favours network inference and degree heterogeneity hinders it. In particular, success rates in C. elegans networks, which combine small-world properties with degree heterogeneity, are closer to those in Erdös-Rényi network models than to those in Watts-Strogatz network models. These results are relevant to understand better the relationship between topological properties and function in different neural networks.
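The bi-variate inference step described in this abstract can be sketched in its simplest form: compute a similarity value for every pair of time series and keep the pairs above a threshold as inferred links. The sketch assumes zero-lag Pearson correlation as the similarity measure and a fixed threshold of 0.8; both are illustrative simplifications rather than the paper's settings, and the three signals are made up.

```python
import math


def pearson(x, y):
    """Zero-lag Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def infer_links(series, threshold):
    """Return the unordered pairs whose absolute correlation reaches the threshold."""
    names = sorted(series)
    return {
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(series[a], series[b])) >= threshold
    }


# Hypothetical recordings: n2 is a noisy copy of n1, n3 is phase-shifted.
signals = {
    "n1": [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
    "n2": [0.1, 0.9, 0.0, -1.1, 0.1, 1.0, -0.1, -0.9],
    "n3": [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0],
}
links = infer_links(signals, threshold=0.8)
```

Only the pair (n1, n2) is recovered here: the quarter-period shift of n3 gives it near-zero correlation with the other two at zero lag, a small reminder that the choice of similarity measure and lag affects which links can be inferred.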
Affiliation(s)
- Rodrigo A García
- Universidad de la República, Instituto de Física de Facultad de Ciencias, Montevideo, 11400, Uruguay.
| | - Arturo C Martí
- Universidad de la República, Instituto de Física de Facultad de Ciencias, Montevideo, 11400, Uruguay
| | - Cecilia Cabeza
- Universidad de la República, Instituto de Física de Facultad de Ciencias, Montevideo, 11400, Uruguay
| | - Nicolás Rubido
- Universidad de la República, Instituto de Física de Facultad de Ciencias, Montevideo, 11400, Uruguay
| |
|
25
|
Muldoon JJ, Yu JS, Fassia MK, Bagheri N. Network inference performance complexity: a consequence of topological, experimental and algorithmic determinants. Bioinformatics 2019; 35:3421-3432. [PMID: 30932143 PMCID: PMC6748731 DOI: 10.1093/bioinformatics/btz105] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/12/2018] [Revised: 01/24/2019] [Accepted: 02/11/2019] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Network inference algorithms aim to uncover key regulatory interactions governing cellular decision-making, disease progression and therapeutic interventions. Having an accurate blueprint of this regulation is essential for understanding and controlling cell behavior. However, the utility and impact of these approaches are limited because the ways in which various factors shape inference outcomes remain largely unknown. RESULTS We identify and systematically evaluate determinants of performance-including network properties, experimental design choices and data processing-by developing new metrics that quantify confidence across algorithms in comparable terms. We conducted a multifactorial analysis that demonstrates how stimulus target, regulatory kinetics, induction and resolution dynamics, and noise differentially impact widely used algorithms in significant and previously unrecognized ways. The results show how even if high-quality data are paired with high-performing algorithms, inferred models are sometimes susceptible to giving misleading conclusions. Lastly, we validate these findings and the utility of the confidence metrics using realistic in silico gene regulatory networks. This new characterization approach provides a way to more rigorously interpret how algorithms infer regulation from biological datasets. AVAILABILITY AND IMPLEMENTATION Code is available at http://github.com/bagherilab/networkinference/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joseph J Muldoon
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
| | - Jessica S Yu
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
| | - Mohammad-Kasim Fassia
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, USA
| | - Neda Bagheri
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL, USA
- Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
| |
|
26
|
Sharma C, Habib A. Mutual information based stock networks and portfolio selection for intraday traders using high frequency data: An Indian market case study. PLoS One 2019; 14:e0221910. [PMID: 31465507 PMCID: PMC6715228 DOI: 10.1371/journal.pone.0221910] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/24/2019] [Accepted: 08/16/2019] [Indexed: 11/18/2022] Open
Abstract
In this paper, we explore the problem of establishing a network among the stocks of a market at high frequency level and give an application to program trading. Our work uses high frequency data from the National Stock Exchange, India, for the year 2014. To begin, we analyse the spectrum of the correlation matrix to establish the presence of linear relations amongst the stock returns. A comparison of correlations with pairwise mutual information shows the further existence of non-linear relations which are not captured by correlation. We also see that the non-linear relations are more pronounced at the high frequency level in comparison to the daily returns used in earlier work. We provide two applications of this approach. First, we construct minimal spanning trees for the stock network based on mutual information and study their topology. The year 2014 saw the conduct of general elections in India and the data allows us to explore their impact on aspects of the network, such as the scale-free property and sectorial clusters. Second, having established the presence of non-linear relations, we would like to be able to exploit them. Previous authors have suggested that peripheral stocks in the network would make good proxies for the Markowitz portfolio but with a much smaller number of stocks. We show that peripheral stocks selected using mutual information perform significantly better than ones selected using correlation.
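The contrast drawn above between correlation and pairwise mutual information can be illustrated with a small sketch. It is not the authors' estimator: the equal-width binning, the `bins=4` default, and the toy series `x` and `y` are assumptions chosen only to show a dependence that Pearson correlation misses.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mutual_information(x, y, bins=4):
    """Plug-in mutual information estimate in bits after equal-width binning.

    The binning scheme is an illustrative assumption, not the paper's method.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)
    return sum(
        (c / n) * math.log2((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in joint.items()
    )


# Toy data: y depends on x only through a symmetric, non-linear map.
x = list(range(-10, 11))
y = [v * v for v in x]

r = pearson(x, y)              # close to zero: no linear relation
mi = mutual_information(x, y)  # clearly positive: a non-linear relation exists
```

On this toy pair the correlation is essentially zero while the mutual information estimate is well above zero, which is the kind of gap the abstract refers to. Plug-in estimates are sensitive to the number of bins and to sample size, so the values should be read qualitatively.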
Affiliation(s)
- Charu Sharma
- Department of Mathematics, Shiv Nadar University, Gautam Buddha Nagar, Uttar Pradesh, India
| | - Amber Habib
- Department of Mathematics, Shiv Nadar University, Gautam Buddha Nagar, Uttar Pradesh, India
| |
|
27
|
Burbano Lombana DA, Freeman RA, Lynch KM. Discovering the topology of complex networks via adaptive estimators. CHAOS (WOODBURY, N.Y.) 2019; 29:083121. [PMID: 31472515 DOI: 10.1063/1.5088657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2019] [Accepted: 07/24/2019] [Indexed: 06/10/2023]
Abstract
Behind any complex system in nature or engineering, there is an intricate network of interconnections that is often unknown. Using a control-theoretical approach, we study the problem of network reconstruction (NR): inferring both the network structure and the coupling weights based on measurements of each node's activity. We derive two new methods for NR, a low-complexity reduced-order estimator (which projects each node's dynamics to a one-dimensional space) and a full-order estimator for cases where a reduced-order estimator is not applicable. We prove their convergence to the correct network structure using Lyapunov-like theorems and persistency of excitation. Importantly, these estimators apply to systems with partial state measurements, a broad class of node dynamics and internode coupling functions, and in the case of the reduced-order estimator, node dynamics and internode coupling functions that are not fully known. The effectiveness of the estimators is illustrated using both numerical and experimental results on networks of chaotic oscillators.
Affiliation(s)
| | - Randy A Freeman
- Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, Illinois 60208, USA
| | - Kevin M Lynch
- Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, USA
| |
|
28
|
Yan J, Risacher SL, Shen L, Saykin AJ. Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 2019; 19:1370-1381. [PMID: 28679163 DOI: 10.1093/bib/bbx066] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Received: 02/23/2017] [Indexed: 11/14/2022] Open
Abstract
In the past decade, significant progress has been made in complex disease research across multiple omics layers from genome, transcriptome and proteome to metabolome. There is an increasing awareness of the importance of biological interconnections, and much success has been achieved using systems biology approaches. However, because of the typical focus on one single omics layer at a time, existing systems biology findings explain only a modest portion of complex disease. Recent advances in multi-omics data collection and sharing present us new opportunities for studying complex diseases in a more comprehensive fashion, and yet simultaneously create new challenges considering the unprecedented data dimensionality and diversity. Here, our goal is to review extant and emerging network approaches that can be applied across multiple biological layers to facilitate a more comprehensive and integrative multilayered omics analysis of complex diseases.
Affiliation(s)
- Jingwen Yan
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis, USA
| | - Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
| | - Li Shen
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
| | - Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, USA
| |
|
29
|
Chan TE, Stumpf MPH, Babtie AC. Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures. Cell Syst 2019; 5:251-267.e3. [PMID: 28957658 PMCID: PMC5624513 DOI: 10.1016/j.cels.2017.08.014] [Citation(s) in RCA: 304] [Impact Index Per Article: 50.7] [Received: 10/10/2016] [Revised: 04/26/2017] [Accepted: 08/24/2017] [Indexed: 12/03/2022]
Abstract
While single-cell gene expression experiments present new challenges for data processing, the cell-to-cell variability observed also reveals statistical relationships that can be used by information theory. Here, we use multivariate information theory to explore the statistical dependencies between triplets of genes in single-cell gene expression datasets. We develop PIDC, a fast, efficient algorithm that uses partial information decomposition (PID) to identify regulatory relationships between genes. We thoroughly evaluate the performance of our algorithm and demonstrate that the higher-order information captured by PIDC allows it to outperform pairwise mutual information-based algorithms when recovering true relationships present in simulated data. We also infer gene regulatory networks from three experimental single-cell datasets and illustrate how network context, choices made during analysis, and sources of variability affect network inference. PIDC tutorials and open-source software for estimating PID are available. PIDC should facilitate the identification of putative functional relationships and mechanistic hypotheses from single-cell transcriptomic data. Highlights: PIDC infers gene regulatory networks from single-cell transcriptomic data; multivariate information measures and context in PIDC improve network inference; heterogeneity in single-cell data carries information about gene-gene interactions; fast, efficient, open-source software is made freely available.
Affiliation(s)
- Thalia E Chan
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Michael P H Stumpf
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK; MRC London Institute of Medical Sciences, Hammersmith Campus, Imperial College London, London W12 0NN, UK.
| | - Ann C Babtie
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
| |
|
30
|
Loskot P, Atitey K, Mihaylova L. Comprehensive Review of Models and Methods for Inferences in Bio-Chemical Reaction Networks. Front Genet 2019; 10:549. [PMID: 31258548 PMCID: PMC6588029 DOI: 10.3389/fgene.2019.00549] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/08/2019] [Accepted: 05/24/2019] [Indexed: 01/30/2023] Open
Abstract
The key processes in biological and chemical systems are described by networks of chemical reactions. From molecular biology to biotechnology applications, computational models of reaction networks are used extensively to elucidate their non-linear dynamics. The model dynamics are crucially dependent on the parameter values which are often estimated from observations. Over the past decade, the interest in parameter and state estimation in models of (bio-) chemical reaction networks (BRNs) grew considerably. The related inference problems are also encountered in many other tasks including model calibration, discrimination, identifiability, and checking, and optimum experiment design, sensitivity analysis, and bifurcation analysis. The aim of this review paper is to examine the developments in literature to understand what BRN models are commonly used, and for what inference tasks and inference methods. The initial collection of about 700 documents concerning estimation problems in BRNs, excluding books and textbooks in computational biology and chemistry, was screened to select over 270 research papers and 20 graduate research theses. The paper selection was facilitated by text mining scripts to automate the search for relevant keywords and terms. The outcomes are presented in tables revealing the levels of interest in different inference tasks and methods for given models in the literature, and research trends are uncovered. Our findings indicate that many combinations of models, tasks and methods are still relatively unexplored, and there are many new research opportunities to explore combinations that have not been considered, perhaps for good reasons. The most common models of BRNs in literature involve differential equations, Markov processes, mass action kinetics, and state space representations whereas the most common tasks are the parameter inference and model identification. The most common methods in literature are Bayesian analysis, Monte Carlo sampling strategies, and model fitting to data using evolutionary algorithms. The new research problems which cannot be directly deduced from the text mining data are also discussed.
Affiliation(s)
- Pavel Loskot
- College of Engineering, Swansea University, Swansea, United Kingdom
| | - Komlan Atitey
- College of Engineering, Swansea University, Swansea, United Kingdom
| | - Lyudmila Mihaylova
- Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, United Kingdom
| |
|
31
|
Kizhakkethil Youseph AS, Chetty M, Karmakar G. Reverse engineering genetic networks using nonlinear saturation kinetics. Biosystems 2019; 182:30-41. [PMID: 31185246 DOI: 10.1016/j.biosystems.2019.103977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/16/2018] [Revised: 04/25/2019] [Accepted: 05/27/2019] [Indexed: 01/01/2023]
Abstract
A gene regulatory network (GRN) represents a set of genes along with their regulatory interactions. Cellular behavior is driven by genetic level interactions. Dynamics of such systems show nonlinear saturation kinetics which can be best modeled by Michaelis-Menten (MM) and Hill equations. Although the MM equation is widely used for modeling biochemical processes, it has rarely been applied to reverse engineering GRNs. In this paper, we develop a complete framework for a novel model for GRN inference using MM kinetics. A set of coupled equations is first proposed for modeling GRNs. In the coupled model, the Michaelis-Menten constant associated with regulation by a gene is made invariant irrespective of the gene being regulated. The parameter estimation of the proposed model is carried out using an evolutionary optimization method, namely, trigonometric differential evolution (TDE). Subsequently, the model is further improved and the regulations of different genes by a given gene are made distinct by allowing varying values of Michaelis-Menten constants for each regulation. Apart from making the model more biologically relevant, the improvement results in a decoupled GRN model with fast estimation of model parameters. Further, to enhance exploitation of the search, we propose a local search algorithm based on hill climbing heuristics. A novel mutation operation is also proposed to avoid population stagnation and premature convergence. Real-life benchmark data sets generated in vivo are used for validating the proposed model. Further, we also analyze realistic in silico datasets generated using GeneNetWeaver. The comparison of the performance of the proposed model with other existing methods shows the potential of the proposed model.
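For orientation, the saturation kinetics named in this abstract can be written down in their standard textbook form. The sketch below shows only the generic Michaelis-Menten and Hill rate laws; it is not the coupled or decoupled GRN model of the paper, and the function and parameter names (`v_max`, `k_m`, `k`, `n`) are illustrative assumptions.

```python
def michaelis_menten(s, v_max, k_m):
    """Textbook Michaelis-Menten rate: v = v_max * s / (k_m + s)."""
    return v_max * s / (k_m + s)


def hill(s, v_max, k, n):
    """Textbook Hill rate: v = v_max * s**n / (k**n + s**n).

    With n = 1 this reduces to the Michaelis-Menten form.
    """
    return v_max * s ** n / (k ** n + s ** n)


# Illustrative parameter values (assumptions, not taken from the paper).
half_rate = michaelis_menten(0.5, v_max=2.0, k_m=0.5)    # s equal to k_m
near_max = michaelis_menten(1000.0, v_max=2.0, k_m=0.5)  # s much larger than k_m
```

Two properties make these forms useful for regulation models: the rate equals half of `v_max` when the regulator level equals the constant, and it saturates towards `v_max` for large inputs instead of growing without bound.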
Affiliation(s)
| | - Madhu Chetty
- School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
| | - Gour Karmakar
- School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
| |
|
32
|
Optimal Microbiome Networks: Macroecology and Criticality. ENTROPY 2019; 21:e21050506. [PMID: 33267220 PMCID: PMC7514995 DOI: 10.3390/e21050506] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/25/2019] [Revised: 05/04/2019] [Accepted: 05/13/2019] [Indexed: 12/11/2022]
Abstract
The human microbiome is an extremely complex ecosystem considering the number of bacterial species, their interactions, and its variability over space and time. Here, we untangle the complexity of the human microbiome for Irritable Bowel Syndrome (IBS), the most prevalent functional gastrointestinal disorder in human populations. Based on a novel information theoretic network inference model, we detected potential species interaction networks that are functionally and structurally different for healthy and unhealthy individuals. Healthy networks are characterized by a neutral symmetrical pattern of species interactions and scale-free topology versus random unhealthy networks. We detected an inverse scaling relationship between species total outgoing information flow, indicative of node interactivity, and relative species abundance (RSA). The top ten interacting species are also the least relatively abundant for the healthy microbiome and the most detrimental. These findings support the idea about the diminishing role of network hubs and how these should be defined considering the total outgoing information flow rather than the node degree. Macroecologically, the healthy microbiome is characterized by the highest Pareto total species diversity growth rate, the lowest species turnover, and the smallest variability of RSA for all species. This result challenges current views that posit a universal association between healthy states and the highest absolute species diversity in ecosystems. Additionally, we show how the transitory microbiome is unstable and microbiome criticality is not necessarily at the phase transition between healthy and unhealthy states. We stress the importance of considering portfolios of interacting pairs versus single node dynamics when characterizing the microbiome and of ranking these pairs in terms of their interactions (i.e., species collective behavior) that shape transition from healthy to unhealthy states. 
The macroecological characterization of the microbiome is useful for public health and disease diagnosis and etiognosis, while species-specific analyses can detect beneficial species leading to personalized design of pre- and probiotic treatments and microbiome engineering.
|
33
|
González-Serrano L, Talón-Ballestero P, Muñoz-Romero S, Soguero-Ruiz C, Rojo-Álvarez JL. Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management. ENTROPY 2019; 21:e21040419. [PMID: 33267133 PMCID: PMC7514908 DOI: 10.3390/e21040419] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/23/2019] [Revised: 04/07/2019] [Accepted: 04/17/2019] [Indexed: 12/12/2022]
Abstract
Customer Relationship Management (CRM) is a fundamental tool in the hospitality industry nowadays, which can be seen as a big-data scenario due to the large amount of recordings which are annually handled by managers. Data quality is crucial for the success of these systems, and one of the main issues to be solved by businesses in general and by hospitality businesses in particular in this setting is the identification of duplicated customers, which has not received much attention in recent literature, probably and partly because it is not an easy-to-state problem in statistical terms. In the present work, we address the problem statement of duplicated customer identification as a large-scale data analysis, and we propose and benchmark a general-purpose solution for it. Our system consists of four basic elements: (a) A generic feature representation for the customer fields in a simple table-shape database; (b) An efficient distance for comparison among feature values, in terms of the Wagner-Fischer algorithm to calculate the Levenshtein distance; (c) A big-data implementation using basic map-reduce techniques to readily support the comparison of strategies; (d) An X-from-M criterion to identify those possible neighbors to a duplicated-customer candidate. We analyze the mass density function of the distances in the CRM text-based fields and characterize their behavior and consistency in terms of the entropy and of the mutual information for these fields. Our experiments in a large CRM from a multinational hospitality chain show that the distance distributions are statistically consistent for each feature, and that neighbourhood thresholds are automatically adjusted by the system at a first step and they can be subsequently more-finely tuned according to the manager experience. 
The entropy distributions for the different variables, as well as the mutual information between pairs, are characterized by multimodal profiles, where a wide gap between close and far fields is often present. This motivates the proposal of the so-called X-from-M strategy, which is shown to be computationally affordable, and can provide the expert with a reduced number of duplicated candidates to supervise, with low X values being enough to warrant the sensitivity required at the automatic detection stage. The proposed system again encourages and supports the benefits of big-data technologies in CRM scenarios for hotel chains, and rather than the use of ad-hoc heuristic rules, it promotes the research and development of theoretically principled approaches.
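Element (b) of the system described above relies on the Wagner-Fischer algorithm for the Levenshtein distance. A minimal sketch of that dynamic program follows; the two-row memory layout and the example customer strings are assumptions for illustration and are not taken from the paper's implementation.

```python
def levenshtein(a, b):
    """Levenshtein distance between strings a and b (Wagner-Fischer).

    Only two rows of the dynamic-programming table are kept, so memory
    use is proportional to len(b) instead of len(a) * len(b).
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical customer-name fields that differ by a single edit.
d_names = levenshtein("Jon Smith", "John Smith")
d_classic = levenshtein("kitten", "sitting")
```

A small distance between two field values, as in the hypothetical name pair, is what would flag a pair of records as duplicate candidates; how small is small enough is the thresholding question the abstract addresses with its X-from-M criterion.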
Affiliation(s)
| | - Pilar Talón-Ballestero
- Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
- Correspondence: ; Tel.: +34-91-488-7315
| | - Sergio Muñoz-Romero
- Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
- Department of Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, 28943 Madrid, Spain
| | - Cristina Soguero-Ruiz
- Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
| | - José Luis Rojo-Álvarez
- Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
- Department of Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, 28943 Madrid, Spain
| |
|
34
|
Chan TE, Stumpf MPH, Babtie AC. Gene Regulatory Networks from Single Cell Data for Exploring Cell Fate Decisions. Methods Mol Biol 2019; 1975:211-238. [PMID: 31062312 DOI: 10.1007/978-1-4939-9224-9_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Single cell experimental techniques now allow us to quantify gene expression in up to thousands of individual cells. These data reveal the changes in transcriptional state that occur as cells progress through development and adopt specialized cell fates. In this chapter we describe in detail how to use our network inference algorithm (PIDC), and the associated software package NetworkInference.jl, to infer functional interactions between genes from the observed gene expression patterns. We exploit the large sample sizes and inherent variability of single cell data to detect statistical dependencies between genes that indicate putative (co-)regulatory relationships, using multivariate information measures that can capture complex statistical relationships. We provide guidelines on how best to combine this analysis with other complementary methods designed to explore single cell data, and how to interpret the resulting gene regulatory network models to gain insight into the processes regulating cell differentiation.
Affiliation(s)
- Thalia E Chan
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
| | - Michael P H Stumpf
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
| | - Ann C Babtie
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK.
| |
|
35
|
Manners HN, Roy S, Kalita JK. Intrinsic-overlapping co-expression module detection with application to Alzheimer's Disease. Comput Biol Chem 2018; 77:373-389. [PMID: 30466046 DOI: 10.1016/j.compbiolchem.2018.10.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/07/2018] [Revised: 10/28/2018] [Accepted: 10/29/2018] [Indexed: 11/18/2022]
Abstract
Genes interact with each other and may cause perturbation in the molecular pathways leading to complex diseases. Often, instead of any single gene, a subset of genes interact, forming a network, to share common biological functions. Such a subnetwork is called a functional module or motif. Identifying such modules and central key genes in them, that may be responsible for a disease, may help design patient-specific drugs. In this study, we consider the neurodegenerative Alzheimer's Disease (AD) and identify potentially responsible genes from functional motif analysis. We start from the hypothesis that central genes in genetic modules are more relevant to a disease that is under investigation and identify hub genes from the modules as potential marker genes. Motifs or modules are often non-exclusive or overlapping in nature. Moreover, they sometimes show intrinsic or hierarchical distributions with overlapping functional roles. To the best of our knowledge, no prior work handles both the situations in an integrated way. We propose a non-exclusive clustering approach, CluViaN (Clustering Via Network) that can detect intrinsic as well as overlapping modules from gene co-expression networks constructed using microarray expression profiles. We compare our method with existing methods to evaluate the quality of modules extracted. CluViaN reports the presence of intrinsic and overlapping motifs in different species not reported by any other research. We further apply our method to extract significant AD specific modules using CluViaN and rank them based on the number of genes from a module involved in the disease pathways. Finally, top central genes are identified by topological analysis of the modules. We use two different AD phenotype data for experimentation. We observe that central genes, namely PSEN1, APP, NDUFB2, NDUFA1, UQCR10, PPP3R1 and a few more, play significant roles in AD. 
Interestingly, our experiments also find a hub gene, PML, which has recently been reported to play a role in plasticity, circadian rhythms and the response to proteins which can cause neurodegenerative disorders. MUC4, another hub gene that we find experimentally is yet to be investigated for its potential role in AD. A software implementation of CluViaN in Java is available for download at https://sites.google.com/site/swarupnehu/publications/resources/CluViaN Software.rar.
Affiliation(s)
- Hazel Nicolette Manners
- Department of Information Technology, North Eastern Hill University, Shillong, Meghalaya, India.
| | - Swarup Roy
- Department of Computer Applications, Sikkim University, Gangtok, Sikkim, India; Department of Information Technology, North Eastern Hill University, Shillong, Meghalaya, India.
| | - Jugal K Kalita
- Department of Computer Science, University of Colorado, Colorado Springs, USA.
| |
|
36
|
Barbosa S, Niebel B, Wolf S, Mauch K, Takors R. A guide to gene regulatory network inference for obtaining predictive solutions: Underlying assumptions and fundamental biological and data constraints. Biosystems 2018; 174:37-48. [PMID: 30312740 DOI: 10.1016/j.biosystems.2018.10.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/10/2018] [Revised: 10/05/2018] [Accepted: 10/08/2018] [Indexed: 02/07/2023]
Abstract
The study of biological systems at a system level has become a reality due to increasingly powerful computational approaches able to handle increasingly larger datasets. Uncovering the dynamic nature of gene regulatory networks in order to attain a system level understanding and improve the predictive power of biological models is an important research field in systems biology. The task itself presents several challenges, since the problem is of a combinatorial nature and highly depends on several biological constraints and also the intended application. Given the intrinsic interdisciplinary nature of gene regulatory network inference, we present a review of the currently available approaches, their challenges and limitations. We propose guidelines to select the most appropriate method considering the underlying assumptions and fundamental biological and data constraints.
Affiliation(s)
- Sara Barbosa
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany.
| | - Bastian Niebel
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
| | - Sebastian Wolf
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
| | - Klaus Mauch
- Insilico Biotechnology AG, Meitnerstrasse 9, 70563 Stuttgart, Germany
| | - Ralf Takors
- Institute of Biochemical Engineering, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
| |
|
37
|
Villaverde AF, Becker K, Banga JR. PREMER: A Tool to Infer Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1193-1202. [PMID: 28981423 DOI: 10.1109/tcbb.2017.2758786] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features (such as distinguishing between direct and indirect interactions or determining the direction of a causal link) requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end, we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux, and OSX (https://sites.google.com/site/premertoolbox/).
|
38
|
Development of stock correlation networks using mutual information and financial big data. PLoS One 2018; 13:e0195941. [PMID: 29668715 PMCID: PMC5905993 DOI: 10.1371/journal.pone.0195941] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 12/21/2016] [Accepted: 03/18/2018] [Indexed: 11/19/2022] Open
Abstract
Stock correlation networks use stock price data to explore the relationship between different stocks listed in the stock market. Currently this relationship is dominantly measured by the Pearson correlation coefficient. However, financial data suggest that nonlinear relationships may exist in the stock prices of different shares. To address this issue, this work uses mutual information to characterize the nonlinear relationship between stocks. Using 280 stocks traded at the Shanghai Stocks Exchange in China during the period of 2014-2016, we first compare the effectiveness of the correlation coefficient and mutual information for measuring stock relationships. Based on these two measures, we then develop two stock networks using the Minimum Spanning Tree method and study the topological properties of these networks, including degree, path length and the power-law distribution. The relationship network based on mutual information has a better distribution of the degree and larger value of the power-law distribution than those using the correlation coefficient. Numerical results show that mutual information is a more effective approach than the correlation coefficient to measure the stock relationship in a stock market that may undergo large fluctuations of stock prices.
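The Minimum Spanning Tree step mentioned above can be sketched with Kruskal's algorithm on a precomputed distance matrix. How the paper converts correlation or mutual information into distances is not stated in the abstract, so the matrix `dist` below is a made-up toy input and the function name is hypothetical.

```python
def minimum_spanning_tree(dist):
    """Kruskal's algorithm on a symmetric distance matrix.

    Returns a list of (i, j, weight) edges forming a spanning tree of
    minimum total weight over nodes 0..n-1.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j, w))
            if len(tree) == n - 1:
                break
    return tree


# Toy distances between four hypothetical stocks (smaller means more related).
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.4, 0.7],
    [0.9, 0.4, 0.0, 0.3],
    [0.8, 0.7, 0.3, 0.0],
]
tree = minimum_spanning_tree(dist)

# Node degrees in the tree, the quantity whose distribution is studied.
degree = [0] * len(dist)
for i, j, _ in tree:
    degree[i] += 1
    degree[j] += 1
```

The tree keeps only the n - 1 strongest links that connect all stocks, which is why topological properties such as degree and path length can then be compared between the correlation-based and mutual-information-based networks.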
|
39
|
Servadio JL, Convertino M. Optimal information networks: Application for data-driven integrated health in populations. SCIENCE ADVANCES 2018; 4:e1701088. [PMID: 29423440 PMCID: PMC5804584 DOI: 10.1126/sciadv.1701088] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 04/12/2017] [Accepted: 01/05/2018] [Indexed: 05/30/2023]
Abstract
Development of composite indicators for integrated health in populations typically relies on a priori assumptions rather than model-free, data-driven evidence. Traditional variable selection processes tend not to consider relatedness and redundancy among variables, instead considering only individual correlations. In addition, a unified method for assessing integrated health statuses of populations is lacking, making systematic comparison among populations impossible. We propose the use of maximum entropy networks (MENets) that use transfer entropy to assess interrelatedness among selected variables considered for inclusion in a composite indicator. We also define optimal information networks (OINs) that are scale-invariant MENets, which use the information in constructed networks for optimal decision-making. Health outcome data from multiple cities in the United States are applied to this method to create a systemic health indicator, representing integrated health in a city.
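Transfer entropy, the directed measure these networks are built on, can be sketched as follows. This is a minimal plug-in estimator with history length one for discrete series, applied to a toy pair in which the target copies the source with a one-step lag; the function name `transfer_entropy` and the toy data are assumptions for illustration, not the authors' implementation or the health data.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy source -> target in bits (history length 1).

    TE = sum p(y1, y0, x0) * log2( p(y1 | y0, x0) / p(y1 | y0) ),
    where y1 is the next target value, y0 the current target value,
    and x0 the current source value.
    """
    nxt, cur_t, cur_s = target[1:], target[:-1], source[:-1]
    n = len(nxt)
    triples = Counter(zip(nxt, cur_t, cur_s))
    pairs_ts = Counter(zip(cur_t, cur_s))
    pairs_tt = Counter(zip(nxt, cur_t))
    singles = Counter(cur_t)
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_full = c / pairs_ts[(y0, x0)]              # p(y1 | y0, x0)
        p_self = pairs_tt[(y1, y0)] / singles[y0]    # p(y1 | y0)
        te += (c / n) * math.log2(p_full / p_self)
    return te


# Toy data (assumed): x is random bits, y repeats x one step later.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)  # x drives y
te_yx = transfer_entropy(y, x)  # y does not drive x
print(f"TE x->y = {te_xy:.3f} bits, TE y->x = {te_yx:.3f} bits")
```

The asymmetry between the two directions is what allows transfer entropy to assign a direction to an edge, unlike correlation or mutual information.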
Affiliation(s)
- Joseph L. Servadio
- Division of Environmental Health Sciences, HumNat Lab, University of Minnesota School of Public Health, Minneapolis, MN 55455, USA
| | - Matteo Convertino
- Complexity Group, Information Communication Networks Lab, Division of Frontier Science and Media and Network Technologies, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Global Institution for Collaborative Research and Education (Gi-CoRE) Station for Big Data and Cybersecurity, Hokkaido University, Sapporo, Japan
- Department of Electronics and Information Engineering, Faculty of Engineering, Hokkaido University, Sapporo, Japan
| |
|
40
|
|
41
|
|
42
|
Coker EA, Mitsopoulos C, Workman P, Al-Lazikani B. SiGNet: A signaling network data simulator to enable signaling network inference. PLoS One 2017; 12:e0177701. [PMID: 28545060 PMCID: PMC5435248 DOI: 10.1371/journal.pone.0177701] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/10/2016] [Accepted: 05/02/2017] [Indexed: 12/22/2022] Open
Abstract
Network models are widely used to describe complex signaling systems. Cellular wiring varies in different cellular contexts and numerous inference techniques have been developed to infer the structure of a network from experimental data of the network's behavior. To objectively identify which inference strategy is best suited to a specific network, a gold standard network and dataset are required. However, suitable datasets for benchmarking are difficult to find. Numerous tools exist that can simulate data for transcriptional networks, but these are of limited use for the study of signaling networks. Here, we describe SiGNet (Signal Generator for Networks): a Cytoscape app that simulates experimental data for a signaling network of known structure. SiGNet has been developed and tested against published experimental data, incorporating information on network architecture, and the directionality and strength of interactions to create biological data in silico. SiGNet is the first tool to simulate biological signaling data, enabling an accurate and systematic assessment of inference strategies. SiGNet can also be used to produce preliminary models of key biological pathways following perturbation.
Affiliation(s)
- Elizabeth A. Coker
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
| | - Costas Mitsopoulos
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
| | - Paul Workman
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
| | - Bissan Al-Lazikani
- Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
| |
|
43
|
Kim SJ, Ka S, Ha JW, Kim J, Yoo D, Kim K, Lee HK, Lim D, Cho S, Hanotte O, Mwai OA, Dessie T, Kemp S, Oh SJ, Kim H. Cattle genome-wide analysis reveals genetic signatures in trypanotolerant N'Dama. BMC Genomics 2017; 18:371. [PMID: 28499406 PMCID: PMC5427609 DOI: 10.1186/s12864-017-3742-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 12/12/2016] [Accepted: 04/27/2017] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Indigenous cattle in Africa have adapted to various local environments to acquire superior phenotypes that enhance their survival under harsh conditions. While many studies have investigated the adaptation of African cattle overall, the genetic characteristics of each breed have been poorly studied. RESULTS: We performed a comparative genome-wide analysis to assess evidence for subspeciation within species at the genetic level in trypanotolerant N'Dama cattle. We analysed genetic variation patterns in N'Dama from the genomes of 101 cattle breeds including 48 samples of five indigenous African cattle breeds and 53 samples of various commercial breeds. Analysis of SNP variances between cattle breeds using wMI, XP-CLR, and XP-EHH detected genes containing N'Dama-specific genetic variants and their potential associations. Functional annotation analysis revealed that these genes are associated with ossification and with the neurological and immune systems. In particular, the genes involved in bone formation indicate that local adaptation of N'Dama may involve skeletal growth as well as immune systems. CONCLUSIONS: Our results imply that N'Dama might have acquired distinct genotypes associated with growth and regulation of regional diseases including trypanosomiasis. Moreover, this study offers significant insights into identifying genetic signatures for natural and artificial selection of diverse African cattle breeds.
Affiliation(s)
- Soo-Jin Kim
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
| | - Sojeong Ka
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea
| | - Jung-Woo Ha
- Clova, NAVER Corp., Seongnam, 13561, Republic of Korea
| | - Jaemin Kim
- C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
| | - DongAhn Yoo
- C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Kwondo Kim
- C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Hak-Kyo Lee
- Department of Animal Biotechnology, Chonbuk National University, Jeonju, 66414, Republic of Korea
| | - Dajeong Lim
- Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, RDA, Jeonju, 55365, Republic of Korea
| | - Seoae Cho
- C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
| | - Olivier Hanotte
- University of Nottingham, School of Life Sciences, Nottingham, NG7 2RD, UK
- International Livestock Research Institute, Addis Ababa, Ethiopia
| | - Okeyo Ally Mwai
- International Livestock Research Institute, Box 30709-00100, Nairobi, Kenya
| | - Tadelle Dessie
- International Livestock Research Institute, Addis Ababa, Ethiopia
| | - Stephen Kemp
- International Livestock Research Institute, Box 30709-00100, Nairobi, Kenya
- The Centre for Tropical Livestock Genetics and Health, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, Scotland, UK
| | - Sung Jong Oh
- National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
| | - Heebal Kim
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| |
|
44
|
Alderisio F, Fiore G, di Bernardo M. Reconstructing the structure of directed and weighted networks of nonlinear oscillators. Phys Rev E 2017; 95:042302. [PMID: 28505733 DOI: 10.1103/physreve.95.042302] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/02/2016] [Indexed: 06/07/2023]
Abstract
The formalism of complex networks is extensively employed to describe the dynamics of interacting agents in several applications. The features of the connections among the nodes in a network are not always provided beforehand, hence the problem of appropriately inferring them often arises. Here, we present a method to reconstruct directed and weighted topologies of networks of heterogeneous nonlinear oscillators. We illustrate the theory on a set of representative examples.
Affiliation(s)
- Francesco Alderisio
- Department of Engineering Mathematics, Merchant Venturers Building, University of Bristol, Woodland Road, Clifton, Bristol BS8 1UB, United Kingdom
| | - Gianfranco Fiore
- Department of Engineering Mathematics, Merchant Venturers Building, University of Bristol, Woodland Road, Clifton, Bristol BS8 1UB, United Kingdom
| | - Mario di Bernardo
- Department of Engineering Mathematics, Merchant Venturers Building, University of Bristol, Woodland Road, Clifton, Bristol BS8 1UB, United Kingdom
| |
|
45
|
Barman S, Kwon YK. A novel mutual information-based Boolean network inference method from time-series gene expression data. PLoS One 2017; 12:e0171097. [PMID: 28178334 PMCID: PMC5298315 DOI: 10.1371/journal.pone.0171097] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 08/24/2016] [Accepted: 01/16/2017] [Indexed: 11/27/2022] Open
Abstract
Background: Inferring a gene regulatory network from time-series gene expression data in systems biology is a challenging problem. Many methods have been suggested, most of which have a scalability limitation due to the combinatorial cost of searching a regulatory set of genes. In addition, they have focused on the accurate inference of a network structure only. Therefore, there is a pressing need to develop a network inference method to search regulatory genes efficiently and to predict the network dynamics accurately. Results: In this study, we employ a Boolean network model with a restricted update rule scheme to capture coarse-grained dynamics, and propose a novel mutual information-based Boolean network inference (MIBNI) method. Given time-series gene expression data as an input, the method first identifies a set of initial regulatory genes using mutual information-based feature selection, and then improves the dynamics prediction accuracy by iteratively swapping a pair of genes between sets of the selected regulatory genes and the other genes. Through extensive simulations with artificial datasets, MIBNI showed consistently better performance than six well-known existing methods (REVEAL, Best-Fit, RelNet, CST, CLR, and BIBN) in terms of both structural and dynamics prediction accuracy. We further tested the proposed method with two real gene expression datasets for an Escherichia coli gene regulatory network and a fission yeast cell cycle network, and also observed better results using MIBNI compared to the six other methods. Conclusions: Taken together, MIBNI is a promising tool for predicting both the structure and the dynamics of a gene regulatory network.
Affiliation(s)
- Shohag Barman
- School of Electrical Engineering, University of Ulsan, Daehak-ro, Nam-gu, Ulsan, Republic of Korea
| | - Yung-Keun Kwon
- School of Electrical Engineering, University of Ulsan, Daehak-ro, Nam-gu, Ulsan, Republic of Korea
| |
|
46
|
Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022] Open
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them on experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic models (based on ordinary differential equations), which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training).
For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
| | - Alejandro F. Villaverde
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Julio Saez-Rodriguez
- Joint Research Center for Computational Biomedicine, RWTH-Aachen University, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Julio R. Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
| |
|
47
|
Liu W, Zhu W, Liao B, Chen H, Ren S, Cai L. Improving gene regulatory network structure using redundancy reduction in the MRNET algorithm. RSC Adv 2017. [DOI: 10.1039/c7ra01557g] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022] Open
Abstract
Inferring gene regulatory networks from expression data is a central problem in systems biology.
Affiliation(s)
- Wei Liu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Haowen Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Siqi Ren
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
| |
|
48
|
Saccenti E. Correlation Patterns in Experimental Data Are Affected by Normalization Procedures: Consequences for Data Analysis and Network Inference. J Proteome Res 2016; 16:619-634. [PMID: 27977202 DOI: 10.1021/acs.jproteome.6b00704] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 01/22/2023]
Abstract
Normalization is a fundamental step in data processing to account for the sample-to-sample variation observed in biological samples. However, data structure is affected by normalization. In this paper, we show how, and to what extent, the correlation structure is affected by the application of 11 different normalization procedures. We also discuss the consequences for data analysis and interpretation, including principal component analysis, partial least-squares discrimination, and the inference of metabolite-metabolite association networks.
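A small numerical sketch shows how a normalization step can alter correlation structure. It assumes total-sum normalization (each sample divided by its own total), which is only one common procedure and is not claimed to be among the 11 studied; the simulated data and the names `pearson` and `column` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def column(rows, j):
    """Extract variable j from a list of samples."""
    return [row[j] for row in rows]


# Simulated samples (assumed): three mutually independent variables.
rng = random.Random(0)
raw = [[rng.uniform(1.0, 2.0) for _ in range(3)] for _ in range(2000)]

# Total-sum normalization: every sample is rescaled to sum to 1.
normalized = [[v / sum(row) for v in row] for row in raw]

r_raw = pearson(column(raw, 0), column(raw, 1))
r_norm = pearson(column(normalized, 0), column(normalized, 1))
print(f"correlation before: {r_raw:.3f}, after normalization: {r_norm:.3f}")
```

Because the normalized values are forced to sum to one, an increase in one variable must be offset by the others, so independent variables acquire a negative correlation that would then propagate into any association network inferred from them.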
Affiliation(s)
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University, Stippeneng 4, 6708 HB Wageningen, The Netherlands
| |
|
49
|
Kannan V, Tegner J. Adaptive input data transformation for improved network reconstruction with information theoretic algorithms. Stat Appl Genet Mol Biol 2016; 15:507-520. [PMID: 27875324 DOI: 10.1515/sagmb-2016-0013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We propose a novel systematic procedure of non-linear data transformation for an adaptive algorithm in the context of network reverse-engineering using information theoretic methods. Our methodology is rooted in elucidating and correcting for the specific biases in the estimation techniques for mutual information (MI) given a finite sample of data. These are, in turn, tied to the lack of well-defined bounds for numerical estimation of MI for continuous probability distributions from finite data. The nature and properties of the inevitable bias are described, complemented by several examples illustrating their form and variation. We propose an adaptive partitioning scheme for MI estimation that effectively transforms the sample data using parameters determined from its local and global distribution, guaranteeing a more robust and reliable reconstruction algorithm. Together with a normalized measure (Shared Information Metric), we report considerably enhanced performance for both in silico and real-world biological networks. We also find that the recovery of true interactions is particularly improved for an intermediate range of false positive rates, suggesting that our algorithm is less vulnerable to spurious signals of association.
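The idea that the partitioning should adapt to the data can be illustrated with a deliberately simpler stand-in: equal-frequency (rank-based) binning. This is not the adaptive scheme proposed in the paper, and the names `equal_frequency_bins` and `binned_mi` and the simulated data are assumptions. The sketch shows one useful property of data-dependent partitions, namely that the MI estimate does not change under a monotonic rescaling of an input.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign bin labels by rank so every bin holds about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


def binned_mi(x, y, n_bins=6):
    """Plug-in mutual information (bits) on equal-frequency partitions."""
    bx = equal_frequency_bins(x, n_bins)
    by = equal_frequency_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Simulated data (assumed): y is a noisy copy of x.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [v + 0.5 * rng.gauss(0, 1) for v in x]

mi_raw = binned_mi(x, y)
mi_warped = binned_mi([math.exp(v) for v in x], y)  # monotonic transform of x
print(f"MI = {mi_raw:.3f} bits, after exp transform = {mi_warped:.3f} bits")
```

A fixed equal-width grid would generally give different estimates before and after the transform, which is one source of the estimator-specific bias the abstract refers to.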
|
50
|
Shi M, Shen W, Wang HQ, Chong Y. Adaptive modelling of gene regulatory network using Bayesian information criterion-guided sparse regression approach. IET Syst Biol 2016; 10:252-259. [PMID: 27879480 PMCID: PMC8687338 DOI: 10.1049/iet-syb.2016.0005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/31/2016] [Revised: 06/13/2016] [Accepted: 06/14/2016] [Indexed: 11/19/2022] Open
Abstract
Inferring gene regulatory networks (GRNs) from microarray expression data is an important but challenging issue in systems biology. In this study, the authors propose a Bayesian information criterion (BIC)-guided sparse regression approach for GRN reconstruction. This approach can adaptively model GRNs by optimising the l1-norm regularisation of sparse regression based on a modified version of BIC. The regularisation strategy keeps the inferred GRNs sparse, as natural GRNs are, while the modified BIC allows incorporating prior knowledge on expression regulation and thus avoids the usual overestimation of expression regulators. In particular, the proposed method provides a clear interpretation of combinatorial regulations of gene expression by optimally extracting regulation coordination for a given target gene. Experimental results on both simulation data and real-world microarray data demonstrate that the method performs competently in discovering regulatory relationships in GRN reconstruction.
Affiliation(s)
- Ming Shi
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
| | - Weiming Shen
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
| | - Hong-Qiang Wang
- Machine Intelligence and Computational Biology Lab, Institute of Intelligent Machines, Chinese Academy of Science, P.O. Box 1130, Hefei 230031, People's Republic of China
| | - Yanwen Chong
- State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan 430079, People's Republic of China
| |
|