1
Bernal V, Bischoff R, Horvatovich P, Guryev V, Grzegorczyk M. The 'un-shrunk' partial correlation in Gaussian graphical models. BMC Bioinformatics 2021; 22:424. [PMID: 34493207 PMCID: PMC8424921 DOI: 10.1186/s12859-021-04313-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Accepted: 08/02/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) interconnected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes (the 'high-dimensional problem'). Shrinkage methods address this issue by learning a regularized GGM. However, how the shrinkage affects the final result and its interpretation remains an open question.
RESULTS: We show that the shrinkage biases the partial correlation in a non-linear way. This bias not only changes the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as 'un-shrinking' the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus.
CONCLUSIONS: GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the 'high-dimensional problem'. Besides its advantages, we have identified that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring the effects of the shrinkage can obscure the interpretation of the network and impede the validation of earlier reported results.
Affiliation(s)
- Victor Bernal
  - Bernoulli Institute, University of Groningen, Groningen, 9747 AG, The Netherlands
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, 9713 AV, The Netherlands
- Rainer Bischoff
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, 9713 AV, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, 9713 AV, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, 9713 AV, The Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute, University of Groningen, Groningen, 9747 AG, The Netherlands
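The non-linear effect of shrinkage described in the abstract above can be illustrated on a three-variable toy case. The following is a minimal sketch, not the authors' method: it assumes shrinkage toward the identity target, so each off-diagonal correlation r becomes (1 - lam) * r, and it uses the standard closed-form partial correlation of two variables given a third. The function names and the example correlations are hypothetical.

```python
import math


def shrink(r: float, lam: float) -> float:
    """Shrink an off-diagonal correlation toward 0 (identity target, an assumption)."""
    return (1.0 - lam) * r


def partial_corr_3(r12: float, r13: float, r23: float, lam: float = 0.0) -> float:
    """Partial correlation of variables 1 and 2 given variable 3, after shrinkage."""
    s12, s13, s23 = shrink(r12, lam), shrink(r13, lam), shrink(r23, lam)
    return (s12 - s13 * s23) / math.sqrt((1.0 - s13 ** 2) * (1.0 - s23 ** 2))


# Made-up correlations: compare the shrunk partial correlation with a
# naive linear rescaling of the unshrunk one.
unshrunk = partial_corr_3(0.8, 0.6, 0.5, lam=0.0)
for lam in (0.0, 0.3, 0.6):
    shrunk = partial_corr_3(0.8, 0.6, 0.5, lam=lam)
    print(lam, round(shrunk, 4), round((1.0 - lam) * unshrunk, 4))
```

In this toy setting the shrunk partial correlation does not equal (1 - lam) times the unshrunk value, which is one simple way to see that the bias is not a plain rescaling.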
2
Bernal V, Bischoff R, Guryev V, Grzegorczyk M, Horvatovich P. Exact hypothesis testing for shrinkage-based Gaussian graphical models. Bioinformatics 2020; 35:5011-5017. [PMID: 31077287 PMCID: PMC6901079 DOI: 10.1093/bioinformatics/btz357] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 03/08/2019] [Accepted: 04/26/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: One of the main goals in systems biology is to learn molecular regulatory networks from quantitative profile data. In particular, Gaussian graphical models (GGMs) are widely used network models in bioinformatics where variables (e.g. transcripts, metabolites or proteins) are represented by nodes, and pairs of nodes are connected with an edge according to their partial correlation. Reconstructing a GGM from data is a challenging task when the sample size is smaller than the number of variables. The main problem is inverting the covariance estimator, which is ill-conditioned in this case. Shrinkage-based covariance estimators are a popular approach, producing an invertible 'shrunk' covariance. However, a proper significance test for the 'shrunk' partial correlation (i.e. the GGM edges) is an open challenge, as a probability density including the shrinkage is unknown. In this article, we present (i) a geometric reformulation of the shrinkage-based GGM, and (ii) a probability density that naturally includes the shrinkage parameter.
RESULTS: Our results show that the inference using this new 'shrunk' probability density is as accurate as Monte Carlo estimation (an unbiased non-parametric method) for any shrinkage value, while being computationally more efficient. We show on synthetic data how the novel test for significance allows an accurate control of the Type I error and outperforms the network reconstruction obtained by the widely used R package GeneNet. This is further highlighted in two gene expression datasets from stress response in Escherichia coli, and the effect of influenza infection in Mus musculus.
AVAILABILITY AND IMPLEMENTATION: https://github.com/V-Bernal/GGM-Shrinkage.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Victor Bernal
  - Bernoulli Institute, University of Groningen, Groningen AG, The Netherlands
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
- Rainer Bischoff
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen AV, The Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute, University of Groningen, Groningen AG, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
3
Alexiou A, Chatzichronis S, Perveen A, Hafeez A, Ashraf GM. Algorithmic and Stochastic Representations of Gene Regulatory Networks and Protein-Protein Interactions. Curr Top Med Chem 2019; 19:413-425. [PMID: 30854971 DOI: 10.2174/1568026619666190311125256] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/27/2018] [Revised: 10/15/2018] [Accepted: 12/26/2018] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Recent studies reveal the importance of protein-protein interactions (PPI) for physiological functions and biological structures. Several stochastic and algorithmic methods have been published to date for modeling the complex nature of biological systems.
OBJECTIVE: Computational modeling of biological networks is still a challenging task. The formulation of complex cellular interactions is a research field of great interest. In this review paper, several computational methods for the modeling of gene regulatory networks (GRN) and PPI are presented analytically.
METHODS: Several well-known GRN and PPI models are presented and discussed in this review, such as: Graph representations, Boolean Networks, Generalized Logical Networks, Bayesian Networks, Relevance Networks, Graphical Gaussian models, Weight Matrices, Reverse Engineering Approach, Evolutionary Algorithms, Forward Modeling Approach, Deterministic models, Static models, Hybrid models, Stochastic models, Petri Nets, BioAmbients calculus and Differential Equations.
RESULTS: GRN and PPI methods have already been applied in various clinical processes with potentially positive results, establishing promising diagnostic tools.
CONCLUSION: In the literature, many stochastic algorithms focus on the simulation, analysis and visualization of the various biological networks and their dynamic interactions, which are referenced and described in depth in this review paper.
Affiliation(s)
- Asma Perveen
  - Glocal School of Life Sciences, Glocal University, Mirzapur Pole, Saharanpur, Uttar Pradesh, India
- Abdul Hafeez
  - Glocal School of Pharmacy, Glocal University, Mirzapur Pole, Saharanpur, Uttar Pradesh, India
- Ghulam Md. Ashraf
  - King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
4
Matache MT, Matache V. Logical Reduction of Biological Networks to Their Most Determinative Components. Bull Math Biol 2016; 78:1520-45. [PMID: 27417985 PMCID: PMC4993808 DOI: 10.1007/s11538-016-0193-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/14/2015] [Accepted: 06/23/2016] [Indexed: 12/20/2022]
Abstract
Boolean networks have been widely used as models for gene regulatory networks, signal transduction networks, or neural networks, among many others. One of the main difficulties in analyzing the dynamics of a Boolean network and its sensitivity to perturbations or mutations is that the state space grows exponentially with the number of nodes. Therefore, various approaches for simplifying the computations and reducing the network to a subset of relevant nodes have been proposed in the past few years. We consider a recently introduced method for reducing a Boolean network to its most determinative nodes that yield the highest information gain. The determinative power of a node is obtained by a summation of all mutual information quantities over all nodes having the chosen node as a common input, thus representing a measure of information gain obtained by the knowledge of the node under consideration. The determinative power of nodes has been considered in the literature under the assumption that the inputs are independent, in which case one can use the Bahadur orthonormal basis. In this article, we relax that assumption and use a standard orthonormal basis instead. We use techniques of Hilbert space operators and harmonic analysis to generate formulas for the sensitivity to perturbations of nodes, quantified by the notions of influence, average sensitivity, and strength. Since we work on finite-dimensional spaces, our formulas and estimates can be and are formulated in plain matrix algebra terminology. We analyze the determinative power of nodes for a Boolean model of a signal transduction network of a generic fibroblast cell. We also show the similarities and differences induced by the alternative complete orthonormal basis used. Among the similarities, we mention the fact that the knowledge of the states of the most determinative nodes reduces the entropy or uncertainty of the overall network significantly. In a special case, we obtain a stronger result than in previous works, showing that a large information gain from a set of input nodes generates increased sensitivity to perturbations of those inputs.
Affiliation(s)
- Mihaela T. Matache
  - Department of Mathematics, University of Nebraska at Omaha, Omaha, NE 68182-0243 USA
- Valentin Matache
  - Department of Mathematics, University of Nebraska at Omaha, Omaha, NE 68182-0243 USA
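The definition of determinative power given in the abstract above (a sum of mutual information terms over all nodes that take the chosen node as an input) can be sketched for a tiny Boolean network. This is only an illustration under a simplifying assumption: inputs are treated as independent and uniformly distributed, which is the setting the abstract attributes to earlier work, not the relaxed setting of the article. The three-node network and all function names are hypothetical.

```python
import itertools
import math
from collections import Counter


def mutual_information(f, n_inputs, i):
    """I(X_i; f(X)) in bits, assuming independent, uniform Boolean inputs."""
    joint = Counter()
    for x in itertools.product((0, 1), repeat=n_inputs):
        joint[(x[i], f(x))] += 1
    total = 2 ** n_inputs
    p_in, p_out = Counter(), Counter()
    for (a, b), count in joint.items():
        p_in[a] += count
        p_out[b] += count
    return sum(
        (count / total) * math.log2(count * total / (p_in[a] * p_out[b]))
        for (a, b), count in joint.items()
    )


def determinative_power(network):
    """Sum, for each node, the information it carries about every node it feeds."""
    power = {node: 0.0 for node in network}
    for _target, (inputs, f) in network.items():
        for pos, source in enumerate(inputs):
            power[source] += mutual_information(f, len(inputs), pos)
    return power


# Hypothetical network: each node maps to (its input nodes, its update rule).
network = {
    "A": (("B", "C"), lambda x: x[0] & x[1]),  # A' = B AND C
    "B": (("A",), lambda x: x[0]),             # B' = A
    "C": (("A", "B"), lambda x: x[0] ^ x[1]),  # C' = A XOR B
}
power = determinative_power(network)
print(power)
```

Under these assumptions node A is the most determinative: it fully determines B, while a single input of an XOR gate carries no information about the output on its own.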
5
Al-Dalky R, Taha K, Al Homouz D, Qasaimeh M. Applying Monte Carlo Simulation to Biomedical Literature to Approximate Genetic Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:494-504. [PMID: 26415184 DOI: 10.1109/tcbb.2015.2481399] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Biologists often need to know the set of genes associated with a given set of genes or a given disease. We propose in this paper a classifier system called Monte Carlo for Genetic Network (MCforGN) that can construct genetic networks, identify functionally related genes, and predict gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found within the abstracts of biomedical literature associated with g. It then ranks these genes to determine the ones with high co-occurrences with g. It overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo Simulation to approximate genetic networks. It does so by conducting repeated random sampling to obtain numerical results and to optimize these results. Moreover, it analyzes results to obtain the probabilities of different genes' co-occurrences using a series of statistical tests. MCforGN can detect gene-disease associations by employing a combination of centrality measures (to identify the central genes in disease-specific genetic networks) and Monte Carlo Simulation. MCforGN aims at enhancing state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN by comparing it experimentally with nine approaches. Results showed marked improvement.
6
Woodhouse S, Moignard V, Göttgens B, Fisher J. Processing, visualising and reconstructing network models from single-cell data. Immunol Cell Biol 2016; 94:256-65. [PMID: 26577213 DOI: 10.1038/icb.2015.102] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/21/2015] [Revised: 11/03/2015] [Accepted: 11/11/2015] [Indexed: 11/09/2022]
Abstract
New single-cell technologies readily permit gene expression profiling of thousands of cells at single-cell resolution. In this review, we will discuss methods for visualisation and interpretation of single-cell gene expression data, and the computational analysis needed to go from raw data to predictive executable models of gene regulatory network function. We will focus primarily on single-cell real-time quantitative PCR and RNA-sequencing data, but much of what we cover will also be relevant to other platforms, such as the mass cytometry technology for high-dimensional single-cell proteomics.
Affiliation(s)
- Steven Woodhouse
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
  - Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Victoria Moignard
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
  - Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Berthold Göttgens
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
  - Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Jasmin Fisher
  - Microsoft Research, Cambridge, UK
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
7
Isewon I, Oyelade J, Brors B, Adebiyi E. In Silico Gene Regulatory Network of the Maurer's Cleft Pathway in Plasmodium falciparum. Evol Bioinform Online 2015; 11:231-8. [PMID: 26526876 PMCID: PMC4620995 DOI: 10.4137/ebo.s25585] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/05/2015] [Revised: 07/28/2015] [Accepted: 08/03/2015] [Indexed: 11/15/2022] Open
Abstract
The Maurer's clefts (MCs) are very important for the survival of Plasmodium falciparum within an infected cell, as they are induced by the parasite itself in the erythrocyte for protein trafficking. The MCs form an interesting part of the parasite's biology as they shed more light on how the parasite remodels the erythrocyte, leading to host pathogenesis and death. Here, we predicted and analyzed the genetic regulatory network of genes identified as belonging to the MCs using a regularized graphical Gaussian model. Our network shows four major activators, their corresponding target genes, and predicted binding sites. One of these master activators is the serine repeat antigen 5 (SERA5), predominantly expressed among the SERA multigene family of P. falciparum, which is one of the blood-stage malaria vaccine candidates. Our results provide more details about functional interactions and the regulation of the genes in the MCs' pathway of P. falciparum.
Affiliation(s)
- Itunuoluwa Isewon
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jelili Oyelade
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Benedikt Brors
  - Department of Applied Bioinformatics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Ezekiel Adebiyi
  - Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Applied Bioinformatics, German Cancer Research Centre (DKFZ), Heidelberg, Germany
8
Geman D, Ochs M, Price ND, Tomasetti C, Younes L. An argument for mechanism-based statistical inference in cancer. Hum Genet 2015; 134:479-95. [PMID: 25381197 PMCID: PMC4612627 DOI: 10.1007/s00439-014-1501-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/21/2014] [Accepted: 10/14/2014] [Indexed: 01/07/2023]
Abstract
Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.
Affiliation(s)
- Donald Geman
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21210, USA
9
Acharya L, Judeh T, Duan Z, Rabbat M, Zhu D. GSGS: a computational approach to reconstruct signaling pathway structures from gene sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 9:438-450. [PMID: 22025758 DOI: 10.1109/tcbb.2011.143] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Reconstruction of signaling pathway structures is essential to decipher complex regulatory relationships in living cells. Existing approaches often rely on unrealistic biological assumptions and do not explicitly consider signal transduction mechanisms. Signal transduction events refer to linear cascades of reactions from cell surface to nucleus and characterize a signaling pathway. We propose a novel approach, Gene Set Gibbs Sampling, to reverse engineer signaling pathway structures from gene sets related to pathways. We hypothesize that signaling pathways are structurally an ensemble of overlapping linear signal transduction events which we encode as Information Flows (IFs). We infer signaling pathway structures from gene sets, referred to as Information Flow Gene Sets (IFGSs), corresponding to these events. Thus, an IFGS only reflects which genes appear in the underlying IF but not their ordering. GSGS offers a Gibbs sampling procedure to reconstruct the underlying signaling pathway structure by sequentially inferring IFs from the overlapping IFGSs related to the pathway. In the proof-of-concept studies, our approach is shown to outperform existing network inference approaches using data generated from benchmark networks in DREAM. We perform a sensitivity analysis to assess the robustness of our approach. Finally, we implement GSGS to reconstruct signaling mechanisms in breast cancer cells.
10
Serb JM, Orr MC, West Greenlee MH. Using evolutionary conserved modules in gene networks as a strategy to leverage high throughput gene expression queries. PLoS One 2010; 5:e12525. [PMID: 20824082 PMCID: PMC2932711 DOI: 10.1371/journal.pone.0012525] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/29/2010] [Accepted: 08/04/2010] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Large-scale gene expression studies have not yielded the expected insight into genetic networks that control complex processes. These anticipated discoveries have been limited not by technology, but by a lack of effective strategies to investigate the data in a manageable and meaningful way. Previous work suggests that using a pre-determined seed-network of gene relationships to query large-scale expression datasets is an effective way to generate candidate genes for further study and network expansion or enrichment. Based on the evolutionary conservation of gene relationships, we test the hypothesis that a seed network derived from studies of retinal cell determination in the fly, Drosophila melanogaster, will be an effective way to identify novel candidate genes for their role in mouse retinal development.
METHODOLOGY/PRINCIPAL FINDINGS: Our results demonstrate that a number of gene relationships regulating retinal cell differentiation in the fly are identifiable as pairwise correlations between genes from developing mouse retina. In addition, we demonstrate that our extracted seed-network of correlated mouse genes is an effective tool for querying datasets and provides a context to generate hypotheses. Our query identified 46 genes correlated with our extracted seed-network members. Approximately 54% of these candidates had been previously linked to the developing brain and 33% had been previously linked to the developing retina. Five of six candidate genes investigated further were validated by experiments examining spatial and temporal protein expression in the developing retina.
CONCLUSIONS/SIGNIFICANCE: We present an effective strategy for pursuing a systems biology approach that utilizes an evolutionary comparative framework between two model organisms, fly and mouse. Future implementation of this strategy will be useful to determine the extent of network conservation, not just gene conservation, between species and will facilitate the use of prior biological knowledge to develop rational systems-based hypotheses.
Affiliation(s)
- Jeanne M Serb
  - Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, Iowa, United States of America
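The seed-network query described in the abstract above (finding genes whose expression is correlated with seed-network members) can be sketched with pairwise Pearson correlations. This is a minimal illustration, not the authors' pipeline: the gene names, expression values, the 0.9 threshold, and the function names are all made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def query_candidates(expression, seed_genes, threshold=0.9):
    """Map each non-seed gene to the seed genes it correlates with (|r| >= threshold)."""
    hits = {}
    for gene, profile in expression.items():
        if gene in seed_genes:
            continue
        partners = [
            seed for seed in seed_genes
            if abs(pearson(profile, expression[seed])) >= threshold
        ]
        if partners:
            hits[gene] = partners
    return hits


# Hypothetical expression profiles across five samples.
expression = {
    "seedA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "seedB": [5.0, 3.0, 4.0, 1.0, 2.0],
    "cand1": [2.1, 3.9, 6.2, 8.0, 9.9],  # closely tracks seedA
    "cand2": [3.0, 1.0, 4.0, 1.0, 5.0],  # no strong relationship
}
hits = query_candidates(expression, ["seedA", "seedB"])
print(hits)
```

A real analysis would also need multiple-testing control and a justified threshold; the fixed cutoff here only keeps the sketch short.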