1
Song Z, Gunn S, Monti S, Peloso GM, Liu CT, Lunetta K, Sebastiani P. Learning Gaussian Graphical Models from Correlated Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.03.587948. [PMID: 38617340 PMCID: PMC11014549 DOI: 10.1101/2024.04.03.587948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Gaussian Graphical Models (GGMs) have been widely used in biomedical research to explore complex relationships between many variables. There are well-established procedures to build GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that result in correlated observations, and ignoring this correlation among observations can lead to inflated Type I error. In this paper, we propose a bootstrap algorithm to infer GGMs from correlated data. We use extensive simulations of correlated data from family-based studies to show that the bootstrap method does not inflate the Type I error while retaining statistical power compared to alternative solutions. We apply our method to learn the GGM that represents complex relations between 47 Polygenic Risk Scores generated using genome-wide genotype data from a family-based study known as the Long Life Family Study. By comparing it to the conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well in this real example.
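The abstract above does not spell out the algorithm, so the following is only a generic sketch of a cluster-level bootstrap, not the authors' published procedure. It resamples whole clusters (for example, families) with replacement and builds a percentile interval for a single partial correlation. The three-variable setup, the function names `partial_corr` and `cluster_bootstrap_ci`, the percentile interval, and the idea of keeping an edge only when the interval excludes zero are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(rows):
    """Partial correlation of columns 0 and 1 given column 2 (three variables only)."""
    x, y, z = zip(*rows)
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def cluster_bootstrap_ci(clusters, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for the partial correlation, resampling whole clusters.

    `clusters` maps a cluster id to a list of (x, y, z) observations.
    Drawing clusters rather than single rows keeps the within-cluster
    correlation intact in every bootstrap replicate.
    """
    rng = random.Random(seed)
    ids = list(clusters)
    stats = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))  # sample clusters with replacement
        sample = [row for cid in drawn for row in clusters[cid]]
        stats.append(partial_corr(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under these assumptions, an edge between the first two variables would be kept only if the returned interval does not contain zero. A full GGM would repeat this for every pair of variables, conditioning on all the others.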
2
Wu Z, Sinha S. SPREd: a simulation-supervised neural network tool for gene regulatory network reconstruction. BIOINFORMATICS ADVANCES 2024; 4:vbae011. [PMID: 38444538 PMCID: PMC10913396 DOI: 10.1093/bioadv/vbae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 11/08/2023] [Accepted: 01/18/2024] [Indexed: 03/07/2024]
Abstract
Summary: Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd," is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g. correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA, and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold-standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step toward incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
Availability and implementation: Data and code are available from https://github.com/iiiime/SPREd.
Affiliation(s)
- Zijun Wu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
- H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
3
Naghizadeh MM, Osati S, Homayounfar R, Masoudi-Nejad A. Food co-consumption network as a new approach to dietary pattern in non-alcoholic fatty liver disease. Sci Rep 2023; 13:20703. [PMID: 38001137 PMCID: PMC10673913 DOI: 10.1038/s41598-023-47752-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 11/17/2023] [Indexed: 11/26/2023] Open
Abstract
Dietary patterns strongly correlate with non-alcoholic fatty liver disease (NAFLD), which is a leading cause of chronic liver disease in developed societies. In this study, we introduce a new definition, the co-consumption network (CCN), which depicts the common consumption patterns of food groups through network analysis. We then examine the relationship between dietary patterns and NAFLD by analyzing this network. We selected 1500 individuals living in Tehran, Iran, cross-sectionally. They completed a food frequency questionnaire and underwent scanning via the FibroScan for liver stiffness, using the CAP score. The food items were categorized into 40 food groups. We reconstructed the CCN using the Spearman correlation-based connection. We then created healthy and unhealthy clusters using the label propagation algorithm. Participants were assigned to two clusters using the hypergeometric distribution. Finally, we classified participants into two healthy NAFLD networks, and reconstructed the gender and disease differential CCNs. We found that the sweet food group was the hub of the proposed CCN, with the largest cliques of size 5 associated with the unhealthy cluster. The unhealthy module members had a significantly higher CAP score (253.7 ± 47.8) compared to the healthy module members (218.0 ± 46.4) (P < 0.001). The disease differential CCN showed that in the case of NAFLD, processed meat had been co-consumed with mayonnaise and soft drinks, in contrast to the healthy participants, who had co-consumed fruits with green leafy and yellow vegetables. The CCN is a powerful method for presenting food groups, their consumption quantity, and their interactions efficiently. Moreover, it facilitates the examination of the relationship between dietary patterns and NAFLD.
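The network-building step described above (a Spearman correlation-based connection between food groups) can be sketched roughly as follows. The abstract does not give the rule used to turn correlations into edges, so the positive cutoff of 0.5, the function names, and the toy intake values are all hypothetical and serve only to make the idea concrete.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def co_consumption_edges(intake, threshold=0.5):
    """Connect two food groups when their Spearman correlation reaches the
    (assumed) threshold. `intake` maps a food group to per-person intakes."""
    edges = []
    for a, b in combinations(sorted(intake), 2):
        rho = spearman(intake[a], intake[b])
        if rho >= threshold:
            edges.append((a, b, rho))
    return edges


# Hypothetical intakes for five participants, for illustration only.
intake = {
    "sweets": [1, 2, 3, 4, 5],
    "soft_drinks": [2, 4, 6, 8, 100],
    "fruits": [3, 1, 4, 1, 5],
}
edges = co_consumption_edges(intake)
```

Because Spearman correlation works on ranks, the outlying value 100 for soft drinks does not distort the link to sweets. Clustering the resulting graph (the study uses label propagation) would be a separate step not shown here.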
Affiliation(s)
- Mohammad Mehdi Naghizadeh
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
- Saeed Osati
- National Nutrition and Food Technology Research Institute, Faculty of Nutrition Sciences and Food Technology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Homayounfar
- National Nutrition and Food Technology Research Institute, Faculty of Nutrition Sciences and Food Technology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
4
Bianchi P, Elgui K, Portier F. Conditional independence testing via weighted partial copulas. J MULTIVARIATE ANAL 2023. [DOI: 10.1016/j.jmva.2022.105120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
5
Shi T, Yu H, Blair RH. Integrated regulatory and metabolic networks of the tumor microenvironment for therapeutic target prioritization. Stat Appl Genet Mol Biol 2023; 22:sagmb-2022-0054. [PMID: 37988745 DOI: 10.1515/sagmb-2022-0054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 09/28/2023] [Indexed: 11/23/2023]
Abstract
Translation of genomic discovery, such as single-cell sequencing data, to clinical decisions remains a longstanding bottleneck in the field. Meanwhile, computational systems biological models, such as cellular metabolism models and cell signaling pathways, have emerged as powerful approaches to provide efficient predictions in metabolites and gene expression levels, respectively. However, there has been limited research on the integration between these two models. This work develops a methodology for integrating computational models of probabilistic gene regulatory networks with a constraint-based metabolism model. By using probabilistic reasoning with Bayesian Networks, we aim to predict cell-specific changes under different interventions, which are embedded into the constraint-based models of metabolism. Applications to single-cell sequencing data of glioblastoma brain tumors generate predictions about the effects of pharmaceutical interventions on the regulatory network and downstream metabolisms in different cell types from the tumor microenvironment. The model presents possible insights into treatments that could potentially suppress anaerobic metabolism in malignant cells with minimal impact on other cell types' metabolism. The proposed integrated model can guide therapeutic target prioritization, the formulation of combination therapies, and future drug discovery. This model integration framework is also generalizable to other applications, such as different cell types, organisms, and diseases.
Affiliation(s)
- Tiange Shi
- University at Buffalo, Biostatistics, Buffalo, USA
- Han Yu
- Roswell Park Comprehensive Cancer Center, Biostatistics and Bioinformatics, Buffalo, USA
- Rachael Hageman Blair
- University at Buffalo, Biostatistics, Institute for Artificial Intelligence and Data Science, Buffalo, USA
6
Shutta KH, Weighill D, Burkholz R, Guebila M, DeMeo DL, Zacharias HU, Quackenbush J, Altenbuchinger M. DRAGON: Determining Regulatory Associations using Graphical models on multi-Omic Networks. Nucleic Acids Res 2022; 51:e15. [PMID: 36533448 PMCID: PMC9943674 DOI: 10.1093/nar/gkac1157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 11/08/2022] [Accepted: 11/23/2022] [Indexed: 12/23/2022] Open
Abstract
The increasing quantity of multi-omic data, such as methylomic and transcriptomic profiles collected on the same specimen or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics 'layers.' In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome - methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).
Affiliation(s)
- Rebekka Burkholz
- CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
- Marouen Ben Guebila
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Dawn L DeMeo
- Channing Division of Network Medicine, Brigham and Women’s Hospital, and Department of Medicine, Harvard Medical School, Boston, MA, USA
- Helena U Zacharias
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover, Germany
- Michael Altenbuchinger
- To whom correspondence should be addressed. Tel: +49 551 39 61788; Fax: +49 551 39 61783;
7
Thermodynamic Modelling of Transcriptional Control: A Sensitivity Analysis. MATHEMATICS 2022. [DOI: 10.3390/math10132169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Modelling is a tool used to decipher the biochemical mechanisms involved in transcriptional control. Experimental evidence in genetics is usually supported by theoretical models in order to evaluate the effects of all the possible interactions that can occur in these complicated processes. Models derived from the thermodynamic method are critical in this labour because they are able to take into account multiple mechanisms operating simultaneously at the molecular micro-scale and relate them to transcriptional initiation at the tissular macro-scale. This work is devoted to adapting computational techniques to this context in order to theoretically evaluate the role played by several biochemical mechanisms. The interest of this theoretical analysis relies on the fact that it can be contrasted against those biological experiments where the response to perturbations in the transcriptional machinery environment is evaluated in terms of genetically activated/repressed regions. The theoretical reproduction of these experiments leads to a sensitivity analysis whose results are expressed in terms of the elasticity of a threshold function determining those activated/repressed regions. The study of this elasticity function in thermodynamic models already proposed in the literature reveals that certain modelling approaches can alter the balance between the biochemical mechanisms considered, and this can cause false/misleading outcomes. The reevaluation of classical thermodynamic models gives us a more accurate and complete picture of the interactions involved in gene regulation and transcriptional control, which enables more specific predictions. This sensitivity approach provides a definite advantage in the interpretation of a wide range of genetic experimental results.
8
Jansen JE, Aschenbrenner D, Uhlig HH, Coles MC, Gaffney EA. A method for the inference of cytokine interaction networks. PLoS Comput Biol 2022; 18:e1010112. [PMID: 35731827 PMCID: PMC9216621 DOI: 10.1371/journal.pcbi.1010112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 04/15/2022] [Indexed: 11/19/2022] Open
Abstract
Cell-cell communication is mediated by many soluble mediators, including over 40 cytokines. Cytokines, e.g. TNF, IL1β, IL5, IL6, IL12 and IL23, represent important therapeutic targets in immune-mediated inflammatory diseases (IMIDs), such as inflammatory bowel disease (IBD), psoriasis, asthma, rheumatoid and juvenile arthritis. The identification of cytokines that are causative drivers of, and not just associated with, inflammation is fundamental for selecting therapeutic targets that should be studied in clinical trials. As in vitro models of cytokine interactions provide a simplified framework to study complex in vivo interactions, and can easily be perturbed experimentally, they are key for identifying such targets. We present a method to extract a minimal, weighted cytokine interaction network, given in vitro data on the effects of the blockage of single cytokine receptors on the secretion rate of other cytokines. Existing biological network inference methods typically consider the correlation structure of the underlying dataset, but this can make them poorly suited for highly connected, non-linear cytokine interaction data. Our method uses ordinary differential equation systems to represent cytokine interactions, and efficiently computes the configuration with the lowest Akaike information criterion value for all possible network configurations. It enables us to study indirect cytokine interactions and quantify inhibition effects. The extracted network can also be used to predict the combined effects of inhibiting various cytokines simultaneously. The model equations can easily be adjusted to incorporate more complicated dynamics and accommodate temporal data. We validate our method using synthetic datasets and apply our method to an experimental dataset on the regulation of IL23, a cytokine with therapeutic relevance in psoriasis and IBD. We validate several model predictions against experimental data that were not used for model fitting. 
In summary, we present a novel method specifically designed to efficiently infer cytokine interaction networks from cytokine perturbation data in the context of IMIDs.
Cytokines are the messenger molecules of the immune system, allowing intercellular communication and mediating effective immune responses. They are an important therapeutic target in immune-mediated inflammatory diseases such as inflammatory bowel disease (IBD) and rheumatoid arthritis. Cytokines interact in a tightly regulated network and, depending on the context, a particular cytokine can be involved in anti-inflammatory or inflammatory activities. In order to determine which cytokines to target in specific disease types and patient subsets, it is critical to study the effects of the inhibition of one or more cytokines on the larger cytokine interaction network. We present a novel method to extract a minimal, weighted network from cytokine interaction data. Existing biological network inference methods typically consider the correlation structure of the underlying dataset and/or make further assumptions about the dataset, such as the existence of a small core of regulators. This can make them poorly suited for highly connected, non-linear cytokine interaction data. We validated our method using synthetic data and applied our method to a dataset on the regulation of IL23, a cytokine implicated in IBD pathogenesis. Predictions of the extracted IL23 network were validated using additional experimental data and were used to support the view of the cytokines IL1 and IL23 as promising targets for those patients that fail to respond to TNFα inhibition, the current gold standard in IBD treatment.
Affiliation(s)
- Joanneke E. Jansen
- Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
- Dominik Aschenbrenner
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Autoimmunity, Transplantation and Inflammation, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
- Holm H. Uhlig
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Department of Paediatrics, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Mark C. Coles
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
- Eamonn A. Gaffney
- Wolfson Centre for Mathematical Biology, Mathematical Institute, University of Oxford, Oxford, United Kingdom
9
Xin H, Zhao SD. A compound decision approach to covariance matrix estimation. Biometrics 2022. [PMID: 35499364 DOI: 10.1111/biom.13686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2021] [Accepted: 04/18/2022] [Indexed: 11/29/2022]
Abstract
Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is sub-optimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings are common in modern genomics, where covariance matrix estimation is frequently employed as a method for inferring gene networks. To achieve estimation accuracy in these settings, existing methods typically either assume that the population covariance matrix has some particular structure, for example sparsity, or apply shrinkage to better estimate the population eigenvalues. In this paper, we study a new approach to estimating high-dimensional covariance matrices. We first frame covariance matrix estimation as a compound decision problem. This motivates defining a class of decision rules and using a nonparametric empirical Bayes g-modeling approach to estimate the optimal rule in the class. Simulation results and gene network inference in an RNA-seq experiment in mouse show that our approach is comparable to or can outperform a number of state-of-the-art proposals.
Affiliation(s)
- Huiqin Xin
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois
10
Abstract
DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. In this chapter we provide an overview of the methods and tools used to create networks from microarray data and describe multiple methods for analyzing a single network or a group of networks. The described methods range from topological metrics and functional group identification to data integration strategies, topological pathway analysis, and graphical models.
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
11
Zhang S, Knaack S, Roy S. Enabling Studies of Genome-Scale Regulatory Network Evolution in Large Phylogenies with MRTLE. Methods Mol Biol 2022; 2477:439-455. [PMID: 35524131 PMCID: PMC9794031 DOI: 10.1007/978-1-0716-2257-5_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Transcriptional regulatory networks specify context-specific patterns of genes and play a central role in how species evolve and adapt. Inferring genome-scale regulatory networks in non-model species is the first step for examining patterns of conservation and divergence of regulatory networks. Transcriptomic data obtained under varying environmental stimuli in multiple species are becoming increasingly available, which can be used to infer regulatory networks. However, inference and analysis of multiple gene regulatory networks in a phylogenetic setting remains challenging. We developed an algorithm, Multi-species Regulatory neTwork LEarning (MRTLE), to facilitate such studies of regulatory network evolution. MRTLE is a probabilistic graphical model-based algorithm that uses phylogenetic structure, transcriptomic data for multiple species, and sequence-specific motifs in each species to simultaneously infer genome-scale regulatory networks across multiple species. We applied MRTLE to study regulatory network evolution across six ascomycete yeasts using transcriptomic measurements collected across different stress conditions. MRTLE networks recapitulated experimentally derived interactions in the model organism S. cerevisiae as well as non-model species, and it was more beneficial for network inference than methods that do not use phylogenetic information. We examined the regulatory networks across species and found that regulators associated with significant expression and network changes are involved in stress-related processes. MRTLE and its associated downstream analysis provide a scalable and principled framework to examine evolutionary dynamics of transcriptional regulatory networks across multiple species in a large phylogeny.
Affiliation(s)
- Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sara Knaack
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
12
On the Fourier transform of a quantitative trait: Implications for compressive sensing. J Theor Biol 2021; 540:110985. [PMID: 34953868 DOI: 10.1016/j.jtbi.2021.110985] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/28/2020] [Revised: 12/01/2021] [Accepted: 12/09/2021] [Indexed: 11/23/2022]
Abstract
This paper explores the genotype-phenotype relationship. It outlines conditions under which the dependence of a quantitative trait on the genome might be predictable, based on measurement of a limited subset of genotypes. It uses the theory of real-valued Boolean functions in a systematic way to translate trait data into the Fourier domain. Important trait features, such as the roughness of the trait landscape or the modularity of a trait, have a simple Fourier interpretation. Roughness at a gene location corresponds to high sensitivity to mutation, while a modular organization of gene activity reduces such sensitivity. Traits where rugged loci are rare will naturally compress gene data in the Fourier domain, leading to a sparse representation of trait data, concentrated in identifiable, low-level coefficients. This Fourier representation of a trait organizes epistasis in a form which is isometric to the trait data. As Fourier matrices are known to be maximally incoherent with the standard basis, this permits employing compressive sensing techniques to work from data sets that are relatively small (sometimes even of polynomial size) compared to the exponentially large sets of possible genomes. This theory provides a theoretical underpinning for systematic use of Boolean function machinery to dissect the dependency of a trait on the genome and environment.
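To make the Fourier-domain idea concrete, here is a minimal sketch using the standard Walsh-Hadamard transform of a real-valued function on Boolean genotypes. The toy additive trait, the bit-per-locus indexing, and the function name `walsh_hadamard` are assumptions chosen for illustration; the paper's own notation and normalization may differ.

```python
def walsh_hadamard(values):
    """Fourier (Walsh-Hadamard) coefficients of a function on {0,1}^n.

    `values[x]` is the trait for the genotype whose bit i is the allele at
    locus i. Coefficient s equals the average over x of
    values[x] * (-1) ** popcount(s & x).
    """
    a = list(values)
    n = len(a)  # must be a power of two
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v  # butterfly step
        h *= 2
    return [c / n for c in a]


# Hypothetical additive trait on three loci: only loci 0 and 1 matter and
# there is no interaction (no epistasis) between them.
trait = [1 + 2 * (x & 1) + 0.5 * ((x >> 1) & 1) for x in range(8)]
coeffs = walsh_hadamard(trait)
# Only the constant term and the two single-locus coefficients
# (indices 0, 1 and 2) are nonzero; every interaction coefficient vanishes.
```

A trait like this, with no rugged loci, is sparse in the Fourier domain: three of the eight coefficients carry all the information. That kind of sparsity is the property the abstract ties to compressive sensing.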
13
Castelletti F, Mascaro A. Structural learning and estimation of joint causal effects among network-dependent variables. STAT METHOD APPL-GER 2021. [DOI: 10.1007/s10260-021-00579-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Bayesian networks in the form of Directed Acyclic Graphs (DAGs) represent an effective tool for modeling and inferring dependence relations among variables, a process known as structural learning. In addition, when equipped with the notion of intervention, a causal DAG model can be adopted to quantify the causal effect on a response due to a hypothetical intervention on some variable. Observational data cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs), which however can be different from a causal perspective. In addition, because causal effects depend on the underlying network structure, uncertainty around the DAG generating model crucially affects the causal estimation results. We propose a Bayesian methodology which combines structural learning of Gaussian DAG models and inference of causal effects as arising from simultaneous interventions on any given set of variables in the system. Our approach fully accounts for the uncertainty around both the network structure and causal relationships through a joint posterior distribution over DAGs, DAG parameters and then causal effects.
14
Shang J, Wang J, Sun Y, Li F, Liu JX, Zhang H. Multiscale part mutual information for quantifying nonlinear direct associations in networks. Bioinformatics 2021; 37:2920-2929. [PMID: 33730153 DOI: 10.1093/bioinformatics/btab182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Revised: 02/15/2021] [Accepted: 03/15/2021] [Indexed: 02/02/2023] Open
Abstract
Motivation: For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions.
Results: In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments of the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations.
Availability and implementation: The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jing Wang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yan Sun
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Honghai Zhang
  - College of Life Science, Qufu Normal University, Qufu 273165, China
|
15
|
Klebermass EM, Mahmudi M, Geist BK, Pichler V, Vraka C, Balber T, Miller A, Haschemi A, Viernstein H, Rohr-Udilova N, Hacker M, Mitterhauser M. If It Works, Don't Touch It? A Cell-Based Approach to Studying 2-[18F]FDG Metabolism. Pharmaceuticals (Basel) 2021; 14:ph14090910. [PMID: 34577610 PMCID: PMC8467898 DOI: 10.3390/ph14090910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/30/2021] [Revised: 09/04/2021] [Accepted: 09/06/2021] [Indexed: 11/28/2022] Open
Abstract
The glucose derivative 2-[18F]fluoro-2-deoxy-D-glucose (2-[18F]FDG) is still the most used radiotracer for positron emission tomography, as it visualizes glucose utilization and energy demand. In general, 2-[18F]FDG is said to be trapped intracellularly as 2-[18F]FDG-6-phosphate, which cannot be further metabolized. However, increasingly, this dogma is being questioned because of publications showing metabolism beyond 2-[18F]FDG-6-phosphate and even postulating 2-[18F]FDG imaging to depend on the enzyme hexose-6-phosphate dehydrogenase in the endoplasmic reticulum. Therefore, we aimed to study 2-[18F]FDG metabolism in the human cancer cell lines HT1080, HT29 and Huh7 applying HPLC. We then compared 2-[18F]FDG metabolism with intracellular tracer accumulation, efflux and the cells’ metabolic state and used a graphical Gaussian model to visualize metabolic patterns. The extent of 2-[18F]FDG metabolism varied considerably, dependent on the cell line, and was significantly enhanced by glucose withdrawal. However, the metabolic pattern was quite conserved. The most important radiometabolites beyond 2-[18F]FDG-6-phosphate were 2-[18F]FDMannose-6-phosphate, 2-[18F]FDG-1,6-bisphosphate and 2-[18F]FD-phosphogluconolactone. Enhanced radiometabolite formation under glucose reduction was accompanied by reduced efflux and mirrored the cells’ metabolic switch as assessed via extracellular lactate levels. We conclude that there can be considerable metabolism beyond 2-[18F]FDG-6-phosphate in cancer cell lines and a comprehensive understanding of 2-[18F]FDG metabolism might help to improve cancer research and tumor diagnosis.
Affiliation(s)
- Eva-Maria Klebermass
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
  - Division of Pharmaceutical Technology and Biopharmaceutics, Department of Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
- Mahshid Mahmudi
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
- Barbara Katharina Geist
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
- Verena Pichler
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
- Chrysoula Vraka
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
- Theresa Balber
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
  - Ludwig Boltzmann Institute Applied Diagnostics, 1090 Vienna, Austria
- Anne Miller
  - Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria
- Arvand Haschemi
  - Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria
- Helmut Viernstein
  - Division of Pharmaceutical Technology and Biopharmaceutics, Department of Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
- Nataliya Rohr-Udilova
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine III, Medical University of Vienna, 1090 Vienna, Austria
- Marcus Hacker
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
- Markus Mitterhauser
  - Division of Nuclear Medicine, Department of Biomedical Imaging and Image-Guided Therapy, Medical University of Vienna, 1090 Vienna, Austria
  - Ludwig Boltzmann Institute Applied Diagnostics, 1090 Vienna, Austria
|
16
|
Han SW, Park S, Zhong H, Ryu ES, Wang P, Jung S, Lim J, Yoon J, Kim S. Estimation of joint directed acyclic graphs with lasso family for gene networks. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2019.1618869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Sung Won Han
  - School of Industrial Management Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea
- Sunghoon Park
  - School of Industrial Management Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea
- Hua Zhong
  - Division of Biostatistics, Department of Population Health, New York University, New York, New York, USA
- Eun-Seok Ryu
  - Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea
- Pei Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Sehee Jung
  - AI Analytics Team, Deep Visions, Seodaemun-gu, Seoul, Republic of Korea
- Jayeon Lim
  - Department of Applied Statistics, Konkuk University, Gwangjin-gu, Seoul, Republic of Korea
- Jeewhan Yoon
  - Department of Management of Technology, Graduate School of Management of Technology, Korea University, Seongbuk-gu, Seoul, South Korea
- SungHwan Kim
  - Department of Applied Statistics, Konkuk University, Gwangjin-gu, Seoul, Republic of Korea
|
17
|
Yu CY, Mitrofanova A. Mechanism-Centric Approaches for Biomarker Detection and Precision Therapeutics in Cancer. Front Genet 2021; 12:687813. [PMID: 34408770 PMCID: PMC8365516 DOI: 10.3389/fgene.2021.687813] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/30/2021] [Accepted: 06/28/2021] [Indexed: 12/18/2022] Open
Abstract
Biomarker discovery is at the heart of personalized treatment planning and cancer precision therapeutics, encompassing disease classification and prognosis, prediction of treatment response, and therapeutic targeting. However, many biomarkers represent passenger rather than driver alterations, limiting their utilization as functional units for therapeutic targeting. We suggest that identification of driver biomarkers through mechanism-centric approaches, which take into account upstream and downstream regulatory mechanisms, is fundamental to the discovery of functionally meaningful markers. Here, we examine computational approaches that identify mechanism-centric biomarkers elucidated from gene co-expression networks, regulatory networks (e.g., transcriptional regulation), protein-protein interaction (PPI) networks, and molecular pathways. We discuss their objectives, advantages over gene-centric approaches, and known limitations. Future directions highlight the importance of input and model interpretability, method and data integration, and the role of recently introduced technological advantages, such as single-cell sequencing, which are central for effective biomarker discovery and time-cautious precision therapeutics.
Affiliation(s)
- Christina Y. Yu
  - Department of Biomedical and Health Informatics, School of Health Professions, Rutgers, The State University of New Jersey, Newark, NJ, United States
- Antonina Mitrofanova
  - Department of Biomedical and Health Informatics, School of Health Professions, Rutgers, The State University of New Jersey, Newark, NJ, United States
  - Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, United States
|
18
|
Pyne S, Anand A. Rapid Reconstruction of Time-varying Gene Regulatory Networks with Limited Main Memory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1608-1619. [PMID: 31613774 DOI: 10.1109/tcbb.2019.2946826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Reconstruction of time-varying gene regulatory networks underlying time-series gene expression data is a fundamental challenge in computational systems biology. The challenge increases multi-fold if the target networks need to be constructed for hundreds to thousands of genes. There have been constant efforts to design an algorithm that can perform the reconstruction task correctly and also scale efficiently (with respect to both time and memory) to such a large number of genes. However, the existing algorithms either do not offer time-efficiency, or they offer it at other costs: memory-inefficiency or imposition of a constraint known as the 'smoothly time-varying assumption'. In this article, two novel algorithms are proposed: 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators - which is Light on memory' (TGS-Lite) and 'TGS-Lite Plus' (TGS-Lite+). Both are time-efficient and memory-efficient, and neither imposes the smoothly time-varying assumption. Additionally, they offer state-of-the-art reconstruction correctness as demonstrated with three benchmark datasets. Source Code: https://github.com/sap01/TGS-Lite-supplem/tree/master/sourcecode.
|
19
|
Xu T, Ou-Yang L, Yan H, Zhang XF. Time-Varying Differential Network Analysis for Revealing Network Rewiring over Cancer Progression. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1632-1642. [PMID: 31647444 DOI: 10.1109/tcbb.2019.2949039] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
To reveal how gene regulatory networks change over cancer development, multiple time-varying differential networks between adjacent cancer stages should be estimated simultaneously. Since the network rewiring may be driven by the perturbation of certain individual genes, there may be some hub nodes shared by these differential networks. Although several methods have been developed to estimate differential networks from gene expression data, most of them are designed for estimating a single differential network and therefore neglect the similarities between different differential networks. In this article, we propose a new Gaussian graphical model-based method to jointly estimate multiple time-varying differential networks for identifying network rewiring over cancer development. A D-trace loss is used to determine the differential networks. A tree-structured group Lasso penalty is designed to identify the common hub nodes shared by different differential networks and the specific hub nodes unique to individual differential networks. Simulation experiment results demonstrate that our method outperforms other state-of-the-art techniques in most cases. We also apply our method to The Cancer Genome Atlas data to explore gene network rewiring over different breast cancer stages. Hub nodes in the estimated differential networks rediscover well-known genes associated with the development and progression of breast cancer.
|
20
|
Jones DC, Ruzzo WL. Polee: RNA-Seq analysis using approximate likelihood. NAR Genom Bioinform 2021; 3:lqab046. [PMID: 34056596 PMCID: PMC8152449 DOI: 10.1093/nargab/lqab046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Revised: 04/11/2021] [Accepted: 05/11/2021] [Indexed: 12/20/2022] Open
Abstract
The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but often analyses fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression and transcript coexpression.
Affiliation(s)
- Daniel C Jones
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
- Walter L Ruzzo
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA 98195-5065, USA
  - Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., P.O. Box 19024, Seattle, WA 98109, USA
|
21
|
Vatsa D, Agarwal S. PEPN-GRN: A Petri net-based approach for the inference of gene regulatory networks from noisy gene expression data. PLoS One 2021; 16:e0251666. [PMID: 33989333 PMCID: PMC8121333 DOI: 10.1371/journal.pone.0251666] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 04/30/2021] [Indexed: 11/22/2022] Open
Abstract
The inference of gene regulatory networks (GRNs) from expression data is a challenging problem in systems biology. Stochasticity, or fluctuation, in the biochemical processes that regulate transcription poses one of the major challenges. In this paper, we propose a novel GRN inference approach, named the Probabilistic Extended Petri Net for Gene Regulatory Network (PEPN-GRN), for the inference of gene regulatory networks from noisy expression data. The proposed inference approach makes use of transitions of discrete gene expression levels across adjacent time points as different evidence types that relate to the production or decay of genes. The paper examines three variants of the PEPN-GRN method, which mainly differ in the way the scores of network edges are computed using evidence types. The proposed method is evaluated on the benchmark DREAM4 in silico data sets and a real time series data set of E. coli from the DREAM5 challenge. The PEPN-GRN_v3 variant (the third variant of the PEPN-GRN approach) learns the weights of evidence types in accordance with their contribution to the activation and inhibition gene regulation process. The learned weights help in understanding the time-shifted and inverted time-shifted relationship between regulator and target gene. Thus, PEPN-GRN_v3, along with the inference of network edges, also provides a functional understanding of the gene regulation process.
Affiliation(s)
- Deepika Vatsa
  - Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
- Sumeet Agarwal
  - Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
|
22
|
Zhang Y, Zhu L, Wang X. NEM-Tar: A Probabilistic Graphical Model for Cancer Regulatory Network Inference and Prioritization of Potential Therapeutic Targets From Multi-Omics Data. Front Genet 2021; 12:608042. [PMID: 33968127 PMCID: PMC8100334 DOI: 10.3389/fgene.2021.608042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2020] [Accepted: 03/22/2021] [Indexed: 11/13/2022] Open
Abstract
Targeted therapy has been widely adopted as an effective treatment strategy to battle against cancer. However, cancers are not single disease entities but comprise multiple molecularly distinct subtypes, and this heterogeneity prevents precise selection of patients for optimized therapy. Dissecting cancer subtype-specific signaling pathways is crucial to pinpointing dysregulated genes for the prioritization of novel therapeutic targets. Nested effects models (NEMs) are a group of graphical models that encode subset relations between observed downstream effects under perturbations to upstream signaling genes, providing a prototype for mapping the inner workings of the cell. In this study, we developed NEM-Tar, which extends the original NEMs to predict drug targets by incorporating causal information of (epi)genetic aberrations for signaling pathway inference. An information theory-based score, weighted information gain (WIG), was proposed to assess the impact of signaling genes on a specific downstream biological process of interest. Subsequently, we conducted simulation studies to compare three inference methods and found that the greedy hill-climbing algorithm demonstrated the highest accuracy and robustness to noise. Furthermore, two case studies were conducted using multi-omics data for colorectal cancer (CRC) and gastric cancer (GC) in the TCGA database. Using NEM-Tar, we inferred signaling networks driving the poor-prognosis subtypes of CRC and GC, respectively. Our model prioritized not only potential individual drug targets such as HER2, for which FDA-approved inhibitors are available, but also combinations of multiple targets potentially useful for the design of combination therapies.
Affiliation(s)
- Yuchen Zhang
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Lina Zhu
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Xin Wang
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
  - Key Laboratory of Biochip Technology, Biotech and Health Centre, Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
|
23
|
Shojaie A. Differential Network Analysis: A Statistical Perspective. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS 2021; 13:e1508. [PMID: 37050915 PMCID: PMC10088462 DOI: 10.1002/wics.1508] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/09/2019] [Accepted: 03/03/2020] [Indexed: 11/06/2022]
Abstract
Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article provides a review of recent statistical machine learning methods for inferring networks and identifying changes in their structures.
Affiliation(s)
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle WA
|
24
|
Becker AK, Dörr M, Felix SB, Frost F, Grabe HJ, Lerch MM, Nauck M, Völker U, Völzke H, Kaderali L. From heterogeneous healthcare data to disease-specific biomarker networks: A hierarchical Bayesian network approach. PLoS Comput Biol 2021; 17:e1008735. [PMID: 33577591 PMCID: PMC7906470 DOI: 10.1371/journal.pcbi.1008735] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/19/2020] [Revised: 02/25/2021] [Accepted: 01/22/2021] [Indexed: 01/26/2023] Open
Abstract
In this work, we introduce an entirely data-driven and automated approach to reveal disease-associated biomarker and risk factor networks from heterogeneous and high-dimensional healthcare data. Our workflow is based on Bayesian networks, which are a popular tool for analyzing the interplay of biomarkers. Usually, data require extensive manual preprocessing and dimension reduction to allow for effective learning of Bayesian networks. For heterogeneous data, this preprocessing is hard to automate and typically requires domain-specific prior knowledge. We here combine Bayesian network learning with hierarchical variable clustering in order to detect groups of similar features and learn interactions between them in an entirely automated way. We present an optimization algorithm for the adaptive refinement of such group Bayesian networks to account for a specific target variable, like a disease. The combination of Bayesian networks, clustering, and refinement yields low-dimensional but disease-specific interaction networks. These networks provide easily interpretable, yet accurate models of biomarker interdependencies. We test our method extensively on simulated data, as well as on data from the Study of Health in Pomerania (SHIP-TREND), and demonstrate its effectiveness using non-alcoholic fatty liver disease and hypertension as examples. We show that the group network models outperform available biomarker scores, while at the same time, they provide an easily interpretable interaction network.
High-dimensional and heterogeneous healthcare data, such as electronic health records or epidemiological study data, contain much information on yet unknown risk factors that are associated with disease development. The identification of these risk factors may help to improve prevention, diagnosis, and therapy. Bayesian networks are powerful statistical models that can decipher these complex relationships. However, high dimensionality and heterogeneity of data, together with missing values and high feature correlation, make it difficult to automatically learn a good model from data. To facilitate the use of network models, we present a novel, fully automated workflow that combines network learning with hierarchical clustering. The algorithm reveals groups of strongly related features and models the interactions among those groups. It results in simpler network models that are easier to analyze. We introduce a method of adaptive refinement of such models to ensure that disease-relevant parts of the network are modeled in great detail. Our approach makes it easy to learn compact, accurate, and easily interpretable biomarker interaction networks. We test our method extensively on simulated data as well as data from the Study of Health in Pomerania (SHIP-TREND) by learning models of hypertension and non-alcoholic fatty liver disease.
Affiliation(s)
- Ann-Kristin Becker
  - Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
- Marcus Dörr
  - Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
  - German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
- Stephan B. Felix
  - Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
  - German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
- Fabian Frost
  - Department of Internal Medicine A, University Medicine Greifswald, Greifswald, Germany
- Hans J. Grabe
  - Department of Psychiatry, University Medicine Greifswald, Greifswald, Germany
- Markus M. Lerch
  - Department of Internal Medicine A, University Medicine Greifswald, Greifswald, Germany
- Matthias Nauck
  - Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
- Uwe Völker
  - Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Henry Völzke
  - Institute of Community Medicine, SHIP/KEF, University Medicine Greifswald, Greifswald, Germany
- Lars Kaderali
  - Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany
|
25
|
Manzour H, Küçükyavuz S, Wu HH, Shojaie A. Integer Programming for Learning Directed Acyclic Graphs from Continuous Data. INFORMS JOURNAL ON OPTIMIZATION 2021; 3:46-73. [PMID: 37051459 PMCID: PMC10088505 DOI: 10.1287/ijoo.2019.0040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programming model that can naturally incorporate a superstructure to reduce the set of possible candidate DAGs. We use a negative log-likelihood score function with both [Formula: see text] and [Formula: see text] penalties and propose a new mixed-integer quadratic program, referred to as a layered network (LN) formulation. The LN formulation is a compact model that enjoys as tight an optimal continuous relaxation value as the stronger but larger formulations under a mild condition. Computational results indicate that the proposed formulation outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only [Formula: see text] regularization. In particular, the LN formulation clearly outperforms existing methods in terms of computational time needed to find an optimal DAG in the presence of a sparse superstructure.
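To make the superexponential growth mentioned in this abstract concrete, the number of labeled DAGs on n nodes can be computed with Robinson's recurrence, a(n) = Σ_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k) with a(0) = 1. The sketch below is only an illustration of that counting argument; it is unrelated to the paper's integer-programming formulation, and the function name `count_labeled_dags` is a hypothetical choice.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_labeled_dags(n: int) -> int:
    """Number of DAGs on n labeled nodes, via Robinson's recurrence."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 1  # the empty graph
    # Inclusion-exclusion over the k nodes that have no incoming edges:
    # each of them may point to any subset of the remaining n - k nodes.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_labeled_dags(n - k)
        for k in range(1, n + 1)
    )


counts = [count_labeled_dags(n) for n in range(6)]
print(counts)  # [1, 1, 3, 25, 543, 29281]
```

Even at five nodes there are already 29,281 candidate graphs, which is why restricting the search with a superstructure, as the abstract describes, matters in practice.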
Affiliation(s)
- Hasan Manzour
  - Department of Industrial and Systems Engineering, University of Washington, Seattle, Washington 98195
- Simge Küçükyavuz
  - Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208
- Hao-Hsiang Wu
  - Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195
|
26
|
Mignone P, Pio G, Džeroski S, Ceci M. Multi-task learning for the simultaneous reconstruction of the human and mouse gene regulatory networks. Sci Rep 2020; 10:22295. [PMID: 33339842 PMCID: PMC7749184 DOI: 10.1038/s41598-020-78033-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/10/2020] [Accepted: 10/29/2020] [Indexed: 12/31/2022] Open
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data, supported by machine learning approaches, has received increasing attention in recent years. The task at hand is to identify regulatory links between genes in a network. However, existing methods often suffer when the number of labeled examples is low or when no negative examples are available. In this paper we propose a multi-task method that is able to simultaneously reconstruct the human and the mouse GRNs using the similarities between the two. This is done by exploiting, in a transfer learning approach, possible dependencies that may exist among them. Simultaneously, we solve the issues arising from the limited availability of examples of links by relying on a novel clustering-based approach, able to estimate the degree of certainty of unlabeled examples of links, so that they can be exploited during the training together with the labeled examples. Our experiments show that the proposed method can reconstruct both the human and the mouse GRNs more effectively compared to reconstructing each network separately. Moreover, it significantly outperforms three state-of-the-art transfer learning approaches that, analogously to our method, can exploit the knowledge coming from both organisms. Finally, a specific robustness analysis reveals that, even when the number of labeled examples is very low with respect to the number of unlabeled examples, the proposed method is almost always able to outperform its single-task counterpart.
Affiliation(s)
- Paolo Mignone
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
- Gianvito Pio
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
- Sašo Džeroski
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
- Michelangelo Ceci
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
|
27
|
Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01705-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
28
|
Angelin-Bonnet O, Biggs PJ, Baldwin S, Thomson S, Vignes M. sismonr: simulation of in silico multi-omic networks with adjustable ploidy and post-transcriptional regulation in R. Bioinformatics 2020; 36:2938-2940. [PMID: 31960894 DOI: 10.1093/bioinformatics/btaa002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/02/2019] [Revised: 12/11/2019] [Accepted: 01/17/2020] [Indexed: 11/13/2022] Open
Abstract
SUMMARY: We present sismonr, an R package for an integral generation and simulation of in silico biological systems. The package generates gene regulatory networks, which include protein-coding and non-coding genes along with different transcriptional and post-transcriptional regulations. The effect of genetic mutations on the system behaviour is accounted for via the simulation of genetically different in silico individuals. The ploidy of the system is not restricted to the usual haploid or diploid situations but can be defined by the user to higher ploidies. A choice of stochastic simulation algorithms allows us to simulate the expression profiles of the genes in the in silico system. We illustrate the use of sismonr by simulating the anthocyanin biosynthesis regulation pathway for three genetically distinct in silico plants. AVAILABILITY AND IMPLEMENTATION: The sismonr package is implemented in R and Julia and is publicly available on the CRAN repository (https://CRAN.R-project.org/package=sismonr). A detailed tutorial is available from GitHub at https://oliviaab.github.io/sismonr/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Patrick J Biggs
- School of Fundamental Sciences; School of Veterinary Science, Massey University, Palmerston North 4442, New Zealand
- Samantha Baldwin
- New Cultivar Innovation, The New Zealand Institute for Plant & Food Research Limited, Christchurch 8140, New Zealand
- Susan Thomson
- New Cultivar Innovation, The New Zealand Institute for Plant & Food Research Limited, Christchurch 8140, New Zealand
|
29
|
Thorvaldsen S, Hössjer O. Using statistical methods to model the fine-tuning of molecular machines and systems. J Theor Biol 2020; 501:110352. [PMID: 32505827 DOI: 10.1016/j.jtbi.2020.110352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/06/2019] [Revised: 05/26/2020] [Accepted: 05/27/2020] [Indexed: 10/24/2022]
Abstract
Fine-tuning has received much attention in physics, and it states that the fundamental constants of physics are finely tuned to precise values for a rich chemistry and life permittance. It has not yet been applied in a broad manner to molecular biology. However, in this paper we argue that biological systems present fine-tuning at different levels, e.g. functional proteins, complex biochemical machines in living cells, and cellular networks. This paper describes molecular fine-tuning, how it can be used in biology, and how it challenges conventional Darwinian thinking. We also discuss the statistical methods underpinning fine-tuning and present a framework for such analysis.
Affiliation(s)
- Ola Hössjer
- Stockholm University, Dep. of Mathematics, Division of Mathematical Statistics, Sweden.
|
30
|
Agbleke AA, Amitai A, Buenrostro JD, Chakrabarti A, Chu L, Hansen AS, Koenig KM, Labade AS, Liu S, Nozaki T, Ovchinnikov S, Seeber A, Shaban HA, Spille JH, Stephens AD, Su JH, Wadduwage D. Advances in Chromatin and Chromosome Research: Perspectives from Multiple Fields. Mol Cell 2020; 79:881-901. [PMID: 32768408 PMCID: PMC7888594 DOI: 10.1016/j.molcel.2020.07.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 02/03/2020] [Revised: 06/12/2020] [Accepted: 07/06/2020] [Indexed: 12/12/2022]
Abstract
Nucleosomes package genomic DNA into chromatin. By regulating DNA access for transcription, replication, DNA repair, and epigenetic modification, chromatin forms the nexus of most nuclear processes. In addition, dynamic organization of chromatin underlies both regulation of gene expression and evolution of chromosomes into individualized sister objects, which can segregate cleanly to different daughter cells at anaphase. This collaborative review shines a spotlight on technologies that will be crucial to interrogate key questions in chromatin and chromosome biology including state-of-the-art microscopy techniques, tools to physically manipulate chromatin, single-cell methods to measure chromatin accessibility, computational imaging with neural networks and analytical tools to interpret chromatin structure and dynamics. In addition, this review provides perspectives on how these tools can be applied to specific research fields such as genome stability and developmental biology and to test concepts such as phase separation of chromatin.
Affiliation(s)
- Assaf Amitai
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jason D Buenrostro
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Aditi Chakrabarti
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
- Lingluo Chu
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Anders S Hansen
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Kristen M Koenig
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; JHDSF Program, Harvard University, Cambridge, MA 02138, USA
- Ajay S Labade
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- Sirui Liu
- FAS Division of Science, Harvard University, Cambridge, MA 02138, USA
- Tadasu Nozaki
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Sergey Ovchinnikov
- JHDSF Program, Harvard University, Cambridge, MA 02138, USA; FAS Division of Science, Harvard University, Cambridge, MA 02138, USA
- Andrew Seeber
- JHDSF Program, Harvard University, Cambridge, MA 02138, USA; Center for Advanced Imaging, Harvard University, Cambridge, MA 02138, USA.
- Haitham A Shaban
- Center for Advanced Imaging, Harvard University, Cambridge, MA 02138, USA; Spectroscopy Department, Physics Division, National Research Centre, Dokki, 12622 Cairo, Egypt
- Jan-Hendrik Spille
- Department of Physics, University of Illinois at Chicago, Chicago, IL 60607, USA
- Andrew D Stephens
- Biology Department, University of Massachusetts, Amherst, Amherst, MA 01003, USA
- Jun-Han Su
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Dushan Wadduwage
- JHDSF Program, Harvard University, Cambridge, MA 02138, USA; Center for Advanced Imaging, Harvard University, Cambridge, MA 02138, USA
|
31
|
Bernal V, Bischoff R, Guryev V, Grzegorczyk M, Horvatovich P. Exact hypothesis testing for shrinkage-based Gaussian graphical models. Bioinformatics 2020; 35:5011-5017. [PMID: 31077287 PMCID: PMC6901079 DOI: 10.1093/bioinformatics/btz357] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 03/08/2019] [Accepted: 04/26/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION: One of the main goals in systems biology is to learn molecular regulatory networks from quantitative profile data. In particular, Gaussian graphical models (GGMs) are widely used network models in bioinformatics where variables (e.g. transcripts, metabolites or proteins) are represented by nodes, and pairs of nodes are connected with an edge according to their partial correlation. Reconstructing a GGM from data is a challenging task when the sample size is smaller than the number of variables. The main problem consists in finding the inverse of the covariance estimator, which is ill-conditioned in this case. Shrinkage-based covariance estimators are a popular approach, producing an invertible 'shrunk' covariance. However, a proper significance test for the 'shrunk' partial correlation (i.e. the GGM edges) is an open challenge as a probability density including the shrinkage is unknown. In this article, we present (i) a geometric reformulation of the shrinkage-based GGM, and (ii) a probability density that naturally includes the shrinkage parameter. RESULTS: Our results show that the inference using this new 'shrunk' probability density is as accurate as Monte Carlo estimation (an unbiased non-parametric method) for any shrinkage value, while being computationally more efficient. We show on synthetic data how the novel test for significance allows an accurate control of the Type I error and outperforms the network reconstruction obtained by the widely used R package GeneNet. This is further highlighted in two gene expression datasets from stress response in Escherichia coli, and the effect of influenza infection in Mus musculus. AVAILABILITY AND IMPLEMENTATION: https://github.com/V-Bernal/GGM-Shrinkage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
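To illustrate why shrinkage yields an invertible covariance when there are fewer samples than variables, the following minimal Python sketch may help. It uses a generic linear shrinkage of the off-diagonal entries toward zero with a hand-picked intensity `lam`. The function names, the toy data and the value of `lam` are assumptions made for illustration; this is not the estimator or the significance test implemented in the GGM-Shrinkage repository or in GeneNet.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of a list of observations (one row per sample)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]


def shrink_to_diagonal(cov, lam):
    """Linear shrinkage (1 - lam) * S + lam * diag(S): only off-diagonals shrink."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1.0 - lam) * cov[i][j] for j in range(p)]
            for i in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Toy data (hypothetical): n = 2 samples of p = 3 variables, so n < p.
samples = [[1.0, 2.0, 3.0], [2.0, 1.0, 5.0]]
s = sample_covariance(samples)
s_shrunk = shrink_to_diagonal(s, lam=0.2)

det_raw = det3(s)            # zero: the sample covariance has rank 1
det_shrunk = det3(s_shrunk)  # strictly positive: the shrunk matrix is invertible
```

In practice the shrinkage intensity is usually estimated from the data rather than fixed by hand; the fixed value here only keeps the sketch short.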
Affiliation(s)
- Victor Bernal
- Bernoulli Institute, University of Groningen, Groningen AG, The Netherlands; Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
- Rainer Bischoff
- Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
- Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen AV, The Netherlands
- Marco Grzegorczyk
- Bernoulli Institute, University of Groningen, Groningen AG, The Netherlands
- Peter Horvatovich
- Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy
|
32
|
Holding AN, Cook HV, Markowetz F. Data generation and network reconstruction strategies for single cell transcriptomic profiles of CRISPR-mediated gene perturbations. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2020; 1863:194441. [PMID: 31756390 DOI: 10.1016/j.bbagrm.2019.194441] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/07/2019] [Revised: 10/01/2019] [Accepted: 10/01/2019] [Indexed: 02/05/2023]
Abstract
Recent advances in single-cell RNA-sequencing (scRNA-seq) in combination with CRISPR/Cas9 technologies have enabled the development of methods for large-scale perturbation studies with transcriptional readouts. These methods are highly scalable and have the potential to provide a wealth of information on the biological networks that underlie cellular response. Here we discuss how to overcome several key challenges to generate and analyse data for the confident reconstruction of models of the underlying cellular network. Some challenges are generic, and apply to analysing any single-cell transcriptomic data, while others are specific to combined single-cell CRISPR/Cas9 data, in particular barcode swapping, knockdown efficiency, multiplicity of infection and potential confounding factors. We also provide a curated collection of published data sets to aid the development of analysis strategies. Finally, we discuss several network reconstruction approaches, including co-expression networks and Bayesian networks, as well as their limitations, and highlight the potential of Nested Effects Models for network reconstruction from scRNA-seq data. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Andrew N Holding
- Department of Biology, University of York, York, UK; York Biomedical Research Institute, University of York, York, UK; CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, UK; The Alan Turing Institute, 96 Euston Road, Kings Cross, London, UK
- Helen V Cook
- Department of Biology, University of York, York, UK
|
33
|
Abstract
Making decisions on how best to treat cancer patients requires the integration of different data sets, including genomic profiles, tumour histopathology, radiological images, proteomic analysis and more. This wealth of biological information calls for novel strategies to integrate such information in a meaningful, predictive and experimentally verifiable way. In this Perspective we explain how executable computational models meet this need. Such models provide a means for comprehensive data integration, can be experimentally validated, are readily interpreted both biologically and clinically, and have the potential to predict effective therapies for different cancer types and subtypes. We explain what executable models are and how they can be used to represent the dynamic biological behaviours inherent in cancer, and demonstrate how such models, when coupled with automated reasoning, facilitate our understanding of the mechanisms by which oncogenic signalling pathways regulate tumours. We explore how executable models have impacted the field of cancer research and argue that extending them to represent a tumour in a specific patient (that is, an avatar) will pave the way for improved personalized treatments and precision medicine. Finally, we highlight some of the ongoing challenges in developing executable models and stress that effective cross-disciplinary efforts are key to forward progress in the field.
Affiliation(s)
- Matthew A Clarke
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Jasmin Fisher
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
- UCL Cancer Institute, University College London, London, UK.
|
34
|
Shah RD, Peters J. The hardness of conditional independence testing and the generalised covariance measure. Ann Stat 2020. [DOI: 10.1214/19-aos1857] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/19/2022]
|
35
|
Córdoba I, Bielza C, Larrañaga P. A review of Gaussian Markov models for conditional independence. J Stat Plan Inference 2020. [DOI: 10.1016/j.jspi.2019.09.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
|
36
|
Sverchkov Y, Ho YH, Gasch A, Craven M. Context-Specific Nested Effects Models. J Comput Biol 2020; 27:403-417. [PMID: 32053004 DOI: 10.1089/cmb.2019.0459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Advances in systems biology have made clear the importance of network models for capturing knowledge about complex relationships in gene regulation, metabolism, and cellular signaling. A common approach to uncovering biological networks involves performing perturbations on elements of the network, such as gene knockdown experiments, and measuring how the perturbation affects some reporter of the process under study. In this article, we develop context-specific nested effects models (CSNEMs), an approach to inferring such networks that generalizes nested effects models (NEMs). The main contribution of this work is that CSNEMs explicitly model the participation of a gene in multiple contexts, meaning that a gene can appear in multiple places in the network. Biologically, the representation of regulators in multiple contexts may indicate that these regulators have distinct roles in different cellular compartments or cell cycle phases. We present an evaluation of the method on simulated data as well as on data from a study of the sodium chloride stress response in Saccharomyces cerevisiae.
Affiliation(s)
- Yuriy Sverchkov
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin
- Yi-Hsuan Ho
- Department of Genetics, University of Wisconsin-Madison, Madison, Wisconsin
- Audrey Gasch
- Department of Genetics, University of Wisconsin-Madison, Madison, Wisconsin
- Mark Craven
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin
|
37
|
Wu Y, Li T, Liu X, Chen L. Differential network inference via the fused D-trace loss with cross variables. Electron J Stat 2020. [DOI: 10.1214/20-ejs1691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
38
|
Pyne S, Kumar AR, Anand A. Rapid Reconstruction of Time-Varying Gene Regulatory Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:278-291. [PMID: 30072338 DOI: 10.1109/tcbb.2018.2861698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Rapid advancements in high-throughput technologies have resulted in genome-scale time series datasets. Uncovering the temporal sequence of gene regulatory events, in the form of time-varying gene regulatory networks (GRNs), demands computationally fast, accurate, and scalable algorithms. The existing algorithms can be divided into two categories: ones that are time-intensive and hence unscalable; and others that impose structural constraints to become scalable. In this paper, a novel algorithm, namely 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators' (TGS), is proposed. TGS is time-efficient and does not impose any structural constraints. Moreover, it provides such flexibility and time-efficiency, without losing its accuracy. TGS consistently outperforms the state-of-the-art algorithms in true positive detection, on three benchmark synthetic datasets. However, TGS does not perform as well in false positive rejection. To mitigate this issue, TGS+ is proposed. TGS+ demonstrates competitive false positive rejection power, while maintaining the superior speed and true positive detection power of TGS. Nevertheless, the main memory requirements of both TGS variants grow exponentially with the number of genes, which they tackle by restricting the maximum number of regulators for each gene. Relaxing this restriction remains a challenge as the actual number of regulators is not known a priori.
|
39
|
Ghanbari M, Lasserre J, Vingron M. The Distance Precision Matrix: computing networks from non-linear relationships. Bioinformatics 2019; 35:1009-1017. [PMID: 30165509 PMCID: PMC6420154 DOI: 10.1093/bioinformatics/bty724] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/05/2017] [Revised: 06/12/2018] [Accepted: 08/23/2018] [Indexed: 12/21/2022]
Abstract
Motivation: Full-order partial correlation, a fundamental approach for network reconstruction, e.g. in the context of gene regulation, relies on the precision matrix (the inverse of the covariance matrix) as an indicator of which variables are directly associated. The precision matrix assumes Gaussian linear data and its entries are zero for pairs of variables that are independent given all other variables. However, there is still very little theory on network reconstruction under the assumption of non-linear interactions among variables. Results: We propose the Distance Precision Matrix, a network reconstruction method aimed at both linear and non-linear data. Like partial distance correlation, it builds on distance covariance, a measure of possibly non-linear association, and on the idea of full-order partial correlation, which allows indirect associations to be discarded. We provide evidence that the Distance Precision Matrix method can successfully compute networks from linear and non-linear data, and consistently so across different datasets, even if sample size is low. The method is fast enough to compute networks on hundreds of nodes. Availability and implementation: An R package DPM is available at https://github.molgen.mpg.de/ghanbari/DPM. Supplementary information: Supplementary data are available at Bioinformatics online.
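The link between the precision matrix and direct associations described above can be sketched in a few lines of Python. This is a minimal illustration of ordinary (Gaussian) full-order partial correlation, computed as the negated, rescaled off-diagonal entries of the inverse covariance; it is not the Distance Precision Matrix method or the DPM package. The helper names and the toy covariance (a chain X1 - X2 - X3) are assumptions chosen for illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Full-order partial correlations: -omega_ij / sqrt(omega_ii * omega_jj)."""
    omega = invert(cov)  # precision matrix
    n = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)]
            for i in range(n)]


# Toy chain X1 - X2 - X3: X1 and X3 are correlated (0.25) only through X2.
cov = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]
pcor = partial_correlations(cov)
# pcor[0][2] is (numerically) zero, so no direct edge between X1 and X3,
# while pcor[0][1] and pcor[1][2] stay clearly non-zero.
```

The marginal correlation between the first and third variable is 0.25, yet their partial correlation vanishes once the middle variable is conditioned on, which is the sense in which full-order partial correlation discards indirect associations.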
Affiliation(s)
- Mahsa Ghanbari
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
- Julia Lasserre
- Zalando Research, Mühlenstr. 25, D-10243 Berlin, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
|
40
|
|
41
|
Schubert M, Colomé-Tatché M, Foijer F. Gene networks in cancer are biased by aneuploidies and sample impurities. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2019; 1863:194444. [PMID: 31654805 DOI: 10.1016/j.bbagrm.2019.194444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2019] [Revised: 09/05/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022]
Abstract
Gene regulatory network inference is a standard technique for obtaining structured regulatory information from, for instance, gene expression measurements. Methods performing this task have been extensively evaluated on synthetic and, to a lesser extent, real data sets. In contrast to these test evaluations, applications to gene expression data of human cancers are often limited by fewer samples and more potential regulatory links, and are biased by copy number aberrations as well as cell mixtures and sample impurities. Here, we take networks inferred from TCGA cohorts as an example to show that (1) transcription factor annotations are essential to obtain reliable networks, and (2) even for state-of-the-art methods, we expect that between 20 and 80% of edges are caused by copy number changes and cell mixtures rather than transcription factor regulation.
Affiliation(s)
- Michael Schubert
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany.
- Maria Colomé-Tatché
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands; Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Floris Foijer
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands.
|
42
|
Tan QW, Mutwil M. Inferring biosynthetic and gene regulatory networks from Artemisia annua RNA sequencing data on a credit card-sized ARM computer. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2019; 1863:194429. [PMID: 31634636 DOI: 10.1016/j.bbagrm.2019.194429] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/05/2019] [Revised: 09/06/2019] [Accepted: 09/06/2019] [Indexed: 02/05/2023]
Abstract
Prediction of gene function and gene regulatory networks is one of the most active topics in bioinformatics. The accumulation of publicly available gene expression data for hundreds of plant species, together with advances in bioinformatical methods and affordable computing, sets ingenuity as one of the major bottlenecks in understanding gene function and regulation. Here, we show how a credit card-sized computer retailing for <50 USD can be used to rapidly predict gene function and infer regulatory networks from RNA sequencing data. To achieve this, we constructed a bioinformatical pipeline that downloads RNA sequencing data, allows quality control of these data, and generates a gene co-expression network that can reveal enzymes and transcription factors participating in and controlling a given biosynthetic pathway. We exemplify this by first identifying genes and transcription factors involved in the biosynthesis of secondary cell wall in the plant Artemisia annua, the main natural source of the anti-malarial drug artemisinin. Networks were then used to dissect the artemisinin biosynthesis pathway, which suggested potential transcription factors regulating artemisinin biosynthesis. We provide the source code of our pipeline (https://github.com/mutwil/LSTrAP-Lite) and envision that the ubiquity of affordable computing, availability of biological data and increased bioinformatical training of biologists will transform the field of bioinformatics. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Qiao Wen Tan
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore.
|
43
|
A Statistical Test for Differential Network Analysis Based on Inference of Gaussian Graphical Model. Sci Rep 2019; 9:10863. [PMID: 31350445 PMCID: PMC6659630 DOI: 10.1038/s41598-019-47362-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/20/2018] [Accepted: 07/15/2019] [Indexed: 11/09/2022]
Abstract
Differential network analysis investigates how the network of connected genes changes from one condition to another and has become a prevalent tool to provide a deeper and more comprehensive understanding of the molecular etiology of complex diseases. Based on the asymptotically normal estimation of large Gaussian graphical models (GGMs) in the high-dimensional setting, we developed a computationally efficient test for differential network analysis through testing the equality of two precision matrices, which summarize the conditional dependence network structures of the genes. Additionally, we applied a multiple testing procedure to infer the differential network structure with false discovery rate (FDR) control. Through extensive simulation studies with different combinations of parameters including sample size, number of vertices, level of heterogeneity and graph structure, we demonstrated that our method performed much better than the currently available methods in terms of accuracy and computational time. In real data analysis on lung adenocarcinoma, we revealed a differential network with 3503 nodes and 2550 edges, which consisted of 50 clusters with an FDR threshold at 0.05. Many of the top gene pairs in the differential network have been reported relevant to human cancers. Our method represents a powerful tool of network analysis for high-dimensional biological data.
|
44
|
He S, Deng M. Direct interaction network and differential network inference from compositional data via lasso penalized D-trace loss. PLoS One 2019; 14:e0207731. [PMID: 31339885 PMCID: PMC6655598 DOI: 10.1371/journal.pone.0207731] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/01/2018] [Accepted: 07/02/2019] [Indexed: 11/30/2022]
Abstract
The development of high-throughput sequencing technologies for 16S rRNA gene profiling provides higher-quality compositional data for microbe communities. Inferring the direct interaction network under a specific condition and understanding how the network structure changes between two different environmental or genetic conditions are two important topics in biological studies. However, the compositional nature and high dimensionality of the data are challenging in the context of network and differential network recovery. To address this problem in the present paper, we proposed two new loss functions to incorporate the data transformations developed for compositional data analysis into D-trace loss for network and differential network estimation, respectively. The sparse matrix estimators are defined as the minimizer of the corresponding lasso penalized loss. Our method is characterized by its straightforward application based on the ADMM algorithm for numerical solution. Simulations show that the proposed method outperforms other state-of-the-art methods in network and differential network inference under different scenarios. Finally, as an illustration, our method is applied to a mouse skin microbiome dataset.
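In ADMM-type solvers, a lasso penalty such as the one mentioned above is commonly handled by an elementwise soft-thresholding step, which is the proximal operator of the L1 norm. The Python sketch below shows only that generic step. The function names, the toy matrix and the threshold are illustrative assumptions, not the authors' implementation, and whether diagonal entries are penalized depends on the formulation.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: move x toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def soft_threshold_matrix(m, t):
    """Apply soft-thresholding to every entry of a matrix given as nested lists."""
    return [[soft_threshold(v, t) for v in row] for row in m]


# Hypothetical intermediate estimate of a network matrix inside a solver iteration.
estimate = [[1.5, -0.2],
            [0.3, -2.0]]
sparse = soft_threshold_matrix(estimate, t=0.5)
# Entries with |v| <= 0.5 become exactly zero, which is what makes the
# resulting network estimate sparse.
```

Larger thresholds zero out more entries, so the penalty weight directly controls how many edges survive in the estimated network.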
Affiliation(s)
- Shun He
- School of Mathematical Sciences, Peking University, Beijing, 10087, P.R. China
- Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, 10087, P.R. China
- Center for Statistical Science, Peking University, Beijing, 10087, P.R. China
|
45
|
Yu H, Blair RH. Integration of probabilistic regulatory networks into constraint-based models of metabolism with applications to Alzheimer's disease. BMC Bioinformatics 2019; 20:386. [PMID: 31291905 PMCID: PMC6617954 DOI: 10.1186/s12859-019-2872-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/17/2019] [Accepted: 05/02/2019] [Indexed: 01/08/2023]
Abstract
Background: Mathematical models of biological networks can provide important predictions and insights into complex disease. Constraint-based models of cellular metabolism and probabilistic models of gene regulatory networks are two distinct areas that have progressed rapidly in parallel over the past decade. In principle, gene regulatory networks and metabolic networks underlie the same complex phenotypes and diseases. However, systematic integration of these two model systems remains a fundamental challenge. Results: In this work, we address this challenge by fusing probabilistic models of gene regulatory networks into constraint-based models of metabolism. In the novel approach, probabilistic reasoning in Bayesian network (BN) models of regulatory networks serves as the “glue” that enables a natural interface between the two systems. Probabilistic reasoning is used to predict and quantify system-wide effects of perturbation to the regulatory network in the form of constraints for flux variability analysis. In this setting, both regulatory and metabolic networks inherently account for uncertainty. Applications leverage constraint-based metabolic models of brain metabolism and gene regulatory networks parameterized by gene expression data from the hippocampus to investigate the role of the HIF-1 pathway in Alzheimer’s disease. Integrated models support HIF-1A as an effective target to reduce the effects of hypoxia in Alzheimer’s disease. However, HIF-1A activation is far less effective in shifting metabolism when compared to brain metabolism in healthy controls. Conclusions: The direct integration of probabilistic regulatory networks into constraint-based models of metabolism provides novel insights into how perturbations in the regulatory network may influence metabolic states. Predictive modeling of enzymatic activity can be facilitated using probabilistic reasoning, thereby extending the predictive capacity of the network. This framework for model integration is generalizable to other systems. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2872-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Han Yu
- State University of New York at Buffalo, 3435 Main Street, Buffalo, 14214, US
|
46
|
McDavid A, Gottardo R, Simon N, Drton M. Graphical Models for Zero-Inflated Single Cell Gene Expression. Ann Appl Stat 2019; 13:848-873. [PMID: 31388390 PMCID: PMC6684253 DOI: 10.1214/18-aoas1213] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Abstract
Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene co-regulatory networks from such data, we propose a multivariate Hurdle model. It is comprised of a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. The proposed method is more sensitive than existing approaches in simulations, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells, and a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
Affiliation(s)
- Andrew McDavid
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center; Rochester, New York
- Raphael Gottardo
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
  - Department of Statistics, University of Washington; Seattle, Washington
- Noah Simon
  - Department of Biostatistics, University of Washington; Seattle, Washington
- Mathias Drton
  - Department of Statistics, University of Washington; Seattle, Washington
  - Department of Mathematical Sciences, University of Copenhagen; Denmark
|
47
|
Wang C, Gao F, Giannakis GB, D'Urso G, Cai X. Efficient proximal gradient algorithm for inference of differential gene networks. BMC Bioinformatics 2019; 20:224. [PMID: 31046666 PMCID: PMC6498668 DOI: 10.1186/s12859-019-2749-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/02/2018] [Accepted: 03/18/2019] [Indexed: 02/07/2023] Open
Abstract
Background Gene networks in living cells can change depending on various conditions, such as different environments, tissue types, disease states, and development stages. Identifying the differential changes in gene networks is very important for understanding the molecular basis of various biological processes. While existing algorithms can be used to infer two gene networks separately from gene expression data under two different conditions, and then to identify network changes, such an approach does not exploit the similarity between the two gene networks, and it is thus suboptimal. A more desirable approach would be to infer the two gene networks jointly, which can yield improved estimates of network changes. Results In this paper, we developed a proximal gradient algorithm for differential network (ProGAdNet) inference that jointly infers two gene networks under different conditions and then identifies changes in the network structure. Computer simulations demonstrated that ProGAdNet outperformed existing algorithms in terms of inference accuracy, and was much faster than a similar approach for joint inference of gene networks. Gene expression data of breast tumors and normal tissues in the TCGA database were analyzed with ProGAdNet, and revealed that 268 genes were involved in the changed network edges. Gene set enrichment analysis identified a significant number of gene sets related to breast cancer or other types of cancer that are enriched in this set of 268 genes. Network analysis of the kidney cancer data in the TCGA database with ProGAdNet also identified a set of genes involved in network changes, and the majority of the top genes identified have been reported in the literature to be implicated in kidney cancer. These results corroborated that the gene sets identified by ProGAdNet were very informative about the cancer disease status. A software package implementing ProGAdNet, computer simulations, and real data analysis is available as Additional file 1. Conclusion With its superior performance over existing algorithms, ProGAdNet provides a valuable tool for finding changes in gene networks, which may aid the discovery of gene-gene interactions changed under different conditions. Electronic supplementary material The online version of this article (10.1186/s12859-019-2749-x) contains supplementary material, which is available to authorized users.
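The idea of inferring two networks jointly with a proximal gradient method can be sketched as follows. This is a simplified illustration and not the ProGAdNet package: it assumes a least-squares loss per condition, an l1 penalty (`lam1`) on each coefficient vector, and an l1 penalty (`lam2`) on their difference. All function names, the fixed step size, and the toy data are hypothetical.

```python
def soft(x, t):
    """Soft-thresholding: the proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_pair(u, v, lam1, lam2):
    """Prox of lam1*(|a| + |b|) + lam2*|a - b| evaluated at (u, v).

    The difference is shrunk first (fusing a and b when they are close),
    then each value is soft-thresholded toward zero.
    """
    d = soft(u - v, 2.0 * lam2)
    s = u + v
    return soft((s + d) / 2.0, lam1), soft((s - d) / 2.0, lam1)


def gradient(X, y, b):
    """Gradient of 0.5 * ||y - X b||^2 with respect to b."""
    resid = [sum(x * bj for x, bj in zip(row, b)) - yi for row, yi in zip(X, y)]
    return [sum(X[i][j] * resid[i] for i in range(len(X))) for j in range(len(b))]


def fit_pair(X1, y1, X2, y2, lam1, lam2, step, n_iter=200):
    """Proximal gradient iterations on both conditions at once."""
    p = len(X1[0])
    b1, b2 = [0.0] * p, [0.0] * p
    for _ in range(n_iter):
        g1, g2 = gradient(X1, y1, b1), gradient(X2, y2, b2)
        for j in range(p):
            b1[j], b2[j] = prox_pair(b1[j] - step * g1[j], b2[j] - step * g2[j],
                                     step * lam1, step * lam2)
    return b1, b2


# Toy example with identity designs (assumed data, chosen for easy checking).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b1, b2 = fit_pair(I3, [2.0, 0.05, 1.0], I3, [2.1, 0.0, -1.0],
                  lam1=0.1, lam2=0.2, step=1.0)
changed = [j for j in range(3) if b1[j] != b2[j]]
```

In this toy run the first coefficient is fused to a common value in both conditions, the second is removed from both, and only the third is reported as a changed edge. In practice the step size would need to be chosen from the data (for example from the largest eigenvalue of the Gram matrix) rather than fixed at 1.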
Affiliation(s)
- Chen Wang
  - Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, 33146, FL, USA
- Feng Gao
  - Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, 33146, FL, USA
- Georgios B Giannakis
  - Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, 55455, MN, USA
- Gennaro D'Urso
  - Department of Molecular and Cellular Pharmacology, University of Miami, Miami, 33136, FL, USA
- Xiaodong Cai
  - Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, 33146, FL, USA
  - Sylvester Comprehensive Cancer Center, University of Miami, Miami, 33136, FL, USA
|
48
|
de Campos LM, Cano A, Castellano JG, Moral S. Combining gene expression data and prior knowledge for inferring gene regulatory networks via Bayesian networks using structural restrictions. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0042. [PMID: 31042646 DOI: 10.1515/sagmb-2018-0042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Gene Regulatory Networks (GRNs) are known as the most adequate instrument to provide a clear insight into and understanding of cellular systems. One of the most successful techniques for reconstructing GRNs from gene expression data is Bayesian networks (BNs), which have proven to be an ideal approach for heterogeneous data integration in the learning process. So far, however, prior knowledge has been incorporated by using prior beliefs or by using networks as a starting point in the search process. In this work, the use of different kinds of structural restrictions within algorithms for learning BNs from gene expression data is considered. These restrictions codify prior knowledge in such a way that a BN must satisfy them. One aim of this work is therefore to provide a detailed review of the use of prior knowledge and gene expression data for inferring GRNs with BNs, but the major purpose of this paper is to investigate whether structural learning algorithms for BNs from expression data can achieve better outcomes by exploiting this prior knowledge through structural restrictions. The experimental study shows that this new way of incorporating prior knowledge yields better reverse-engineered networks.
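How structural restrictions can constrain a candidate network may be sketched as a simple validity check. The abstract does not list the restriction types, so this sketch assumes two illustrative kinds, required edges and forbidden edges, in addition to the acyclicity that any BN structure must satisfy. The function names and gene labels are hypothetical.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Every node is visited only when no cycle blocks the ordering.
    return visited == len(nodes)


def satisfies_restrictions(nodes, edges, required, forbidden):
    """Check a candidate structure against prior-knowledge restrictions."""
    edge_set = set(edges)
    return (is_acyclic(nodes, edge_set)
            and required <= edge_set           # all required edges present
            and not (forbidden & edge_set))    # no forbidden edge present


# Hypothetical prior knowledge about three genes.
genes = ["G1", "G2", "G3"]
required = {("G1", "G2")}
forbidden = {("G3", "G1")}

ok = satisfies_restrictions(genes, [("G1", "G2"), ("G2", "G3")], required, forbidden)
missing = satisfies_restrictions(genes, [("G2", "G3")], required, forbidden)
violating = satisfies_restrictions(
    genes, [("G1", "G2"), ("G2", "G3"), ("G3", "G1")], required, forbidden)
```

A search procedure could call such a check on each proposed structure and discard candidates that fail it, so that the search only visits networks consistent with the prior knowledge.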
Affiliation(s)
- Luis M de Campos
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Andrés Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Javier G Castellano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Serafín Moral
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
|
49
|
Pacini C, Koziol MJ. Bioinformatics challenges and perspectives when studying the effect of epigenetic modifications on alternative splicing. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0073. [PMID: 29685977 PMCID: PMC5915717 DOI: 10.1098/rstb.2017.0073] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Accepted: 09/14/2017] [Indexed: 02/07/2023] Open
Abstract
It is widely known that epigenetic modifications are important in regulating transcription, but several have also been reported in alternative splicing. The regulation of pre-mRNA splicing is important to explain proteomic diversity and the misregulation of splicing has been implicated in many diseases. Here, we give a brief overview of the role of epigenetics in alternative splicing and disease. We then discuss the bioinformatics methods that can be used to model interactions between epigenetic marks and regulators of splicing. These models can be used to identify alternative splicing and epigenetic changes across different phenotypes. This article is part of a discussion meeting issue ‘Frontiers in epigenetic chemical biology’.
Affiliation(s)
- Clare Pacini
  - Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Magdalena J Koziol
  - Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
|
50
|
Siahpirani AF, Chasman D, Roy S. Integrative Approaches for Inference of Genome-Scale Gene Regulatory Networks. Methods Mol Biol 2019; 1883:161-194. [PMID: 30547400 DOI: 10.1007/978-1-4939-8882-2_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
Abstract
Transcriptional regulatory networks specify the regulatory proteins of target genes that control the context-specific expression levels of genes. With our ability to profile the different types of molecular components of cells under different conditions, we are now uniquely positioned to infer regulatory networks in diverse biological contexts such as different cell types, tissues, and time points. In this chapter, we cover two main classes of computational methods to integrate different types of information to infer genome-scale transcriptional regulatory networks. The first class of methods focuses on integrative methods for specifically inferring connections between transcription factors and target genes by combining gene expression data with regulatory edge-specific knowledge. The second class of methods integrates upstream signaling networks with transcriptional regulatory networks by combining gene expression data with protein-protein interaction networks and proteomic datasets. We conclude with a section on practical applications of a network inference algorithm to infer a genome-scale regulatory network.
Affiliation(s)
- Alireza Fotuhi Siahpirani
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Deborah Chasman
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
|