1
Zenere A, Rundquist O, Gustafsson M, Altafini C. Multi-omics protein-coding units as massively parallel Bayesian networks: empirical validation of causality structure. iScience 2022; 25:104048. [PMID: 35355520 PMCID: PMC8958332 DOI: 10.1016/j.isci.2022.104048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 01/17/2022] [Accepted: 03/08/2022] [Indexed: 11/29/2022] Open
Abstract
In this article we use high-throughput epigenomics, transcriptomics, and proteomics data to construct fine-grained models of the “protein-coding units” gathering all transcript isoforms and chromatin accessibility peaks associated with more than 4000 genes in humans. Each protein-coding unit has the structure of a directed acyclic graph (DAG) and can be represented as a Bayesian network. The factorization of the joint probability distribution induced by the DAGs imposes a number of conditional independence relationships among the variables forming a protein-coding unit, corresponding to the missing edges in the DAGs. We show that a large fraction of these conditional independencies are indeed verified by the data. Factors driving this verification appear to be the structural and functional annotation of the transcript isoforms, as well as a notion of structural balance (or frustration-free) of the corresponding sample correlation graph, which naturally leads to reduction of correlation (and hence to independence) upon conditioning.
Highlights
- Protein coding unit: DAG associated with epigenetic and gene information of a protein
- DAGs correspond to Bayesian networks
- Edge absence on a DAG corresponds to conditional independence
- Multi-omics data (ATAC-seq, RNA-seq and mass-spec) are used for DAG validation
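The abstract's idea that a missing DAG edge corresponds to a conditional independence can be illustrated with a first-order partial correlation. The sketch below is only an illustration under stated assumptions: the chain x -> y -> z, the Gaussian noise, and the helper names `pearson`, `partial_corr` and `simulate_chain` are all hypothetical and are not taken from the paper. For Gaussian data, a partial correlation near zero is consistent with conditional independence; for other distributions it is only a necessary check.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, z, y):
    """First-order partial correlation of x and z given y."""
    r_xz, r_xy, r_zy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))


def simulate_chain(n=2000, seed=0):
    """Hypothetical toy data for a chain x -> y -> z (no direct x -> z edge)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [u + rng.gauss(0, 1) for u in x]
    z = [v + rng.gauss(0, 1) for v in y]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_chain()
    # x and z are marginally correlated, but the correlation should
    # largely vanish once y is conditioned on.
    print("corr(x, z)     =", round(pearson(x, z), 3))
    print("corr(x, z | y) =", round(partial_corr(x, z, y), 3))
```

In this toy chain the marginal correlation between x and z is clearly non-zero, while the partial correlation given y should be close to zero, which is the pattern a missing x -> z edge predicts.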
2
Zenere A, Rundquist O, Gustafsson M, Altafini C. Using high-throughput multi-omics data to investigate structural balance in elementary gene regulatory network motifs. Bioinformatics 2021; 38:173-178. [PMID: 34383882 PMCID: PMC8696094 DOI: 10.1093/bioinformatics/btab577] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2021] [Revised: 07/04/2021] [Accepted: 08/10/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The simultaneous availability of ATAC-seq and RNA-seq experiments makes it possible to obtain more in-depth knowledge of the regulatory mechanisms occurring in gene regulatory networks. In this article, we highlight and analyze two novel aspects that leverage the possibility of pairing RNA-seq and ATAC-seq data. Namely, we investigate the causality of the relationships between transcription factors, chromatin and target genes, and the internal consistency between the two omics, here measured in terms of structural balance in the sample correlations along elementary length-3 cycles.
RESULTS We propose a framework that uses the a priori knowledge on the data to infer elementary causal regulatory motifs (namely chains and forks) in the network. It is based on the notions of conditional independence and partial correlation, and can be applied to both longitudinal and non-longitudinal data. Our analysis highlights a strong connection between the causal regulatory motifs that are selected by the data and the structural balance of the underlying sample correlation graphs: strikingly, >97% of the selected regulatory motifs belong to a balanced subgraph. This result shows that internal consistency, as measured by structural balance, is close to a necessary condition for 3-node regulatory motifs to satisfy causality rules.
AVAILABILITY AND IMPLEMENTATION The analysis was carried out in MATLAB and the code can be found at https://github.com/albertozenere/Multi-omics-elementary-regulatory-motifs.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
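Structural balance along a length-3 cycle, as mentioned in the abstract above, is commonly defined by the product of the edge signs being positive. The following is a minimal sketch of that textbook definition applied to sample correlations; it is not the authors' MATLAB code. The function names are hypothetical, and treating a zero correlation as unbalanced is an assumption made here for simplicity.

```python
from itertools import combinations


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def is_balanced_triad(r_ab, r_bc, r_ac):
    """A 3-cycle is balanced when the product of its edge signs is positive.

    Assumption: a zero correlation makes the triad count as unbalanced.
    """
    return sign(r_ab) * sign(r_bc) * sign(r_ac) > 0


def balanced_fraction(corr):
    """Fraction of node triples in a symmetric correlation matrix that are balanced.

    corr is a list of lists with at least three rows.
    """
    triads = list(combinations(range(len(corr)), 3))
    balanced = sum(
        is_balanced_triad(corr[i][j], corr[j][k], corr[i][k])
        for i, j, k in triads
    )
    return balanced / len(triads)
```

For example, a triad with correlations (0.8, -0.5, -0.4) is balanced because two negative edges cancel, whereas (0.8, 0.5, -0.4) is not.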
Affiliation(s)
- Alberto Zenere
  - Division of Automatic Control, Department of Electrical Engineering, Linköping University, SE-58183 Linköping, Sweden
- Olof Rundquist
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-58183 Linköping, Sweden
- Mika Gustafsson
  - Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-58183 Linköping, Sweden
3
Iliopoulos A, Beis G, Apostolou P, Papasotiriou I. Complex Networks, Gene Expression and Cancer Complexity: A Brief Review of Methodology and Applications. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017093504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/28/2022]
Abstract
This brief survey describes various aspects of cancer complexity and how this complexity can be confronted using modern complex network theory and gene expression datasets. In particular, the causes and basic features of cancer complexity, as well as the challenges it poses, are underlined, while the importance of gene expression data in cancer research and in the reverse engineering of gene co-expression networks is highlighted. In addition, an introduction to the corresponding theoretical and mathematical framework of graph theory and complex networks is provided. The basics of network reconstruction, along with the limitations of gene network inference, enrichment and survival analysis, and evolution, robustness-resilience and cascades in complex networks, are described. Finally, an indicative example of cancer gene co-expression network inference and analysis is given.
Affiliation(s)
- A.C. Iliopoulos
  - Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
- G. Beis
  - Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
- P. Apostolou
  - Research and Development Department, Research Genetic Cancer Centre S.A., Florina, Greece
- I. Papasotiriou
  - Research Genetic Cancer Centre International GmbH, Zug, Switzerland
4
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression network, differential networking, and differential connectivity considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in RNA-seq data analysis using methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis along with useful tools specific to scRNA-seq. To get insights, biological interpretation and functional profiling is included. Finally, we provide guidelines for the analyst, along with research issues and challenges which should be addressed.
5
Rago A, Werren JH, Colbourne JK. Sex biased expression and co-expression networks in development, using the hymenopteran Nasonia vitripennis. PLoS Genet 2020; 16:e1008518. [PMID: 31986136 PMCID: PMC7004391 DOI: 10.1371/journal.pgen.1008518] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2019] [Revised: 02/06/2020] [Accepted: 11/13/2019] [Indexed: 12/17/2022] Open
Abstract
Sexual dimorphism requires regulation of gene expression in developing organisms. These developmental differences are caused by differential expression of genes and isoforms. The effect of expressing a gene is also influenced by which other genes are simultaneously expressed (functional interactions). However, few studies have described how these processes change across development. We compare the dynamics of differential expression, isoform switching and functional interactions in the sexual development of the model parasitoid wasp Nasonia vitripennis, a system that permits genome wide analysis of sex bias from early embryos to adults. We find relatively little sex-bias in embryos and larvae at the gene level, but several sub-networks show sex-biased functional interactions in early developmental stages. These networks provide new candidates for hymenopteran sex determination, including histone modification. In contrast, sex-bias in pupae and adults is driven by the differential expression of genes. We observe sex-biased isoform switching consistently across development, but mostly in genes that are already differentially expressed. Finally, we discover that sex-biased networks are enriched by genes specific to the Nasonia clade, and that those genes possess the topological properties of key regulators. These findings suggest that regulators in sex-biased networks evolve more rapidly than regulators of other developmental networks.
Affiliation(s)
- Alfredo Rago
  - School of Biosciences, The University of Birmingham, Birmingham, United Kingdom
- John H. Werren
  - Department of Biology, University of Rochester, Rochester, NY, United States of America
- John K. Colbourne
  - School of Biosciences, The University of Birmingham, Birmingham, United Kingdom
6
Mercatelli D, Scalambra L, Triboli L, Ray F, Giorgi FM. Gene regulatory network inference resources: A practical overview. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2019; 1863:194430. [PMID: 31678629 DOI: 10.1016/j.bbagrm.2019.194430] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 05/31/2019] [Revised: 09/06/2019] [Accepted: 09/09/2019] [Indexed: 02/08/2023]
Abstract
Transcriptional regulation is a fundamental molecular mechanism involved in almost every aspect of life, from homeostasis to development, from metabolism to behavior, from reaction to stimuli to disease progression. In recent years, the concept of Gene Regulatory Networks (GRNs) has grown popular as an effective applied biology approach for describing the complex and highly dynamic set of transcriptional interactions, due to its easy-to-interpret features. Since cataloguing, predicting and understanding every GRN connection in all species and cellular contexts remains a great challenge for biology, researchers have developed numerous tools and methods to infer regulatory processes. In this review, we catalogue these methods in six major areas, based on the dominant underlying information leveraged to infer GRNs: Coexpression, Sequence Motifs, Chromatin Immunoprecipitation (ChIP), Orthology, Literature and Protein-Protein Interaction (PPI) specifically focused on transcriptional complexes. The methods described here cover a wide range of user-friendliness: from web tools that require no prior computational expertise to command line programs and algorithms for large scale GRN inferences. Each method for GRN inference described herein effectively illustrates a type of transcriptional relationship, with many methods being complementary to others. While a truly holistic approach for inferring and displaying GRNs remains one of the greatest challenges in the field of systems biology, we believe that the integration of multiple methods described herein provides an effective means with which experimental and computational biologists alike may obtain the most complete pictures of transcriptional relationships. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Daniele Mercatelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Laura Scalambra
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Luca Triboli
  - Centre for Integrative Biology (CIBIO), University of Trento, Italy
- Forest Ray
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, United States
- Federico M Giorgi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
7
Oliveira GB, Regitano LCA, Cesar ASM, Reecy JM, Degaki KY, Poleti MD, Felício AM, Koltes JE, Coutinho LL. Integrative analysis of microRNAs and mRNAs revealed regulation of composition and metabolism in Nelore cattle. BMC Genomics 2018; 19:126. [PMID: 29415651 PMCID: PMC5804041 DOI: 10.1186/s12864-018-4514-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/31/2016] [Accepted: 01/31/2018] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND The amount of intramuscular fat can influence the sensory characteristics and nutritional value of beef, thus the selection of animals with adequate fat deposition is important to the consumer. There is growing knowledge about the genes and pathways that control the biological processes involved in fat deposition in muscle. MicroRNAs (miRNAs) belong to a well-conserved class of non-coding small RNAs that modulate gene expression across a range of biological functions in animal development and physiology. The aim of this study was to identify differentially expressed (DE) miRNAs, regulatory candidate genes and co-expression networks related to intramuscular fat (IMF) deposition. To achieve this, we used mRNA and miRNA expression data from the Longissimus dorsi muscle of 30 Nelore steers with high (H) and low (L) genomic estimated breeding values (GEBV) for IMF deposition. RESULTS Differential miRNA expression analysis between animals with extreme GEBV values for IMF identified six DE miRNAs (FDR 10%). Functional annotation of the target genes for these microRNAs indicated that the PPARs signaling pathway is involved with IMF deposition. Candidate regulatory genes such as SDHAF4, FBXO17, ALDOA and PKM were identified by partial correlation with information theory (PCIT), phenotypic impact factor (PIF) and regulatory impact factor (RIF) co-expression approaches from integrated miRNA-mRNA expression data. Two DE miRNAs (FDR 10%), bta-miR-143 and bta-miR-146b, which were upregulated in the Low IMF group, were correlated with regulatory candidate genes, which were functionally enriched for fatty acid oxidation GO terms. Co-expression patterns obtained by weighted correlation network analysis (WGCNA), which showed possible interaction and regulation between mRNAs and miRNAs, identified several modules related to immune system function, protein metabolism, energy metabolism and glucose catabolism according to in silico analysis performed herein. 
CONCLUSION In this study, several genes and miRNAs were identified as candidate regulators of IMF by analyzing DE miRNAs using two different miRNA-mRNA co-expression network methods. This study contributes to the understanding of potential regulatory mechanisms of gene signaling networks involved in fat deposition processes measured in muscle. Glucose metabolism and inflammation processes were the main pathways found in silico to influence intramuscular fat deposition in beef cattle in the integrative mRNA-miRNA co-expression analysis.
Affiliation(s)
- Gabriella B. Oliveira
  - Department of Animal Science, University of São Paulo, Piracicaba, SP 13418-900 Brazil
- Aline S. M. Cesar
  - Department of Animal Science, University of São Paulo, Piracicaba, SP 13418-900 Brazil
- James M. Reecy
  - Department of Animal Science, Iowa State University, Ames, IA 50011 USA
- Karina Y. Degaki
  - Department of Animal Science, University of São Paulo, Piracicaba, SP 13418-900 Brazil
- Mirele D. Poleti
  - Department of Animal Science, University of São Paulo, Piracicaba, SP 13418-900 Brazil
- Andrezza M. Felício
  - Department of Animal Science, University of São Paulo, Piracicaba, SP 13418-900 Brazil
- James E. Koltes
  - Department of Animal Science, University of Arkansas, Fayetteville, AR 72701 USA
- Luiz L. Coutinho
  - Department of Animal Science, University of São Paulo, Piracicaba, SP 13418-900 Brazil
8
Bottje W, Kong BW, Reverter A, Waardenberg AJ, Lassiter K, Hudson NJ. Progesterone signalling in broiler skeletal muscle is associated with divergent feed efficiency. BMC Systems Biology 2017; 11:29. [PMID: 28235404 PMCID: PMC5324283 DOI: 10.1186/s12918-017-0396-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 10/12/2016] [Accepted: 01/16/2017] [Indexed: 01/08/2023]
Abstract
Background We contrast the pectoralis muscle transcriptomes of broilers selected from within a single genetic line expressing divergent feed efficiency (FE) in an effort to improve our understanding of the mechanistic basis of FE. Results Application of a virtual muscle model to gene expression data pointed to a coordinated reduction in slow twitch muscle isoforms of the contractile apparatus (MYH15, TPM3, MYOZ2, TNNI1, MYL2, MYOM3, CSRP3, TNNT2), consistent with diminishment in associated slow machinery (myoglobin and phospholamban) in the high FE animals. These data are in line with the repeated transition from red slow to white fast muscle fibres observed in agricultural species selected on mass and FE. Surprisingly, we found that the expression of 699 genes encoding the broiler mitoproteome is modestly–but significantly–biased towards the high FE group, suggesting a slightly elevated mitochondrial content. This is contrary to expectation based on the slow muscle isoform data and theoretical physiological capacity arguments. Reassuringly, the extreme 40 most DE genes can successfully cluster the 12 individuals into the appropriate FE treatment group. Functional groups contained in this DE gene list include metabolic proteins (including opposing patterns of CA3 and CA4), mitochondrial proteins (CKMT1A), oxidative status (SEPP1, HIG2A) and cholesterol homeostasis (APOA1, INSIG1). We applied a differential network method (Regulatory Impact Factors) whose aim is to use patterns of differential co-expression to detect regulatory molecules transcriptionally rewired between the groups. This analysis clearly points to alterations in progesterone signalling (via the receptor PGR) as the major driver. We show the progesterone receptor localises to the mitochondria in a quail muscle cell line. Conclusions Progesterone is sometimes used in the cattle industry in exogenous hormone mixes that lead to a ~20% increase in FE. 
Because the progesterone receptor can localise to avian mitochondria, our data continue to point to muscle mitochondrial metabolism as an important component of the phenotypic expression of variation in broiler FE.
Affiliation(s)
- Walter Bottje
  - Department of Poultry Science, University of Arkansas, Fayetteville, AR, USA
- Byung-Whi Kong
  - Department of Poultry Science, University of Arkansas, Fayetteville, AR, USA
- Antonio Reverter
  - Agriculture, Commonwealth Science and Industrial Research Organisation, 306 Carmody Road, Brisbane, QLD, 4072, Australia
- Ashley J Waardenberg
  - Agriculture, Commonwealth Science and Industrial Research Organisation, 306 Carmody Road, Brisbane, QLD, 4072, Australia
  - Children's Medical Research Institute, University of Sydney, 214 Hawkesbury Road, Westmead, NSW, 2145, Australia
- Kentu Lassiter
  - Department of Poultry Science, University of Arkansas, Fayetteville, AR, USA
- Nicholas J Hudson
  - School of Agriculture and Food Science, University of Queensland, Gatton, QLD, 4343, Australia
9
Wang D, Wang J, Jiang Y, Liang Y, Xu D. BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-Expression Analysis. J Mol Biol 2016; 429:446-453. [PMID: 27984044 DOI: 10.1016/j.jmb.2016.10.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/01/2016] [Revised: 10/22/2016] [Accepted: 10/23/2016] [Indexed: 10/20/2022]
Abstract
Comparing the gene-expression profiles between biological conditions is useful for understanding gene regulation underlying complex phenotypes. Along this line, analysis of differential co-expression (DC) has gained attention in the recent years, where genes under one condition have different co-expression patterns compared with another. We developed an R package Bayes Factor approach for Differential Co-expression Analysis (BFDCA) for DC analysis. BFDCA is unique in integrating various aspects of DC patterns (including Shift, Cross, and Re-wiring) into one uniform Bayes factor. We tested BFDCA using simulation data and experimental data. Simulation results indicate that BFDCA outperforms existing methods in accuracy and robustness of detecting DC pairs and DC modules. Results of using experimental data suggest that BFDCA can cluster disease-related genes into functional DC subunits and estimate the regulatory impact of disease-related genes well. BFDCA also achieves high accuracy in predicting case-control phenotypes by using significant DC gene pairs as markers. BFDCA is publicly available at http://dx.doi.org/10.17632/jdz4vtvnm3.1.
Affiliation(s)
- Duolin Wang
  - College of Computer Science and Technology, Jilin University, Changchun, China 130012
  - Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Juexin Wang
  - Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yuexu Jiang
  - College of Computer Science and Technology, Jilin University, Changchun, China 130012
  - Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yanchun Liang
  - College of Computer Science and Technology, Jilin University, Changchun, China 130012
  - Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
  - College of Computer Science and Technology, Jilin University, Changchun, China 130012
  - Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
10
Differential Coexpression Analysis Reveals Extensive Rewiring of Arabidopsis Gene Coexpression in Response to Pseudomonas syringae Infection. Sci Rep 2016; 6:35064. [PMID: 27721457 PMCID: PMC5056366 DOI: 10.1038/srep35064] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/26/2016] [Accepted: 09/23/2016] [Indexed: 01/21/2023] Open
Abstract
Plant defense responses to pathogens involve massive transcriptional reprogramming. Recently, differential coexpression analysis has been developed to study the rewiring of gene networks through microarray data, which is becoming an important complement to traditional differential expression analysis. Using time-series microarray data of Arabidopsis thaliana infected with Pseudomonas syringae, we analyzed Arabidopsis defense responses to P. syringae through differential coexpression analysis. Overall, we found that differential coexpression was a common phenomenon of plant immunity. Genes that were frequently involved in differential coexpression tend to be related to plant immune responses. Importantly, many of those genes have similar average expression levels between normal plant growth and pathogen infection but have different coexpression partners. By integrating the Arabidopsis regulatory network into our analysis, we identified several transcription factors that may be regulators of differential coexpression during plant immune responses. We also observed extensive differential coexpression between genes within the same metabolic pathways. Several metabolic pathways, such as photosynthesis light reactions, exhibited significant changes in expression correlation between normal growth and pathogen infection. Taken together, differential coexpression analysis provides a new strategy for analyzing transcriptional data related to plant defense responses and new insights into the understanding of plant-pathogen interactions.
11
Giorgi FM, Lopez G, Woo JH, Bisikirska B, Califano A, Bansal M. Inferring protein modulation from gene expression data using conditional mutual information. PLoS One 2014; 9:e109569. [PMID: 25314274 PMCID: PMC4196905 DOI: 10.1371/journal.pone.0109569] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/19/2014] [Accepted: 09/12/2014] [Indexed: 01/18/2023] Open
Abstract
Systematic, high-throughput dissection of causal post-translational regulatory dependencies, on a genome-wide basis, is still one of the great challenges of biology. Due to its complexity, however, only a handful of computational algorithms have been developed for this task. Here we present CINDy (Conditional Inference of Network Dynamics), a novel algorithm for the genome-wide, context-specific inference of regulatory dependencies between signaling protein and transcription factor activity, from gene expression data. The algorithm uses a novel adaptive partitioning methodology to accurately estimate the full Conditional Mutual Information (CMI) between a transcription factor and its targets, given the expression of a signaling protein. We show that CMI analysis is optimally suited to dissecting post-translational dependencies. Indeed, when tested against a gold standard dataset of experimentally validated protein-protein interactions in signal transduction networks, CINDy significantly outperforms previous methods, both in terms of sensitivity and precision.
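To make the quantity named in the abstract concrete, the sketch below computes a plug-in estimate of conditional mutual information I(X; Y | Z) for already-discretized samples. This is a naive frequency-count estimator chosen for illustration; it is not CINDy's adaptive partitioning method, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete samples.

    Uses I = sum p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ),
    with probabilities replaced by observed frequencies.
    """
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), k in c_xyz.items():
        # The factors of n cancel in the ratio, so counts can be used directly.
        cmi += (k / n) * math.log((k * c_z[z]) / (c_xz[(x, z)] * c_yz[(y, z)]))
    return cmi


if __name__ == "__main__":
    # Toy data: X and Y are independent when Z = 0 and identical when Z = 1.
    xs = [0, 0, 1, 1, 0, 1, 0, 1]
    ys = [0, 1, 0, 1, 0, 1, 0, 1]
    zs = [0, 0, 0, 0, 1, 1, 1, 1]
    print(conditional_mutual_information(xs, ys, zs))
```

In the toy data the dependence between X and Y exists only in the Z = 1 half of the samples, so the estimate equals half the entropy of a fair binary variable, 0.5 * ln 2. Plug-in estimators of this kind are biased for small samples, which is one reason more careful partitioning schemes are used in practice.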
Affiliation(s)
- Federico M. Giorgi
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Gonzalo Lopez
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Jung H. Woo
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Brygida Bisikirska
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Andrea Califano
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
  - Columbia Genome Center, High Throughput Screening facility, Columbia University, New York, New York, United States of America
  - Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Institute for Cancer Genetics, Columbia University, New York, New York, United States of America
  - Herbert Irving Comprehensive Cancer Center, Columbia University, New York, New York, United States of America
- Mukesh Bansal
  - Department of Systems Biology, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
12
Wang HQ, Tsai CJ. CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis. PLoS One 2013; 8:e77429. [PMID: 24194884 PMCID: PMC3806744 DOI: 10.1371/journal.pone.0077429] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/18/2013] [Accepted: 09/02/2013] [Indexed: 11/19/2022] Open
Abstract
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data.
SOFTWARE AVAILABILITY A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
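The test described in this abstract can be illustrated with a short Python sketch that computes a one-sided p-value for the null hypothesis ρ ≤ τ through Fisher's Z transformation. It assumes the usual large-sample approximation that atanh(r) is roughly normal with variance 1/(n − 3). The function name `corr_cutoff_pvalue` is hypothetical, and this is not the authors' R code or web server.

```python
from math import atanh, sqrt
from statistics import NormalDist


def corr_cutoff_pvalue(r: float, n: int, tau: float) -> float:
    """One-sided p-value for H0: rho <= tau against H1: rho > tau.

    r   : observed Pearson correlation
    n   : number of samples used to compute r
    tau : user-specified correlation cutoff (tau = 0 gives the classic test)
    """
    if n <= 3:
        raise ValueError("Fisher's Z approximation needs n > 3")
    if not (-1.0 < r < 1.0 and -1.0 < tau < 1.0):
        raise ValueError("r and tau must lie strictly between -1 and 1")
    # Fisher's Z: atanh(r) is approximately N(atanh(rho), 1 / (n - 3)).
    z = (atanh(r) - atanh(tau)) * sqrt(n - 3)
    # Upper-tail probability of the standard normal, written as cdf(-z).
    return NormalDist().cdf(-z)


if __name__ == "__main__":
    # Illustrative values only: r = 0.9 from 50 samples, cutoff 0.5.
    print(corr_cutoff_pvalue(0.9, 50, 0.5))
```

The resulting p-values could then be passed to a standard multiple-testing correction, as the abstract suggests.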
Affiliation(s)
- Hong-Qiang Wang
- Intelligent Computing Lab, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- * E-mail: (HQW); (CJT)
- Chung-Jui Tsai
- Department of Genetics, University of Georgia, Athens, Georgia, United States of America
- Warnell School of Forestry and Natural Resources, University of Georgia, Athens, Georgia, United States of America
- * E-mail: (HQW); (CJT)
|
13
|
Giorgi FM, Del Fabbro C, Licausi F. Comparative study of RNA-seq- and microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2013; 29:717-24. [PMID: 23376351 DOI: 10.1093/bioinformatics/btt053] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 12/20/2022]
Abstract
MOTIVATION Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions. They have been used for hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess the potential for biological investigation of coexpression networks derived from this novel technique in a condition-independent dataset. RESULTS We collected 65 publicly available Illumina RNA-seq high quality Arabidopsis thaliana samples and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. We show how Variance-Stabilizing Transformed (VST) RNA-seq data samples are the most similar to microarray ones, with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks show a slightly higher score in biology-derived quality assessments such as overlap with the known protein-protein interaction network and edge ontological agreement. Different coexpression network centralities are investigated; in particular, we show how betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data. In the end, we focus on a specific gene network case, showing that although microarray data seem more suited for gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome.
|
14
|
Vasilevski A, Giorgi FM, Bertinetti L, Usadel B. LASSO modeling of the Arabidopsis thaliana seed/seedling transcriptome: a model case for detection of novel mucilage and pectin metabolism genes. MOLECULAR BIOSYSTEMS 2013; 8:2566-74. [PMID: 22735692 DOI: 10.1039/c2mb25096a] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
Whole genome transcript correlation-based approaches have been shown to be enormously useful for candidate gene detection. Consequently, simple Pearson correlation has been widely applied in several web based tools. That said, several more sophisticated methods based on e.g. mutual information or Bayesian network inference have been developed and have been shown to be theoretically superior but are not yet commonly applied. Here, we propose the application of a recently developed statistical regression technique, the LASSO, to detect novel candidates from high throughput transcriptomic datasets. We apply the LASSO to a tissue specific dataset in the model plant Arabidopsis thaliana to identify novel players in Arabidopsis thaliana seed coat mucilage synthesis. We built LASSO models based on a list of genes known to be involved in a sub-pathway of Arabidopsis mucilage synthesis. After identifying a putative transcription factor, we verified its involvement in mucilage synthesis by obtaining knock-out mutants for this gene. We show that a loss of function of this putative transcription factor leads to a significant decrease in mucilage pectin.
Affiliation(s)
- Aleksandar Vasilevski
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
|
15
|
Yu S, Zheng L, Li Y, Li C, Ma C, Yu Y, Li X, Hao P. Causal co-expression method with module analysis to screen drugs with specific target. Gene 2012; 518:145-51. [PMID: 23266800 DOI: 10.1016/j.gene.2012.11.051] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/16/2012] [Accepted: 11/27/2012] [Indexed: 01/19/2023]
Abstract
The considerable increase of investment in research and development by the pharmaceutical industry over the past three decades has not increased the number of approved new drugs. An important issue ignored by drug discovery practice is the multi-dimensional interaction network between drugs and their targets. Thus, it is essential to view drug actions through the lens of network biology. In the current study, based on the co-expression network of transcription factors and their downstream genes, we propose a novel approach, called the causal co-expression method with module analysis, to screen for drugs with specific targets and fewer side effects. The method can be used to analyze microarray data from different drug candidates. First, the differential wiring value (DW) was calculated to find causal transcription factors (TFs) by combining it with differentially expressed genes in the regulated networks. After the discovery of the causal TFs, co-expression module analysis was applied to mine molecular pharmacology pathways around these causal TFs at the molecular level. We applied our methods to two drug candidates, Argyrin A and Bortezomib, both with anti-cancer activities. We first obtained some differentially expressed transcription factors of cells treated with Argyrin A or Bortezomib. Nearly all these transcription factors are associated with the tumor suppressor protein p27kip1. Furthermore, module analysis showed that Bortezomib inhibited tumor growth not specifically through cell cycle and cell proliferation pathways, but through many basic metabolic processes, which results in cell toxicity. In contrast, Argyrin A influenced the cell cycle and was involved in DNA damage repair at the same time, showing that Argyrin A was a more suitable drug for anti-cancer treatment. Our study revealed that the causal co-expression method with module analysis was effective and can be used as a tool to evaluate drug candidates.
Affiliation(s)
- Shuhao Yu
- College of Life Science and Biotechnology, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, PR China.
|
16
|
Hempel S, Koseska A, Nikoloski Z, Kurths J. Unraveling gene regulatory networks from time-resolved gene expression data - a measures comparison study. BMC Bioinformatics 2011; 12:292. [PMID: 21771321 PMCID: PMC3161045 DOI: 10.1186/1471-2105-12-292] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 03/11/2011] [Accepted: 07/19/2011] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Inferring regulatory interactions between genes from transcriptomics time-resolved data, yielding reverse engineered gene regulatory networks, is of paramount importance to systems biology and bioinformatics studies. Accurate methods to address this problem can ultimately provide a deeper insight into the complexity, behavior, and functions of the underlying biological systems. However, the large number of interacting genes coupled with short and often noisy time-resolved read-outs of the system renders the reverse engineering a challenging task. Therefore, the development and assessment of methods which are computationally efficient, robust against noise, applicable to short time series data, and preferably capable of reconstructing the directionality of the regulatory interactions remains a pressing research problem with valuable applications. RESULTS Here we perform the largest systematic analysis of a set of similarity measures and scoring schemes within the scope of the relevance network approach which are commonly used for gene regulatory network reconstruction from time series data. In addition, we define and analyze several novel measures and schemes which are particularly suitable for short transcriptomics time series. We also compare the considered 21 measures and 6 scoring schemes according to their ability to correctly reconstruct such networks from short time series data by calculating summary statistics based on the corresponding specificity and sensitivity. Our results demonstrate that rank and symbol based measures have the highest performance in inferring regulatory interactions. In addition, the proposed scoring scheme by asymmetric weighting has shown to be valuable in reducing the number of false positive interactions. On the other hand, Granger causality as well as information-theoretic measures, frequently used in inference of regulatory networks, show low performance on the short time series analyzed in this study. 
CONCLUSIONS Our study is intended to serve as a guide for choosing a particular combination of similarity measures and scoring schemes suitable for reconstruction of gene regulatory networks from short time series data. We show that further improvement of algorithms for reverse engineering can be obtained if one considers measures that are rooted in the study of symbolic dynamics or ranks, in contrast to the application of common similarity measures which do not consider the temporal character of the employed data. Moreover, we establish that the asymmetric weighting scoring scheme together with symbol based measures (for low noise level) and rank based measures (for high noise level) are the most suitable choices.
Affiliation(s)
- Sabrina Hempel
- Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany
- Potsdam Institute for Climate Impact Research (PIK), Telegraphenberg A 31, D-14473 Potsdam, Germany
- Department of Physics, Humboldt University of Berlin, Campus Adlershof, Newtonstr. 15, D-12489 Berlin, Germany
- Aneta Koseska
- Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany
- Zoran Nikoloski
- Systems Biology and Mathematical Modeling Group, Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam, Germany
- Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 25, D-14476 Potsdam, Germany
- Jürgen Kurths
- Potsdam Institute for Climate Impact Research (PIK), Telegraphenberg A 31, D-14473 Potsdam, Germany
- Department of Physics, Humboldt University of Berlin, Campus Adlershof, Newtonstr. 15, D-12489 Berlin, Germany
- Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB243UE, UK
|
17
|
Wang XD, Qi YX, Jiang ZL. Reconstruction of transcriptional network from microarray data using combined mutual information and network-assisted regression. IET Syst Biol 2011; 5:95-102. [PMID: 21405197 DOI: 10.1049/iet-syb.2010.0041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
Abstract
Many methods have been developed for inferring transcriptional networks from gene expression data. However, new methods that disclose more detailed and exact network information are still needed. Using network-assisted regression, the authors combined the averaged three-way mutual information (AMI3) and a non-linear ordinary differential equation (ODE) model to infer the transcriptional network, and to obtain both the topological structure and the regulatory dynamics. Synthetic and experimental data were used to evaluate the performance of this approach. In comparison with previous methods based on mutual information, AMI3 obtained higher precision at the same sensitivity. To describe the regulatory dynamics between transcription factors and target genes, network-assisted regression and regression without network were each applied to steady-state and time series microarray data. The results revealed that, compared with regression without network, network-assisted regression increased the precision but decreased the goodness of fit. The authors then reconstructed the transcriptional network of Escherichia coli and simulated the regulatory dynamics of genes. Furthermore, the authors' approach identified potential transcription factors regulating the yeast cell cycle. In conclusion, network-assisted regression, combining AMI3 and the ODE model, inferred the topological structure and the regulatory dynamics of the transcriptional network from microarray data more precisely. [Includes supplementary material].
Affiliation(s)
- X-D Wang
- Shanghai Jiao Tong University, Institute of Mechanobiology and Medical Engineering, Shanghai, People's Republic of China
|
18
|
Tan M, Alshalalfa M, Alhajj R, Polat F. Influence of prior knowledge in constraint-based learning of gene regulatory networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:130-142. [PMID: 21071802 DOI: 10.1109/tcbb.2009.58] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/30/2023]
Abstract
Constraint-based structure learning algorithms generally perform well on sparse graphs. Although sparsity is not uncommon, there are some domains where the underlying graph can have some dense regions; one of these domains is gene regulatory networks, which is the main motivation to undertake the study described in this paper. We propose a new constraint-based algorithm that can both increase the quality of output and decrease the computational requirements for learning the structure of gene regulatory networks. The algorithm is based on and extends the PC algorithm. Two different types of information are derived from the prior knowledge; one is the probability of existence of edges, and the other is the nodes that seem to be dependent on a large number of nodes compared to other nodes in the graph. Also a new method based on Gene Ontology for gene regulatory network validation is proposed. We demonstrate the applicability and effectiveness of the proposed algorithms on both synthetic and real data sets.
Affiliation(s)
- Mehmet Tan
- Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey.
|
19
|
Yano K. Gene expression correlation analysis predicts involvement of high- and low-confidence risk genes in different stages of prostate carcinogenesis. Prostate 2010; 70:1746-59. [PMID: 20564324 DOI: 10.1002/pros.21210] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/06/2023]
Abstract
BACKGROUND Whole genome association studies have identified many loci associated with the risk of prostate cancer (PC). However, very few of the genes associated with these loci have been related to specific processes of prostate carcinogenesis. Therefore I inferred biological functions associated with these risk genes using gene expression correlation analysis. METHODS PC risk genes reported in the literature were classified as having high (P < 10^-6), medium (P < 10^-4), or low (P < 10^-2) statistical confidence. Correlation coefficients of the expression levels between the risk genes and other genes in cancerous prostate samples were compared against those in normal prostates using a microarray dataset from Gene Expression Omnibus. RESULTS Overall, a significant decrease of correlations in PC was observed between the levels of expression of the high-confidence genes and other genes in the microarray dataset, whereas correlation between low-confidence genes and other genes in PC showed a smaller decrease. Genes involved in developmental processes were significantly correlated with all risk gene categories. Ectoderm development genes, which may be related to squamous metaplasia, and genes enriched in fetal prostate stem cells (PSCs) showed strong association with the high-confidence genes. The association between the PSC genes and the low-confidence genes was weak, but genes related to neural system genes showed strong association with low-confidence genes. CONCLUSIONS The high-confidence risk genes may be associated with an early stage of prostate carcinogenesis, possibly involving PSCs and squamous metaplasia. The low-confidence genes may be involved in a later stage of carcinogenesis.
Affiliation(s)
- Kojiro Yano
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK.
|
20
|
Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Mewes HW. CORUM: the comprehensive resource of mammalian protein complexes--2009. Nucleic Acids Res 2009; 38:D497-501. [PMID: 19884131 PMCID: PMC2808912 DOI: 10.1093/nar/gkp914] [Citation(s) in RCA: 509] [Impact Index Per Article: 33.9] [Indexed: 12/19/2022] Open
Abstract
CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing ∼16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a ‘Phylogenetic Conservation’ analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at (http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html).
Affiliation(s)
- Andreas Ruepp
- Institute for Bioinformatics and Systems Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.
|
21
|
Inferring the transcriptional landscape of bovine skeletal muscle by integrating co-expression networks. PLoS One 2009; 4:e7249. [PMID: 19794913 PMCID: PMC2749936 DOI: 10.1371/journal.pone.0007249] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 07/17/2009] [Accepted: 08/31/2009] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Despite modern technologies and novel computational approaches, decoding causal transcriptional regulation remains challenging. This is particularly true for less well studied organisms and when only gene expression data is available. In muscle a small number of well characterised transcription factors are proposed to regulate development. Therefore, muscle appears to be a tractable system for proposing new computational approaches. METHODOLOGY/PRINCIPAL FINDINGS Here we report a simple algorithm that asks "which transcriptional regulator has the highest average absolute co-expression correlation to the genes in a co-expression module?" It correctly infers a number of known causal regulators of fundamental biological processes, including cell cycle activity (E2F1), glycolysis (HLF), mitochondrial transcription (TFB2M), adipogenesis (PIAS1), neuronal development (TLX3), immune function (IRF1) and vasculogenesis (SOX17), within a skeletal muscle context. However, none of the canonical pro-myogenic transcription factors (MYOD1, MYOG, MYF5, MYF6 and MEF2C) were linked to muscle structural gene expression modules. Co-expression values were computed using developing bovine muscle from 60 days post conception (early foetal) to 30 months post natal (adulthood) for two breeds of cattle, in addition to a nutritional comparison with a third breed. A number of transcriptional landscapes were constructed and integrated into an always correlated landscape. One notable feature was a 'metabolic axis' formed from glycolysis genes at one end, nuclear-encoded mitochondrial protein genes at the other, and centrally tethered by mitochondrially-encoded mitochondrial protein genes. CONCLUSIONS/SIGNIFICANCE The new module-to-regulator algorithm complements our recently described Regulatory Impact Factor analysis. Together with a simple examination of a co-expression module's contents, these three gene expression approaches are starting to illuminate the in vivo transcriptional regulation of skeletal muscle development.
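The question posed by the module-to-regulator algorithm ("which transcriptional regulator has the highest average absolute co-expression correlation to the genes in a co-expression module?") can be sketched in a few lines of Python. This is a minimal sketch, assuming expression profiles are equal-length lists of numbers with non-zero variance. The function names and the `TF_A`/`TF_B`/`g1`/`g2` labels are hypothetical placeholders, not the authors' implementation or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def top_regulator(regulators, module):
    """Rank regulators by mean absolute correlation to the module genes.

    regulators : dict mapping regulator name -> expression profile
    module     : dict mapping gene name -> expression profile
    Returns (best_regulator_name, dict of scores per regulator).
    """
    scores = {
        name: sum(abs(pearson(profile, gene)) for gene in module.values())
        / len(module)
        for name, profile in regulators.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy profiles, invented purely for illustration.
    module = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1]}
    regulators = {"TF_A": [1, 2, 3, 4], "TF_B": [1, 3, 2, 4]}
    print(top_regulator(regulators, module))
```

Taking the absolute value means a regulator scores highly whether it is positively or negatively correlated with the module, which matches the wording of the question in the abstract.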
|
22
|
He F, Balling R, Zeng AP. Reverse engineering and verification of gene networks: principles, assumptions, and limitations of present methods and future perspectives. J Biotechnol 2009; 144:190-203. [PMID: 19631244 DOI: 10.1016/j.jbiotec.2009.07.013] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 03/30/2009] [Revised: 07/13/2009] [Accepted: 07/16/2009] [Indexed: 12/21/2022]
Abstract
Reverse engineering of gene networks aims at revealing the structure of the gene regulation network in a biological system by reasoning backward directly from experimental data. Many methods have recently been proposed for reverse engineering of gene networks by using gene transcript expression data measured by microarray. Whereas the potentials of the methods have been well demonstrated, the assumptions and limitations behind them are often not clearly stated or not well understood. In this review, we first briefly explain the principles of the major methods, identify the assumptions behind them and pinpoint the limitations and possible pitfalls in applying them to real biological questions. With regard to applications, we then discuss challenges in the experimental verification of gene networks generated from reverse engineering methods. We further propose an optimal experimental design for allocating sampling schedule and possible strategies for reducing the limitations of some of the current reverse engineering methods. Finally, we examine the perspectives for the development of reverse engineering and urge the need to move from revealing network structure to the dynamics of biological systems.
Affiliation(s)
- Feng He
- Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
|
23
|
Michoel T, De Smet R, Joshi A, Van de Peer Y, Marchal K. Comparative analysis of module-based versus direct methods for reverse-engineering transcriptional regulatory networks. BMC SYSTEMS BIOLOGY 2009; 3:49. [PMID: 19422680 PMCID: PMC2684101 DOI: 10.1186/1752-0509-3-49] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 08/21/2008] [Accepted: 05/07/2009] [Indexed: 12/20/2022]
Abstract
BACKGROUND A myriad of methods to reverse-engineer transcriptional regulatory networks have been developed in recent years. Direct methods directly reconstruct a network of pairwise regulatory interactions while module-based methods predict a set of regulators for modules of coexpressed genes treated as a single unit. To date, there has been no systematic comparison of the relative strengths and weaknesses of both types of methods. RESULTS We have compared a recently developed module-based algorithm, LeMoNe (Learning Module Networks), to a mutual information based direct algorithm, CLR (Context Likelihood of Relatedness), using benchmark expression data and databases of known transcriptional regulatory interactions for Escherichia coli and Saccharomyces cerevisiae. A global comparison using recall versus precision curves hides the topologically distinct nature of the inferred networks and is not informative about the specific subtasks for which each method is most suited. Analysis of the degree distributions and a regulator specific comparison show that CLR is 'regulator-centric', making true predictions for a higher number of regulators, while LeMoNe is 'target-centric', recovering a higher number of known targets for fewer regulators, with limited overlap in the predicted interactions between both methods. Detailed biological examples in E. coli and S. cerevisiae are used to illustrate these differences and to prove that each method is able to infer parts of the network where the other fails. Biological validation of the inferred networks cautions against over-interpreting recall and precision values computed using incomplete reference networks. CONCLUSION Our results indicate that module-based and direct methods retrieve largely distinct parts of the underlying transcriptional regulatory networks. The choice of algorithm should therefore be based on the particular biological problem of interest and not on global metrics which cannot be transferred between organisms. The development of sound statistical methods for integrating the predictions of different reverse-engineering strategies emerges as an important challenge for future research.
Affiliation(s)
- Tom Michoel
- Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium.
|
24
|
Hudson NJ, Reverter A, Dalrymple BP. A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS Comput Biol 2009; 5:e1000382. [PMID: 19412532 PMCID: PMC2671163 DOI: 10.1371/journal.pcbi.1000382] [Citation(s) in RCA: 148] [Impact Index Per Article: 9.9] [Received: 10/07/2008] [Accepted: 04/01/2009] [Indexed: 11/18/2022] Open
Abstract
Transcription factor (TF) regulation is often post-translational. TF modifications such as reversible phosphorylation and missense mutations, which can act independent of TF expression level, are overlooked by differential expression analysis. Using bovine Piedmontese myostatin mutants as proof-of-concept, we propose a new algorithm that correctly identifies the gene containing the causal mutation from microarray data alone. The myostatin mutation releases the brakes on Piedmontese muscle growth by translating a dysfunctional protein. Compared to a less muscular non-mutant breed we find that myostatin is not differentially expressed at any of ten developmental time points. Despite this challenge, the algorithm identifies the myostatin ‘smoking gun’ through a coordinated, simultaneous, weighted integration of three sources of microarray information: transcript abundance, differential expression, and differential wiring. By asking the novel question “which regulator is cumulatively most differentially wired to the abundant most differentially expressed genes?” it yields the correct answer, “myostatin”. Our new approach identifies causal regulatory changes by globally contrasting co-expression network dynamics. The entirely data-driven ‘weighting’ procedure emphasises regulatory movement relative to the phenotypically relevant part of the network. In contrast to other published methods that compare co-expression networks, significance testing is not used to eliminate connections.
Evolution, development, and cancer are governed by regulatory circuits where the central nodes are transcription factors. Consequently, there is great interest in methods that can identify the causal mutation/perturbation responsible for any circuit rewiring. The most widely available high-throughput technology, the microarray, assays the transcriptome. However, many regulatory perturbations are post-transcriptional. This means that they are overlooked by traditional differential gene expression analysis. We hypothesised that by viewing biological systems as networks one could identify causal mutations and perturbations by examining those regulators whose position in the network changes the most. Using muscular myostatin mutant cattle as a proof-of-concept, we propose an analysis that succeeds based solely on microarray expression data from just 27 animals. Our analysis differs from competing network approaches in that we do not use significance testing to eliminate connections. All connections are contrasted, no matter how weak. Further, the identity of target genes is maintained throughout the analysis. Finally, the analysis is ‘weighted’ such that movement relative to the phenotypically most relevant part of the network is emphasised. By identifying the question to which myostatin is the answer, we present a comparison of network connectivity that is potentially generalisable.
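The idea of "differential wiring" can be illustrated with a small Python sketch that contrasts a regulator's co-expression with each target gene between two conditions. The sketch assumes differential wiring for a regulator-target pair is taken as the correlation in one condition minus the correlation in the other. The abstract's weighting by transcript abundance and differential expression is deliberately not reproduced, so this is not the authors' algorithm. All names below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differential_wiring(reg_a, targets_a, reg_b, targets_b):
    """Per-target change in regulator-target correlation between conditions.

    reg_a, reg_b         : regulator profile in condition A and condition B
    targets_a, targets_b : dicts mapping target gene -> profile per condition
    Every shared target is kept, however weak the correlation, in the spirit
    of the abstract's remark that no connections are filtered out.
    """
    shared = targets_a.keys() & targets_b.keys()
    return {
        gene: pearson(reg_a, targets_a[gene]) - pearson(reg_b, targets_b[gene])
        for gene in shared
    }


if __name__ == "__main__":
    # Toy data: the regulator tracks g1 in condition A and opposes it in B.
    dw = differential_wiring(
        [1, 2, 3, 4], {"g1": [1, 2, 3, 4]},
        [1, 2, 3, 4], {"g1": [4, 3, 2, 1]},
    )
    print(dw)
```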
Affiliation(s)
- Nicholas J. Hudson
- Food Futures Flagship and Livestock Industries, Commonwealth Scientific and Industrial Research Organisation, Queensland Bioscience Precinct, St. Lucia, Brisbane, Queensland, Australia
- Antonio Reverter
- Food Futures Flagship and Livestock Industries, Commonwealth Scientific and Industrial Research Organisation, Queensland Bioscience Precinct, St. Lucia, Brisbane, Queensland, Australia
- Brian P. Dalrymple
- Food Futures Flagship and Livestock Industries, Commonwealth Scientific and Industrial Research Organisation, Queensland Bioscience Precinct, St. Lucia, Brisbane, Queensland, Australia
|
25
|
Reverter A, Chan EKF. Combining partial correlation and an information theory approach to the reversed engineering of gene co-expression networks. Bioinformatics 2008; 24:2491-7. [PMID: 18784117 DOI: 10.1093/bioinformatics/btn482] [Citation(s) in RCA: 215] [Impact Index Per Article: 13.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION We present PCIT, an algorithm for the reconstruction of gene co-expression networks (GCN) that combines the concept of the partial correlation coefficient with information theory to identify significant gene-to-gene associations defining edges in the reconstruction of GCN. The properties of PCIT are examined in the context of the topology of the reconstructed network including connectivity structure, clustering coefficient and sensitivity. RESULTS We apply PCIT to a series of simulated datasets with varying levels of complexity in terms of number of genes and experimental conditions, as well as to three real datasets. Results show that, as opposed to the constant cutoff approach commonly used in the literature, the PCIT algorithm can identify and allow for more moderate, yet not less significant, estimates of correlation (r) to still establish a connection in the GCN. We show that PCIT is more sensitive than established methods and capable of detecting functionally validated gene-gene interactions coming from absolute r values as low as 0.3. These bona fide associations, which often relate to genes with low variation in expression patterns, are beyond the detection limits of conventional fixed-threshold methods, and would be overlooked by studies relying on those methods. AVAILABILITY FORTRAN 90 source code to perform the PCIT algorithm is available as Supplementary File 1.
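The partial correlation coefficient named in this abstract can be made concrete with the standard first-order formula, which gives the correlation between genes x and y after removing the linear effect of a third gene z. The Python sketch below shows only that building block. The information-theoretic thresholding that PCIT applies on top of it is not reproduced here, and the function name is a hypothetical choice.

```python
from math import sqrt


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y given z.

    r_xy, r_xz, r_yz are the pairwise Pearson correlations among the
    three variables; r_xz and r_yz must lie strictly between -1 and 1.
    """
    denom = sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("partial correlation undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denom


if __name__ == "__main__":
    # If x and y correlate only through z (r_xy = r_xz * r_yz), the partial
    # correlation vanishes, which suggests the x-y link is indirect.
    print(partial_corr(0.25, 0.5, 0.5))
```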
Affiliation(s)
- Antonio Reverter
- CSIRO Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, Brisbane, Queensland 4067, Australia.
|