1
TRANSPARENT: a Python tool for designing transcription factor regulatory networks. Soft Comput 2023. [DOI: 10.1007/s00500-023-07888-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
Transcription factors are proteins able to selectively bind short DNA stretches, namely transcription factor binding sites, in order to regulate gene expression through both repression and activation. Although plenty of studies in the literature focus on transcription factors and on the role they play in specific biological tasks or diseases, to our knowledge there is no tool able to automatically provide, through a solid computational analysis, a list of the transcription factors involved in a given task and the associated interaction network. TRANScriPtion fActor REgulatory NeTwork (TRANSPARENT) is a user-friendly Python tool designed to help researchers study given biological tasks or diseases in humans by identifying the transcription factors (TFs) that control and regulate the expression of genes associated with that task or disease. The tool takes as input a list of gene IDs and provides (1) a set of transcription factors that are significantly associated with the input genes, (2) the corresponding P values (i.e., the probability that the observed association arose by chance) and (3) a transcription factor network that can be directly visualized through the STRING database. The effectiveness and reliability of the tool were assessed by applying it to two test cases: schizophrenia and autism disorders. The results show that the TFs identified for both datasets are significantly associated with those disorders, in terms of both gene enrichment and coherence with the literature. TRANSPARENT can be a useful instrument to investigate transcription factor networks and unveil the role that TFs play in given biological tasks and diseases.
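The abstract does not state which statistical test produces the P values. As a minimal sketch, assuming a one-sided hypergeometric (over-representation) test, the association between one TF's target set and the input gene list could be scored as below. The function name, its parameters and the numbers in the usage lines are illustrative assumptions and are not taken from TRANSPARENT.

```python
from math import comb


def hypergeom_enrichment_p(population: int, tf_targets: int,
                           input_genes: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population  : number of genes in the background (assumed universe)
    tf_targets  : number of background genes annotated as targets of the TF
    input_genes : size of the user-supplied gene list
    overlap     : input genes that are also targets of the TF
    """
    upper = min(tf_targets, input_genes)
    total = comb(population, input_genes)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(tf_targets, k) * comb(population - tf_targets, input_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers, chosen only to show the call.
p_value = hypergeom_enrichment_p(population=20000, tf_targets=300,
                                 input_genes=150, overlap=12)
print(f"enrichment P value: {p_value:.3g}")
```

A small P value here means that an overlap this large would rarely occur if the input genes were drawn at random from the background. A real analysis over many TFs would also need a multiple-testing correction, which this sketch omits.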
2
Corso D, Chemello F, Alessio E, Urso I, Ferrarese G, Bazzega M, Romualdi C, Lanfranchi G, Sales G, Cagnin S. MyoData: An expression knowledgebase at single cell/nucleus level for the discovery of coding-noncoding RNA functional interactions in skeletal muscle. Comput Struct Biotechnol J 2021; 19:4142-4155. [PMID: 34527188 PMCID: PMC8342900 DOI: 10.1016/j.csbj.2021.07.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/15/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 12/22/2022]
Abstract
Regulation of gene expression through non-coding RNAs at single myofiber and nucleus resolution. Reinterpretation of KEGG pathways with microRNA and long non-coding RNA activities. miR-149, -214, and let-7e alter mitochondrial shape. The long non-coding RNA Pvt1 is a sponge for miR-27a. miR-208b regulates Sox6; miR-214 regulates both Sox6 and Slc16a3.
Non-coding RNAs represent the largest part of transcribed mammalian genomes and prevalently exert regulatory functions. Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) can modulate the activity of each other. Skeletal muscle is the most abundant tissue in mammals. It is composed of different cell types with myofibers that represent the smallest complete contractile system. Considering that lncRNAs and miRNAs are more cell type-specific than coding RNAs, to understand their function it is imperative to evaluate their expression and action within single myofibers. In this database, we collected gene expression data for coding and non-coding genes in single myofibers and used them to produce interaction networks based on expression correlations. Since biological pathways are more informative than networks based on gene expression correlation, to understand how altered genes participate in the studied phenotype, we integrated KEGG pathways with miRNAs and lncRNAs. The database also integrates single nucleus gene expression data on skeletal muscle in different patho-physiological conditions. We demonstrated that these networks can serve as a framework from which to dissect new miRNA and lncRNA functions to experimentally validate. Some interactions included in the database have been previously experimentally validated using high throughput methods. These can be the basis for further functional studies. Using database information, we demonstrate the involvement of miR-149, -214 and let-7e in mitochondria shaping; the ability of the lncRNA Pvt1 to mitigate the action of miR-27a via sponging; and the regulatory activity of miR-214 on Sox6 and Slc16a3. The MyoData is available at https://myodata.bio.unipd.it.
Affiliation(s)
- Davide Corso
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Francesco Chemello
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Enrico Alessio
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Ilenia Urso
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Giulia Ferrarese
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Martina Bazzega
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Chiara Romualdi
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Gerolamo Lanfranchi
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
  - CRIBI Biotechnology Centre, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
  - CIR-Myo Myology Center, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Gabriele Sales
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
- Stefano Cagnin
  - Department of Biology, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
  - CRIBI Biotechnology Centre, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
  - CIR-Myo Myology Center, University of Padova, Via Ugo Bassi 58/b, 35131 Padova, Italy
3
Ji X, Chen S, Li JC, Deng W, Wei Z, Wei H. SSGA and MSGA: two seed-growing algorithms for constructing collaborative subnetworks. Sci Rep 2017; 7:1446. [PMID: 28469138 PMCID: PMC5431152 DOI: 10.1038/s41598-017-01556-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/13/2017] [Accepted: 03/30/2017] [Indexed: 11/09/2022]
Abstract
The establishment of a collaborative network of transcription factors (TFs) followed by decomposition and then construction of subnetworks is an effective way to obtain sets of collaborative TFs; each set controls a biological process or a complex trait. We previously developed eight gene association methods for genome-wide coexpression analysis between each TF and all other genomic genes and then constructing collaborative networks of TFs but only one algorithm, called Triple-Link Algorithm, for building collaborative subnetworks. In this study, we developed two more algorithms, Single Seed-Growing Algorithm (SSGA) and Multi-Seed Growing Algorithm (MSGA), for building collaborative subnetworks of TFs by identifying the fully-linked triple-node seeds from a decomposed collaborative network and then growing them into subnetworks with two different strategies. The subnetworks built from the three algorithms described above were comparatively appraised in terms of both functional cohesion and intra-subnetwork association strengths versus inter-subnetwork association strengths. We concluded that SSGA and MSGA, which performed more systemic comparisons and analyses of edge weights and network connectivity during subnetwork construction processes, yielded more functional and cohesive subnetworks than Triple-Link Algorithm. Together, these three algorithms provide alternate approaches for acquiring subnetworks of collaborative TFs. We also presented a framework to outline how to use these three algorithms to obtain collaborative TF sets governing biological processes or complex traits.
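The seed-identification step mentioned in the abstract, finding fully-linked triple-node seeds, amounts to enumerating the triangles of an undirected network. The sketch below illustrates only that step, under the assumption that the decomposed network is available as a plain edge list. The function name and the TF labels are hypothetical, and the two growing strategies of SSGA and MSGA are not reproduced because the abstract does not describe them.

```python
from itertools import combinations


def fully_linked_triples(edges):
    """Return every set of three nodes in which all three pairs are linked."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seeds = set()
    for node, neighbours in adjacency.items():
        # Two neighbours of `node` that are themselves linked close a triangle.
        for u, v in combinations(sorted(neighbours), 2):
            if v in adjacency[u]:
                seeds.add(tuple(sorted((node, u, v))))
    return sorted(seeds)


# Hypothetical collaborative network of four TFs.
network = [("TF_A", "TF_B"), ("TF_B", "TF_C"), ("TF_A", "TF_C"), ("TF_C", "TF_D")]
print(fully_linked_triples(network))
```

Each returned triple is a candidate seed. Ranking the seeds by edge weight and deciding how to grow them would have to follow the paper itself.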
Affiliation(s)
- Xiaohui Ji
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, Heilongjiang, 150040, P.R. China
  - State Key Lab of Forest Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang, 150040, P.R. China
- Su Chen
  - State Key Lab of Forest Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang, 150040, P.R. China
- Jun Cheng Li
  - Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangzhou, 510642, P.R. China
- Wenping Deng
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
- Zhigang Wei
  - State Key Lab of Forest Genetics and Breeding, Northeast Forestry University, Harbin, Heilongjiang, 150040, P.R. China
- Hairong Wei
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
  - Department of Computer Science, Michigan Technological University, Houghton, MI, 49931, USA
  - Life Science and Technology Institute, Michigan Technological University, Houghton, MI, 49931, USA
4
Kumari S, Deng W, Gunasekara C, Chiang V, Chen HS, Ma H, Davis X, Wei H. Bottom-up GGM algorithm for constructing multilayered hierarchical gene regulatory networks that govern biological pathways or processes. BMC Bioinformatics 2016; 17:132. [PMID: 26993098 PMCID: PMC4797117 DOI: 10.1186/s12859-016-0981-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/13/2015] [Accepted: 03/09/2016] [Indexed: 11/24/2022]
Abstract
BACKGROUND Multilayered hierarchical gene regulatory networks (ML-hGRNs) are very important for understanding the genetic regulation of biological pathways. However, there are currently no computational algorithms available for directly building ML-hGRNs that regulate biological pathways. RESULTS A bottom-up graphic Gaussian model (GGM) algorithm was developed for constructing an ML-hGRN operating above a biological pathway using small- to medium-sized microarray or RNA-seq data sets. The algorithm first placed the genes of a pathway at the bottom layer and began to construct an ML-hGRN by evaluating all combined gene triples, each made of two pathway genes and one regulatory gene. The algorithm retained all triples in which the regulatory gene significantly interfered with the two paired pathway genes. The regulatory genes with the highest interference frequency were kept as the second layer, and the number kept was based on an optimization function. Thereafter, the algorithm was applied recursively to build an ML-hGRN in a layer-by-layer fashion until the defined number of layers was obtained or the procedure terminated automatically. CONCLUSIONS We validated the algorithm and demonstrated its high efficiency in constructing ML-hGRNs governing biological pathways. The algorithm is instrumental for biologists to learn the hierarchical regulators associated with a given biological pathway from even small-sized microarray or RNA-seq data sets.
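The abstract does not give the statistic used to decide that a regulatory gene "interferes" with a pair of pathway genes. A common building block of Gaussian graphical models is the first-order partial correlation, so the sketch below uses it as an assumption: it compares the plain correlation of two pathway genes with their correlation after conditioning on one regulatory gene. The function names and the toy expression vectors are hypothetical, and the significance test and optimization function of the paper are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy expression profiles across five samples (illustrative values only).
pathway_gene_1 = [1, 2, 3, 4, 5]
pathway_gene_2 = [2, 1, 4, 3, 5]
regulator = [1, 3, 2, 5, 4]

r_plain = pearson(pathway_gene_1, pathway_gene_2)
r_given_reg = partial_corr(pathway_gene_1, pathway_gene_2, regulator)
# One possible interference measure: how much conditioning changes the correlation.
print(r_plain, r_given_reg, r_plain - r_given_reg)
```

In a triple-gene scan of the kind the abstract describes, this comparison would be repeated for every pair of pathway genes and every candidate regulator, with the regulators then ranked by how often the change is judged significant.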
Affiliation(s)
- Sapna Kumari
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
- Wenping Deng
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
- Chathura Gunasekara
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
- Vincent Chiang
  - Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- Huann-Sheng Chen
  - Statistical Methodology and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Rockville, MD, 20850, USA
- Hao Ma
  - NCCWA, USDA ARS, Kearneysville, WV, 25430, USA
- Xin Davis
  - Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, 27695, USA
- Hairong Wei
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, 49931, USA
5
The systemic amyloid precursor transthyretin (TTR) behaves as a neuronal stress protein regulated by HSF1 in SH-SY5Y human neuroblastoma cells and APP23 Alzheimer's disease model mice. J Neurosci 2014; 34:7253-65. [PMID: 24849358 DOI: 10.1523/jneurosci.4936-13.2014] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 12/27/2022]
Abstract
Increased neuronal synthesis of transthyretin (TTR) may favorably impact on Alzheimer's disease (AD) because TTR has been shown to inhibit Aβ aggregation and detoxify cell-damaging conformers. The mechanism whereby hippocampal and cortical neurons from AD patients and APP23 AD model mice produce more TTR is unknown. We now show that TTR expression in SH-SY5Y human neuroblastoma cells, primary hippocampal neurons and the hippocampus of APP23 mice, is significantly enhanced by heat shock factor 1 (HSF1). Chromatin immunoprecipitation (ChIP) assays demonstrated occupation of TTR promoter heat shock elements by HSF1 in APP23 hippocampi, primary murine hippocampal neurons, and SH-SY5Y cells, but not in mouse liver, cultured human hepatoma (HepG2) cells, or AC16 cultured human cardiomyocytes. Treating SH-SY5Y human neuroblastoma cells with heat shock or the HSF1 stimulator celastrol increased TTR transcription in parallel with that of HSP40, HSP70, and HSP90. With both treatments, ChIP showed increased occupancy of heat shock elements in the TTR promoter by HSF1. In vivo celastrol increased the HSF1 ChIP signal in hippocampus but not in liver. Transfection of a human HSF1 construct into SH-SY5Y cells increased TTR transcription and protein production, which could be blocked by shHSF1 antisense. The effect is neuron specific. In cultured HepG2 cells, HSF1 was either suppressive or had no effect on TTR expression confirming the differential effects of HSF1 on TTR transcription in different cell types.
6
Lopes FM, Ray SS, Hashimoto RF, Cesar RM. Entropic Biological Score: a cell cycle investigation for GRNs inference. Gene 2014; 541:129-37. [DOI: 10.1016/j.gene.2014.03.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/13/2013] [Revised: 02/17/2014] [Accepted: 03/05/2014] [Indexed: 12/21/2022]
7
Kumari S, Nie J, Chen HS, Ma H, Stewart R, Li X, Lu MZ, Taylor WM, Wei H. Evaluation of gene association methods for coexpression network construction and biological knowledge discovery. PLoS One 2012; 7:e50411. [PMID: 23226279 PMCID: PMC3511551 DOI: 10.1371/journal.pone.0050411] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 06/04/2012] [Accepted: 10/18/2012] [Indexed: 11/25/2022]
Abstract
BACKGROUND Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. METHODS AND RESULTS In this study, we compared eight gene association methods (Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson) and focused on their true knowledge discovery rates in associating pathway genes and constructing coordination networks of regulatory genes. We also examined how the different methods behave on microarray data with different properties, and whether the biological processes affect the efficiency of the different methods. CONCLUSIONS We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-Sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method, which can find gene pairs with multiple types of relationship. Some of our analyses clearly show that the Pearson and Distance Covariance methods behave distinctly from the other six methods. The efficiencies of the different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates a pre-analysis to identify the best-performing method for gene association and coexpression network construction.
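To make the contrast between two of the compared measures concrete, the sketch below computes Pearson and Spearman correlation for a pair of toy expression profiles that are related monotonically but not linearly. It is a plain standard-library illustration, not the evaluation pipeline of the paper. The function names and the data are assumptions chosen for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy profiles: gene_b rises monotonically, but not linearly, with gene_a.
gene_a = [1, 2, 3, 4, 5]
gene_b = [1, 8, 27, 64, 125]
print("Pearson :", pearson(gene_a, gene_b))
print("Spearman:", spearman(gene_a, gene_b))
```

Because Spearman works on ranks, it scores any strictly monotone relationship as perfect, while Pearson is reduced by the curvature. This is one reason the two measures can produce structurally different networks from the same data.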
Affiliation(s)
- Sapna Kumari
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Jeff Nie
  - Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Huann-Sheng Chen
  - Statistical Methodology and Applications Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Hao Ma
  - Division of Animal and Nutritional Sciences, West Virginia University, Morgantown, West Virginia, United States of America
- Ron Stewart
  - Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Xiang Li
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan, United States of America
- Meng-Zhu Lu
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, P.R. China
- William M. Taylor
  - Department of Computer Science, University of Wisconsin, Madison, Wisconsin, United States of America
- Hairong Wei
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
  - Department of Computer Science, Michigan Technological University, Houghton, Michigan, United States of America
  - Biotechnology Research Center, Michigan Technological University, Houghton, Michigan, United States of America
  - School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan, United States of America
8
Alvarez A, Woolf PJ. RegNetB: predicting relevant regulator-gene relationships in localized prostate tumor samples. BMC Bioinformatics 2011; 12:243. [PMID: 21682879 PMCID: PMC3128037 DOI: 10.1186/1471-2105-12-243] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 08/10/2010] [Accepted: 06/17/2011] [Indexed: 11/10/2022]
Abstract
Background A central question in cancer biology is what changes cause a healthy cell to form a tumor. Gene expression data could provide insight into this question, but it is difficult to distinguish a gene that causes a change in gene expression from a gene that is affected by this change. Furthermore, the proteins that regulate gene expression are often themselves not regulated at the transcriptional level. Here we propose a Bayesian modeling framework, which we term RegNetB, that uses mechanistic information about the gene regulatory network to distinguish between factors that cause a change in expression and genes that are affected by the change. We test this framework using human gene expression data describing localized prostate cancer progression. Results The top regulatory relationships identified by RegNetB include the regulation of RLN1 and RLN2 by PAX4, the regulation of ACPP (PAP) by JUN, BACH1 and BACH2, and the co-regulation of PGC and GDF15 by MAZ and TAF8. These target genes are known to participate in tumor progression, but the suggested regulatory roles of PAX4, BACH1, BACH2, MAZ and TAF8 in the process are new. Conclusion Integrating gene expression data and regulatory topologies can aid in identifying potentially causal mechanisms for observed changes in gene expression.
Affiliation(s)
- Angel Alvarez
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
9
Nie J, Stewart R, Zhang H, Thomson JA, Ruan F, Cui X, Wei H. TF-Cluster: a pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM). BMC Systems Biology 2011; 5:53. [PMID: 21496241 PMCID: PMC3101171 DOI: 10.1186/1752-0509-5-53] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 11/01/2010] [Accepted: 04/15/2011] [Indexed: 12/24/2022]
Abstract
BACKGROUND Identifying the key transcription factors (TFs) controlling a biological process is the first step toward a better understanding of the underpinning regulatory mechanisms. However, due to the involvement of a large number of genes and complex interactions in gene regulatory networks, identifying the TFs involved in a biological process remains particularly difficult. The challenges include: (1) Most eukaryotic genomes encode thousands of TFs, which are organized in gene families of various sizes and in many cases with poor sequence conservation, making it difficult to recognize TFs for a biological process; (2) Transcription usually involves several hundred genes that generate a combination of intrinsic noise from upstream signaling networks and lead to fluctuations in transcription; (3) A TF can function in different cell types or developmental stages. Currently, the methods available for identifying TFs involved in biological processes are still very scarce, and the development of novel, more powerful methods is desperately needed. RESULTS We developed a computational pipeline called TF-Cluster for identifying functionally coordinated TFs in two steps: (1) Construction of a shared coexpression connectivity matrix (SCCM), in which each entry represents the number of shared coexpressed genes between two TFs. This sparse and symmetric matrix embodies a new concept of coexpression networks in which genes are associated in the context of other shared coexpressed genes; (2) Decomposition of the SCCM using a novel heuristic algorithm termed "Triple-Link", which searches for the highest connectivity in the SCCM and then uses the two connected TFs as a primer for growing a TF cluster with a number of linking criteria.
We applied TF-Cluster to microarray data from human stem cells and Arabidopsis roots, and then demonstrated that many of the resulting TF clusters contain functionally coordinated TFs that, based on existing literature, accurately represent a biological process of interest. CONCLUSIONS TF-Cluster can be used to identify a set of TFs controlling a biological process of interest from gene expression data. Its high accuracy in recognizing true positive TFs involved in a biological process makes it extremely valuable in building core GRNs controlling a biological process. The pipeline implemented in Perl can be installed in various platforms.
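Step (1) of the pipeline can be illustrated with a short sketch: given, for each TF, the set of genes considered coexpressed with it, each SCCM entry is the size of the intersection of two such sets. How those sets are derived (association measure and cutoff) is not stated in the abstract, so the input dictionary, the function name and the zero diagonal are assumptions made for illustration. The Triple-Link decomposition of step (2) is not reproduced, and the original pipeline is implemented in Perl rather than Python.

```python
from itertools import combinations


def build_sccm(coexpressed):
    """Build a symmetric matrix of shared coexpressed-gene counts.

    coexpressed maps each TF name to the set of genes coexpressed with it.
    Returns the ordered TF names and the matrix as a list of rows.
    """
    tfs = sorted(coexpressed)
    index = {tf: i for i, tf in enumerate(tfs)}
    size = len(tfs)
    matrix = [[0] * size for _ in range(size)]  # diagonal left at 0 (assumption)
    for tf_a, tf_b in combinations(tfs, 2):
        shared = len(coexpressed[tf_a] & coexpressed[tf_b])
        matrix[index[tf_a]][index[tf_b]] = shared
        matrix[index[tf_b]][index[tf_a]] = shared
    return tfs, matrix


# Hypothetical coexpression sets for three TFs.
example = {
    "TF1": {"g1", "g2", "g3"},
    "TF2": {"g2", "g3", "g4"},
    "TF3": {"g5"},
}
names, sccm = build_sccm(example)
for name, row in zip(names, sccm):
    print(name, row)
```

In this toy input TF1 and TF2 share two genes, so their entry is the largest in the matrix. Per the abstract, Triple-Link starts from the most strongly connected pair and grows a cluster from it.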
Affiliation(s)
- Jeff Nie
  - Morgridge Institute for Research, 330 N. Orchard St., Madison, WI 53715, USA
- Ron Stewart
  - Morgridge Institute for Research, 330 N. Orchard St., Madison, WI 53715, USA
- Hang Zhang
  - Department of Computer Science, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
- James A Thomson
  - Morgridge Institute for Research, 330 N. Orchard St., Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, 600 Highland Ave., Madison, WI 53792, USA
  - Department of Cell & Regenerative Biology, University of Wisconsin, 1300 University Ave., Madison, WI 53705, USA
  - Department of Molecular, Cellular, & Developmental Biology, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
- Fang Ruan
  - Program of Computing Science and Engineering, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
- Xiaoqi Cui
  - Department of Mathematics, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
- Hairong Wei
  - School of Forest Resources and Environmental Science, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
  - Biotechnology Research Center, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
10
Geeven G, Macgillavry HD, Eggers R, Sassen MM, Verhaagen J, Smit AB, de Gunst MCM, van Kesteren RE. LLM3D: a log-linear modeling-based method to predict functional gene regulatory interactions from genome-wide expression data. Nucleic Acids Res 2011; 39:5313-27. [PMID: 21422075 PMCID: PMC3141251 DOI: 10.1093/nar/gkr139] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
All cellular processes are regulated by condition-specific and time-dependent interactions between transcription factors and their target genes. While in simple organisms, e.g. bacteria and yeast, a large amount of experimental data is available to support functional transcription regulatory interactions, in mammalian systems reconstruction of gene regulatory networks still heavily depends on the accurate prediction of transcription factor binding sites. Here, we present a new method, log-linear modeling of 3D contingency tables (LLM3D), to predict functional transcription factor binding sites. LLM3D combines gene expression data, gene ontology annotation and computationally predicted transcription factor binding sites in a single statistical analysis, and offers a methodological improvement over existing enrichment-based methods. We show that LLM3D successfully identifies novel transcriptional regulators of the yeast metabolic cycle, and correctly predicts key regulators of mouse embryonic stem cell self-renewal more accurately than existing enrichment-based methods. Moreover, in a clinically relevant in vivo injury model of mammalian neurons, LLM3D identified peroxisome proliferator-activated receptor γ (PPARγ) as a neuron-intrinsic transcriptional regulator of regenerative axon growth. In conclusion, LLM3D provides a significant improvement over existing methods in predicting functional transcription regulatory interactions in the absence of experimental transcription factor binding data.
Affiliation(s)
- Geert Geeven
- Department of Mathematics, Faculty of Sciences, VU University, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands