Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ernst J, Plasterer HL, Simon I, Bar-Joseph Z. Integrating multiple evidence sources to predict transcription factor binding in the human genome. Genome Res 2010;20:526-36. [PMID: 20219943 DOI: 10.1101/gr.096305.109] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]



Cited by Other Article(s)

Chatterjee K, Mal S, Ghosh M, Chattopadhyay NR, Roy SD, Chakraborty K, Mukherjee S, Aier M, Choudhuri T. Blood-based DNA methylation in advanced Nasopharyngeal Carcinoma exhibited distinct CpG methylation signature. Sci Rep 2023;13:22086. [PMID: 38086861 PMCID: PMC10716134 DOI: 10.1038/s41598-023-45001-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2022] [Accepted: 10/14/2023] [Indexed: 12/18/2023] Open

Sghaier N, Essemine J, Ayed RB, Gorai M, Ben Marzoug R, Rebai A, Qu M. An Evidence Theory and Fuzzy Logic Combined Approach for the Prediction of Potential ARF-Regulated Genes in Quinoa. PLANTS (BASEL, SWITZERLAND) 2022;12:71. [PMID: 36616201 PMCID: PMC9824623 DOI: 10.3390/plants12010071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]

Xu Y, Chen J, Lyu A, Cheung WK, Zhang L. dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using time-course single-cell gene expression data. Brief Bioinform 2022;23:6720420. [PMID: 36168811 DOI: 10.1093/bib/bbac424] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Revised: 08/02/2022] [Accepted: 09/01/2022] [Indexed: 12/14/2022] Open

Poon SHL, Cheung JJC, Shih KC, Chan YK. A systematic review of multimodal clinical biomarkers in the management of thyroid eye disease. Rev Endocr Metab Disord 2022;23:541-567. [PMID: 35066781 DOI: 10.1007/s11154-021-09702-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 12/07/2021] [Indexed: 12/25/2022]

Nian X, Li L, Ma X, Li X, Li W, Zhang N, Ohiolei JA, Li L, Dai G, Liu Y, Yan H, Fu B, Xiao S, Jia W. Understanding pathogen–host interplay by expression profiles of lncRNA and mRNA in the liver of Echinococcus multilocularis-infected mice. PLoS Negl Trop Dis 2022;16:e0010435. [PMID: 35639780 PMCID: PMC9187083 DOI: 10.1371/journal.pntd.0010435] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2021] [Revised: 06/10/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022] Open

Affiliation(s)

Xiaofeng Nian State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, P. R. China
Li Li State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
Xusheng Ma State Key Laboratory of Veterinary Etiological Biology, National Foot and Mouth Diseases Reference Laboratory, Key Laboratory of Animal Virology of Ministry of Agriculture, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P. R. China
Xiurong Li State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
Wenhui Li State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
Nianzhang Zhang State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
John Asekhaen Ohiolei State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
Le Li State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
Guodong Dai State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
Yanhong Liu The Instrument Centre of State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P. R. China
Hongbin Yan State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China * E-mail: (HY); (SX); (WJ)
Baoquan Fu State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Disease, Yangzhou, Jiangsu, P. R. China
Sa Xiao College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, P. R. China * E-mail: (HY); (SX); (WJ)
Wanzhong Jia State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Disease, Yangzhou, Jiangsu, P. R. China * E-mail: (HY); (SX); (WJ)


Jeong D, Lim S, Lee S, Oh M, Cho C, Seong H, Jung W, Kim S. Construction of Condition-Specific Gene Regulatory Network Using Kernel Canonical Correlation Analysis. Front Genet 2021;12:652623. [PMID: 34093651 PMCID: PMC8172963 DOI: 10.3389/fgene.2021.652623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Accepted: 03/26/2021] [Indexed: 01/01/2023] Open

Abstract

A gene expression profile, or transcriptome, can represent cellular states; thus, understanding gene regulation mechanisms can help explain how cells respond to external stress. Interaction between a transcription factor (TF) and a target gene (TG) is one of the representative regulatory mechanisms in cells. In this paper, we present a novel computational method to construct condition-specific transcriptional networks from transcriptome data. Regulatory interaction between TFs and TGs is very complex, involving multiple-to-multiple relations. Experimental data from TF Chromatin Immunoprecipitation sequencing is useful but produces one-to-multiple relations between a TF and TGs. On the other hand, co-expression networks of genes can be useful for constructing condition-specific transcriptional networks, but there are many false positive relations in co-expression networks. In this paper, we propose a novel method to construct a condition-specific and combinatorial transcriptional network, applying kernel canonical correlation analysis (kernel CCA) to identify multiple-to-multiple TF-TG relations in a given biological condition. Kernel CCA is a well-established statistical method for computing the correlation of a group of features vs. another group of features. We, therefore, employed kernel CCA to embed TFs and TGs into a new space where the correlation of TFs and TGs is reflected. To demonstrate the usefulness of our network construction method, we used blood transcriptome data to investigate the response to a high fat diet in humans and an Arabidopsis data set to investigate the response to cold/heat stress. Our method detected not only important regulatory interactions reported in previous studies but also novel TF-TG relations where a module of TFs regulates a module of TGs upon specific stress.


Jo K, Santos-Buitrago B, Kim M, Rhee S, Talcott C, Kim S. Logic-based analysis of gene expression data predicts association between TNF, TGFB1 and EGF pathways in basal-like breast cancer. Methods 2020;179:89-100. [PMID: 32445696 DOI: 10.1016/j.ymeth.2020.05.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Revised: 04/30/2020] [Accepted: 05/13/2020] [Indexed: 12/16/2022] Open

Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci U S A 2019;116:27151-27158. [PMID: 31822622 DOI: 10.1073/pnas.1911536116] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

Zhao Z, Dong Q, Liu X, Wei L, Liu L, Li Y, Wang X. Dynamic transcriptome profiling in DNA damage-induced cellular senescence and transient cell-cycle arrest. Genomics 2019;112:1309-1317. [PMID: 31376528 DOI: 10.1016/j.ygeno.2019.07.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2018] [Revised: 04/14/2019] [Accepted: 07/30/2019] [Indexed: 12/13/2022]

Yang HB, Jiang J, Li LL, Yang HQ, Zhang XY. Biomarker identification of thyroid associated ophthalmopathy using microarray data. Int J Ophthalmol 2018;11:1482-1488. [PMID: 30225222 DOI: 10.18240/ijo.2018.09.09] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2017] [Accepted: 01/03/2018] [Indexed: 01/08/2023] Open

Abernathy DG, Kim WK, McCoy MJ, Lake AM, Ouwenga R, Lee SW, Xing X, Li D, Lee HJ, Heuckeroth RO, Dougherty JD, Wang T, Yoo AS. MicroRNAs Induce a Permissive Chromatin Environment that Enables Neuronal Subtype-Specific Reprogramming of Adult Human Fibroblasts. Cell Stem Cell 2018;21:332-348.e9. [PMID: 28886366 DOI: 10.1016/j.stem.2017.08.002] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2017] [Revised: 06/26/2017] [Accepted: 08/09/2017] [Indexed: 12/19/2022]

Affiliation(s)

Daniel G Abernathy Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Program in Developmental, Regenerative, and Stem Cell Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
Woo Kyung Kim Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
Matthew J McCoy Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Program in Molecular Genetics & Genomics, Washington University School of Medicine, St. Louis, MO 63110, USA
Allison M Lake Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Rebecca Ouwenga Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Seong Won Lee Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
Xiaoyun Xing Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Daofeng Li Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Hyung Joo Lee Program in Molecular Genetics & Genomics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Robert O Heuckeroth Department of Pediatrics, The Perelman School of Medicine at the University of Pennsylvania, and The Children's Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
Joseph D Dougherty Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Ting Wang Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
Andrew S Yoo Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA.


Guo WL, Huang DS. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency. MOLECULAR BIOSYSTEMS 2018;13:1827-1837. [PMID: 28718849 DOI: 10.1039/c7mb00155j] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

Wang Y, Ung MH, Xia T, Cheng W, Cheng C. Cancer cell line specific co-factors modulate the FOXM1 cistrome. Oncotarget 2017;8:76498-76515. [PMID: 29100329 PMCID: PMC5652723 DOI: 10.18632/oncotarget.20405] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2017] [Accepted: 08/14/2017] [Indexed: 12/11/2022] Open

Chang YM, Ling L, Chang YT, Chang YW, Li WH, Shih ACC, Chen CC. Three TF Co-expression Modules Regulate Pressure-Overload Cardiac Hypertrophy in Male Mice. Sci Rep 2017;7:7560. [PMID: 28790436 PMCID: PMC5548763 DOI: 10.1038/s41598-017-07981-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2017] [Accepted: 07/03/2017] [Indexed: 12/22/2022] Open

Jo K, Jung I, Moon JH, Kim S. Influence maximization in time bounded network identifies transcription factors regulating perturbed pathways. Bioinformatics 2017;32:i128-i136. [PMID: 27307609 PMCID: PMC4908359 DOI: 10.1093/bioinformatics/btw275] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Ruffalo M, Bar-Joseph Z. Genome wide predictions of miRNA regulation by transcription factors. Bioinformatics 2017;32:i746-i754. [PMID: 27587697 DOI: 10.1093/bioinformatics/btw452] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open

Liu S, Zibetti C, Wan J, Wang G, Blackshaw S, Qian J. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinformatics 2017;18:355. [PMID: 28750606 PMCID: PMC5530957 DOI: 10.1186/s12859-017-1769-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2017] [Accepted: 07/19/2017] [Indexed: 12/04/2022] Open

Abstract

Background

Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information in prediction of TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events.

Results

We assessed to what extent a model built upon other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types.

Conclusion

Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1769-7) contains supplementary material, which is available to authorized users.


Trescher S, Münchmeyer J, Leser U. Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization. BMC SYSTEMS BIOLOGY 2017;11:41. [PMID: 28347313 PMCID: PMC5369021 DOI: 10.1186/s12918-017-0419-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/16/2016] [Accepted: 03/08/2017] [Indexed: 12/28/2022]

Abstract

Background

Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive and tissue or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation.

Results

Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets.

Conclusions

The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points.

Electronic supplementary material

The online version of this article (doi:10.1186/s12918-017-0419-z) contains supplementary material, which is available to authorized users.


Qin Q, Feng J. Imputation for transcription factor binding predictions based on deep learning. PLoS Comput Biol 2017;13:e1005403. [PMID: 28234893 PMCID: PMC5345877 DOI: 10.1371/journal.pcbi.1005403] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2016] [Revised: 03/10/2017] [Accepted: 02/09/2017] [Indexed: 01/11/2023] Open

Abstract

Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems, for which ChIP-seq not only provides valuable data but is also considered as the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data represent only a limited percentage of ChIP-seq experiments, considering all possible combinations of TFs and cell lines. In this study, we demonstrate a method for accurately predicting cell-specific TF binding for TF-cell line combinations based on only a small fraction (4%) of the combinations using available ChIP-seq data. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting to borrow information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data; moreover, TFImpute achieves better accuracy on TF-cell line combinations without ChIP-seq data. This approach can predict cell line specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and predicts the impact of SNPs on TF binding.

Transcription factors play a central role in regulating various cellular processes. They bind to DNA in a cell-specific way. To study where a TF would bind to DNA, the ChIP-seq experiment has been developed and widely adopted by the scientific community to study genome-wide in vivo protein-DNA interactions. However, for each TF, only a limited number of cell types have been explored by ChIP-seq experiments. To study the binding of a TF to a DNA sequence in a cell line without corresponding ChIP-seq data, researchers would check whether there is a motif for the TF in the sequence. However, a motif alone contains only sequence information and therefore cannot reflect the cell specificity of TF binding. In this work, we demonstrate how to model the TF binding problem using deep learning and achieve cell-specific binding prediction for TF-cell line combinations without ChIP-seq data.


Gustafsson M, Gawel DR, Alfredsson L, Baranzini S, Björkander J, Blomgran R, Hellberg S, Eklund D, Ernerudh J, Kockum I, Konstantinell A, Lahesmaa R, Lentini A, Liljenström HRI, Mattson L, Matussek A, Mellergård J, Mendez M, Olsson T, Pujana MA, Rasool O, Serra-Musach J, Stenmarker M, Tripathi S, Viitala M, Wang H, Zhang H, Nestor CE, Benson M. A validated gene regulatory network and GWAS identifies early regulators of T cell-associated diseases. Sci Transl Med 2016;7:313ra178. [PMID: 26560356 DOI: 10.1126/scitranslmed.aad2722] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Affiliation(s)

Mika Gustafsson The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden. Bioinformatics, Department of Physics, Chemistry, and Biology, Linköping University, SE-581 83 Linköping, Sweden.
Danuta R Gawel The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
Lars Alfredsson Institute of Environmental Medicine, Karolinska Institutet, SE-171 77 Solna, Sweden
Sergio Baranzini Department of Neurology, University of California, San Francisco, CA 94158, USA
Janne Björkander Futurum-Academy for Health and Care, County Council of Jönköping, SE-551 85 Jönköping, Sweden
Robert Blomgran Department of Clinical and Experimental Medicine, Division of Microbiology and Molecular Medicine, Linköping University, SE-581 83 Linköping, Sweden
Sandra Hellberg Department of Clinical and Experimental Medicine, Division of Clinical Immunology, Unit of Autoimmunity and Immune Regulation, Linköping University, SE-581 83 Linköping, Sweden
Daniel Eklund Department of Clinical Immunology and Transfusion Medicine, Linköping University, SE-581 83 Linköping, Sweden
Jan Ernerudh Department of Clinical and Experimental Medicine, Division of Clinical Immunology, Unit of Autoimmunity and Immune Regulation, Linköping University, SE-581 83 Linköping, Sweden. Department of Clinical Immunology and Transfusion Medicine, Linköping University, SE-581 83 Linköping, Sweden
Ingrid Kockum Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, SE-171 77 Stockholm, Sweden
Aelita Konstantinell The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden. Department of Medical Biology, The Arctic University of Norway, NO-9037 Tromsø, Norway
Riita Lahesmaa Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
Antonio Lentini The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
H Robert I Liljenström The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
Lina Mattson The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
Andreas Matussek Futurum-Academy for Health and Care, County Council of Jönköping, SE-551 85 Jönköping, Sweden
Johan Mellergård Department of Neurology and Department of Clinical and Experimental Medicine, Linköping University, SE-581 83 Linköping, Sweden
Melissa Mendez Laboratorio de Investigación en Enfermedades Infecciosas, LID, Universidad Peruana Cayetano Heredia, Lima PE-15102, Peru
Tomas Olsson Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, SE-171 77 Stockholm, Sweden
Miguel A Pujana Program Against Cancer Therapeutic Resistance (ProCURE), Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, ES-08908 Barcelona, Spain
Omid Rasool Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
Jordi Serra-Musach Program Against Cancer Therapeutic Resistance (ProCURE), Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, ES-08908 Barcelona, Spain
Margaretha Stenmarker Futurum-Academy for Health and Care, County Council of Jönköping, SE-551 85 Jönköping, Sweden
Subhash Tripathi Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
Miro Viitala Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
Hui Wang The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden. Department of Immunology, MD Anderson Cancer Center, Houston, TX 77030, USA
Huan Zhang The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
Colm E Nestor The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
Mikael Benson The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden.


Gitter A, Bar-Joseph Z. The SDREM Method for Reconstructing Signaling and Regulatory Response Networks: Applications for Studying Disease Progression. Methods Mol Biol 2016;1303:493-506. [PMID: 26235087 DOI: 10.1007/978-1-4939-2627-5_30] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Tsai DY, Hung KH, Lin IY, Su ST, Wu SY, Chung CH, Wang TC, Li WH, Shih ACC, Lin KI. Uncovering MicroRNA Regulatory Hubs that Modulate Plasma Cell Differentiation. Sci Rep 2015;5:17957. [PMID: 26655851 PMCID: PMC4675970 DOI: 10.1038/srep17957] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2015] [Accepted: 11/09/2015] [Indexed: 01/08/2023] Open

Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015;11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022] Open

Abstract

Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a “TF-agnostic” predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes the general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF-agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.

Yang J, Ramsey SA. A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites. Bioinformatics 2015;31:3445-50. [PMID: 26130577 DOI: 10.1093/bioinformatics/btv391] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2014] [Accepted: 06/24/2015] [Indexed: 12/13/2022] Open

Imrichová H, Hulselmans G, Atak ZK, Potier D, Aerts S. i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res 2015;43:W57-64. [PMID: 25925574 PMCID: PMC4489282 DOI: 10.1093/nar/gkv395] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2015] [Accepted: 04/15/2015] [Indexed: 12/21/2022] Open

Decoding the regulatory landscape of melanoma reveals TEADS as regulators of the invasive cell state. Nat Commun 2015;6:6683. [PMID: 25865119 PMCID: PMC4403341 DOI: 10.1038/ncomms7683] [Citation(s) in RCA: 294] [Impact Index Per Article: 32.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2014] [Accepted: 02/16/2015] [Indexed: 12/18/2022] Open

Gong W, Koyano-Nakagawa N, Li T, Garry DJ. Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data. BMC Bioinformatics 2015;16:74. [PMID: 25887857 PMCID: PMC4359553 DOI: 10.1186/s12859-015-0460-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2014] [Accepted: 01/12/2015] [Indexed: 02/07/2023] Open

Abstract

Background

Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion.

Results

We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes.

The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p <10⁻¹⁰⁰), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately −9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (−9435 to −8922 bp). TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions.

Conclusion

We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0460-0) contains supplementary material, which is available to authorized users.

Su D, Wang X, Campbell MR, Song L, Safi A, Crawford GE, Bell DA. Interactions of chromatin context, binding site sequence content, and sequence evolution in stress-induced p53 occupancy and transactivation. PLoS Genet 2015;11:e1004885. [PMID: 25569532 PMCID: PMC4287438 DOI: 10.1371/journal.pgen.1004885] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2014] [Accepted: 11/10/2014] [Indexed: 01/10/2023] Open

Abstract

Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, gene expression, H3K4me3, chromatin accessibility (DNase 1 hypersensitivity, DHS), ENCODE chromatin states, p53RE sequence, and evolutionary conservation. We observed that the inducible expression of p53-regulated genes was associated with the steady-state chromatin status of the cell. Most highly inducible p53-regulated genes were suppressed at baseline and marked by repressive histone modifications or displayed CTCF binding. Comparison of p53RE sequences residing in different chromatin contexts demonstrated that weaker p53REs resided in open promoters, while stronger p53REs were located within enhancers and repressed chromatin. p53 occupancy was strongly correlated with similarity of the target DNA sequences to the p53RE consensus, but surprisingly, inversely correlated with pre-existing nucleosome accessibility (DHS) and evolutionary conservation at the p53RE. Occupancy by p53 of REs that overlapped transposable element (TE) repeats was significantly higher (p<10⁻⁷) and correlated with stronger p53RE sequences (p<10⁻¹¹⁰) relative to nonTE-associated p53REs, particularly for MLT1H, LTR10B, and Mer61 TEs. However, binding at these elements was generally not associated with transactivation of adjacent genes. Occupied p53REs located in L2-like TEs were unique in displaying highly negative PhyloP scores (predicted fast-evolving) and being associated with altered H3K4me3 and DHS levels. 
These results underscore the systematic interaction between chromatin status and p53RE context in the induced transactivation response. This p53 regulated response appears to have been tuned via evolutionary processes that may have led to repression and/or utilization of p53REs originating from primate-specific transposon elements.

It is well established that p53 binds DNA elements near p53 target genes to regulate the response to cellular stress. To assess factors influencing binding to response elements and subsequent gene expression, we have analyzed 2932 p53-occupied response elements (p53REs) in the context of genome-wide chromatin state, DNA accessibility and dynamics, and considered roles for binding-sequence specificity and evolutionary conservation. While p53 occupancy level shows little apparent direct relationship to gene expression change, after grouping expressed genes by their chromatin status at baseline, a relationship between occupancy of p53REs and gene expression change emerged. Analysis of p53RE sequences demonstrated that p53 occupancy was strongly correlated with sequence similarity to p53RE consensus, but surprisingly, was inversely correlated with nucleosome accessibility (DHS) and evolutionary conservation. These data revealed a systematic interaction between p53RE content and chromatin context that affects both quantitative p53 occupancy and the induced transactivation response to exposure. Moreover, this interaction appears to have been tuned via evolutionary events involving transposable elements, which strongly bind p53, but in only a few instances affect gene expression levels. Models of p53-regulated gene expression response that consider both chromatin state and sequence context may prove useful in guiding strategies for cancer prevention or therapy.

Jain S, Gitter A, Bar-Joseph Z. Multitask learning of signaling and regulatory networks with application to studying human response to flu. PLoS Comput Biol 2014;10:e1003943. [PMID: 25522349 PMCID: PMC4270428 DOI: 10.1371/journal.pcbi.1003943] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2014] [Accepted: 09/28/2014] [Indexed: 01/04/2023] Open

Abstract

Reconstructing regulatory and signaling response networks is one of the major goals of systems biology. While several successful methods have been suggested for this task, some integrating large and diverse datasets, these methods have so far been applied to reconstruct a single response network at a time, even when studying and modeling related conditions. To improve network reconstruction we developed MT-SDREM, a multi-task learning method which jointly models networks for several related conditions. In MT-SDREM, parameters are jointly constrained across the networks while still allowing for condition-specific pathways and regulation. We formulate the multi-task learning problem and discuss methods for optimizing the joint target function. We applied MT-SDREM to reconstruct dynamic human response networks for three flu strains: H1N1, H5N1 and H3N2. Our multi-task learning method was able to identify known and novel factors and genes, improving upon prior methods that model each condition independently. The MT-SDREM networks were also better at identifying proteins whose removal affects viral load, indicating that joint learning can still lead to accurate, condition-specific networks. Supporting website with MT-SDREM implementation: http://sb.cs.cmu.edu/mtsdrem

To understand why some flu strains are more virulent than others, researchers attempt to profile and model the molecular human response to these strains and identify similarities and differences between the resulting models. So far, the modeling and analysis part has been done independently for each strain and the results contrasted in a post-processing step. Here we present a new method, termed MT-SDREM, that simultaneously models the response to all strains, allowing us to identify both the core response elements that are shared among the strains and the factors that are uniquely activated or repressed by individual strains. We applied this method to study the human response to three flu strains: H1N1, H3N2 and H5N1. As we show, the method was able to correctly identify several common and known factors regulating immune response to such strains and also identified unique factors for each of the strains. The models reconstructed by the simultaneous analysis method improved upon those generated by methods that model each strain response separately. Our joint models can be used to identify strain-specific treatments as well as treatments that are likely to be effective against all three strains.

Wise A, Bar-Joseph Z. SMARTS: reconstructing disease response networks from multiple individuals using time series gene expression data. Bioinformatics 2014;31:1250-7. [PMID: 25480376 DOI: 10.1093/bioinformatics/btu800] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2014] [Accepted: 11/26/2014] [Indexed: 02/02/2023]

Nguyen N, Vo A, Choi I, Won KJ. A stationary wavelet entropy-based clustering approach accurately predicts gene expression. J Comput Biol 2014;22:236-49. [PMID: 25383910 DOI: 10.1089/cmb.2014.0221] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open

Nguyen N, Vo A, Won KJ. A wavelet approach to detect enriched regions and explore epigenomic landscapes. J Comput Biol 2014;21:846-54. [PMID: 25072902 DOI: 10.1089/cmb.2014.0095] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

Nie Y, Cheng X, Chen J, Sun X. Nucleosome organization in the vicinity of transcription factor binding sites in the human genome. BMC Genomics 2014;15:493. [PMID: 24942981 PMCID: PMC4073502 DOI: 10.1186/1471-2164-15-493] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2013] [Accepted: 06/10/2014] [Indexed: 12/23/2022] Open

Abstract

Background

The binding of transcription factors (TFs) to specific DNA sequences is an initial and crucial step of transcription. In eukaryotes, this process is highly dependent on the local chromatin state, which can be modified by recruiting chromatin remodelers. However, previous studies have focused mainly on nucleosome occupancy around the TF binding sites (TFBSs) of a few specific TFs. Here, we investigated the nucleosome occupancy profiles around computationally inferred binding sites, based on 519 TF binding motifs, in human GM12878 and K562 cells.

Results

Although high nucleosome occupancy is intrinsically encoded at TFBSs in vitro, nucleosomes are generally depleted at TFBSs in vivo, and approximately a quarter of TFBSs showed well-positioned in vivo nucleosomes on both sides. RNA polymerase near the transcription start site (TSS) has a large effect on the nucleosome occupancy distribution around the binding sites located within one kilobase of the nearest TSS; fuzzier nucleosome positioning was thus observed around these sites. In addition, in contrast to yeast, repressors, rather than activators, were more likely to bind to nucleosomal DNA in the human cells, and nucleosomes around repressor sites were better positioned in vivo. Genes with repressor sites exhibiting well-positioned nucleosomes on both sides, and genes with activator sites occupied by nucleosomes, had significantly lower expression, suggesting that actions of activators and repressors are associated with the nucleosome occupancy around their binding sites. It was also interesting to note that most of the binding sites that were not in the DNase I-hypersensitive regions were cell-type specific, and higher in vivo nucleosome occupancy was observed at these binding sites.

Conclusions

This study demonstrated that RNA polymerase and the functions of bound TFs affected the local nucleosome occupancy around TFBSs, and nucleosome occupancy patterns around TFBSs were associated with the expression levels of target genes.

Electronic supplementary material

The online version of this article (doi: 10.1186/1471-2164-15-493) contains supplementary material, which is available to authorized users.

The E2F transcription factors regulate tumor development and metastasis in a mouse model of metastatic breast cancer. Mol Cell Biol 2014;34:3229-43. [PMID: 24934442 DOI: 10.1128/mcb.00737-14] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Santra T. A bayesian framework that integrates heterogeneous data for inferring gene regulatory networks. Front Bioeng Biotechnol 2014;2:13. [PMID: 25152886 PMCID: PMC4126456 DOI: 10.3389/fbioe.2014.00013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Accepted: 04/28/2014] [Indexed: 11/29/2022] Open

Nygård S, Reitan T, Clancy T, Nygaard V, Bjørnstad J, Skrbic B, Tønnessen T, Christensen G, Hovig E. Identifying pathogenic processes by integrating microarray data with prior knowledge. BMC Bioinformatics 2014;15:115. [PMID: 24758699 PMCID: PMC4006456 DOI: 10.1186/1471-2105-15-115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2013] [Accepted: 04/09/2014] [Indexed: 11/30/2022] Open

Abstract

BACKGROUND

It is of great importance to identify molecular processes and pathways that are involved in disease etiology. Although there has been an extensive use of various high-throughput methods for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens shows a poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial in order to improve the understanding of the molecular processes underlying the disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules sharing common cellular function and involve signaling and regulatory events. Specifically, our method makes use of Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections is used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping.

RESULTS

Simulation results showed that the method improved the ability of identifying correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma cancer patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation.

CONCLUSION

Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways.

Levinson M, Zhou Q. A penalized Bayesian approach to predicting sparse protein-DNA binding landscapes. Bioinformatics 2014;30:636-43. [PMID: 24115169 DOI: 10.1093/bioinformatics/btt585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Transcription factor binding sites prediction based on modified nucleosomes. PLoS One 2014;9:e89226. [PMID: 24586611 PMCID: PMC3931712 DOI: 10.1371/journal.pone.0089226] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2013] [Accepted: 01/17/2014] [Indexed: 11/19/2022] Open

Abstract

In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences to predict actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed additional sources of information such as sequence conservation or the vicinity to transcription start sites to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, “modified nucleosomes neighboring” and “modified nucleosomes occupancy”, to decrease FP in binding site discovery. Based on these features, we designed a logistic regression classifier which estimates the probability of a region as a TFBS. Our model learned each feature based on Sp1 binding sites on Chromosome 1 and was tested on the other chromosomes in human CD4+T cells. In this work, we investigated 21 histone modifications and found that only 8 out of 21 marks are strongly correlated with transcription factor binding regions. To prove that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM, and created a new model to search TFBSs on the genome. We tested the model using transcription factors MAZ, PU.1 and ELF1 and compared the results to those using only the PWM. The results show that our model can predict transcription factor binding regions more successfully. The relative simplicity of the model and capability of integrating other features make it a superior method for TFBS prediction.

Gitter A, Bar-Joseph Z. Identifying proteins controlling key disease signaling pathways. Bioinformatics 2013;29:i227-36. [PMID: 23812988 PMCID: PMC3694658 DOI: 10.1093/bioinformatics/btt241] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Yang TH, Wu WS. Inferring functional transcription factor-gene binding pairs by integrating transcription factor binding data with transcription factor knockout data. BMC SYSTEMS BIOLOGY 2013;7 Suppl 6:S13. [PMID: 24565265 PMCID: PMC4029220 DOI: 10.1186/1752-0509-7-s6-s13] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Abstract

Background

Chromatin immunoprecipitation (ChIP) experiments are now the most comprehensive experimental approaches for mapping the binding of transcription factors (TFs) to their target genes. However, ChIP data alone is insufficient for identifying functional binding target genes of TFs for two reasons. First, there is an inherent high false positive/negative rate in ChIP-chip or ChIP-seq experiments. Second, binding signals in the ChIP data do not necessarily imply functionality.

Methods

It is known that ChIP-chip data and TF knockout (TFKO) data reveal complementary information on gene regulation. While ChIP-chip data can provide TF-gene binding pairs, TFKO data can provide TF-gene regulation pairs. Therefore, we propose a novel network approach for identifying functional TF-gene binding pairs by integrating the ChIP-chip data with the TFKO data. In our method, a TF-gene binding pair from the ChIP-chip data is regarded as functional if it also has a high-confidence curated TFKO TF-gene regulatory relation or a deduced hypostatic TF-gene regulatory relation.

Results and conclusions

We first validated our method on a gathered ground truth set. Then we applied our method to the ChIP-chip data to identify functional TF-gene binding pairs. The biological significance of our identified functional TF-gene binding pairs was shown by assessing their functional enrichment, the prevalence of protein-protein interaction, and expression coherence. Our results outperformed the results of three existing methods across all measures. And our identified functional targets of TFs also showed statistical significance over the randomly assigned TF-gene pairs. We also showed that our method is dataset independent and can apply to ChIP-seq data and the E. coli genome. Finally, we provided an example showing the biological applicability of our notion.

Chen CC, Xiao S, Xie D, Cao X, Song CX, Wang T, He C, Zhong S. Understanding variation in transcription factor binding by modeling transcription factor genome-epigenome interactions. PLoS Comput Biol 2013;9:e1003367. [PMID: 24339764 PMCID: PMC3854512 DOI: 10.1371/journal.pcbi.1003367] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2012] [Accepted: 10/15/2013] [Indexed: 12/20/2022] Open

Abstract

Despite explosive growth in genomic datasets, the methods for studying epigenomic mechanisms of gene regulation remain primitive. Here we present a model-based approach to systematically analyze the epigenomic functions in modulating transcription factor-DNA binding. Based on the first principles of statistical mechanics, this model considers the interactions between epigenomic modifications and a cis-regulatory module, which contains multiple binding sites arranged in any configurations. We compiled a comprehensive epigenomic dataset in mouse embryonic stem (mES) cells, including DNA methylation (MeDIP-seq and MRE-seq), DNA hydroxymethylation (5-hmC-seq), and histone modifications (ChIP-seq). We discovered correlations of transcription factors (TFs) for specific combinations of epigenomic modifications, which we term epigenomic motifs. Epigenomic motifs explained why some TFs appeared to have different DNA binding motifs derived from in vivo (ChIP-seq) and in vitro experiments. Theoretical analyses suggested that the epigenome can modulate transcriptional noise and boost the cooperativity of weak TF binding sites. ChIP-seq data suggested that epigenomic boost of binding affinities in weak TF binding sites can function in mES cells. We showed in theory that the epigenome should suppress the TF binding differences on SNP-containing binding sites in two people. Using personal data, we identified strong associations between H3K4me2/H3K9ac and the degree of personal differences in NFκB binding in SNP-containing binding sites, which may explain why some SNPs introduce much smaller personal variations on TF binding than other SNPs. In summary, this model presents a powerful approach to analyze the functions of epigenomic modifications. This model was implemented into an open source program APEG (Affinity Prediction by Epigenome and Genome, http://systemsbio.ucsd.edu/apeg).

Lee H, Flaherty P, Ji HP. Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis. BMC Med Genomics 2013;6:54. [PMID: 24308539 PMCID: PMC3907018 DOI: 10.1186/1755-8794-6-54] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Accepted: 11/27/2013] [Indexed: 12/12/2022] Open

Abstract

BACKGROUND

Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis.

METHODS

We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities.

RESULTS

A fit of the elastic-net regularized regression to 197 samples and an integrated analysis of four genomic platforms identified a set of top gene predictors of advanced clinical stage, including WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis.

CONCLUSIONS

We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression.

Zhong S, He X, Bar-Joseph Z. Predicting tissue specific transcription factor binding sites. BMC Genomics 2013;14:796. [PMID: 24238150 PMCID: PMC3898213 DOI: 10.1186/1471-2164-14-796] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/13/2022] Open

LASAGNA-Search: an integrated web tool for transcription factor binding site search and visualization. Biotechniques 2013;54:141-53. [PMID: 23599922 DOI: 10.2144/000113999] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Chen YC, Cheng JH, Tsai ZTY, Tsai HK, Chuang TJ. The impact of trans-regulation on the evolutionary rates of metazoan proteins. Nucleic Acids Res 2013;41:6371-80. [PMID: 23658220 PMCID: PMC3711421 DOI: 10.1093/nar/gkt349] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2013] [Revised: 04/10/2013] [Accepted: 04/14/2013] [Indexed: 11/13/2022] Open

Association between the PTPN2 gene and Crohn's disease: dissection of potential causal variants. Inflamm Bowel Dis 2013;19:1149-55. [PMID: 23518806 DOI: 10.1097/mib.0b013e318280b181] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Lim JH, Iggo RD, Barker D. Models incorporating chromatin modification data identify functionally important p53 binding sites. Nucleic Acids Res 2013;41:5582-93. [PMID: 23599002 PMCID: PMC3675478 DOI: 10.1093/nar/gkt260] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Nie Y, Liu H, Sun X. The patterns of histone modifications in the vicinity of transcription factor binding sites in human lymphoblastoid cell lines. PLoS One 2013;8:e60002. [PMID: 23527292 PMCID: PMC3602107 DOI: 10.1371/journal.pone.0060002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2012] [Accepted: 02/25/2013] [Indexed: 01/12/2023] Open

Abstract

Transcription factor (TF) binding at specific DNA sequences is the fundamental step in transcriptional regulation and is highly dependent on the chromatin structure context, which may be affected by specific histone modifications and variants, known as histone marks. The lack of a global binding map for hundreds of TFs means that previous studies have focused mainly on histone marks at binding sites for several specific TFs. We therefore studied 11 histone marks around computationally-inferred and experimentally-determined TF binding sites (TFBSs), based on 164 and 34 TFs, respectively, in human lymphoblastoid cell lines. For H2A.Z, methylation of H3K4, and acetylation of H3K27 and H3K9, the mark patterns exhibited bimodal distributions and strong pairwise correlations in the 600-bp region around enriched TFBSs, suggesting that these marks mainly coexist within the two nucleosomes proximal to the TF sites. Competition between TFs and nucleosomes for access to DNA at most binding sites contributes to the bimodal distribution, which is a common feature of histone marks for TF binding. The H3K79me2 mark showed a unimodal distribution on one side of TFBSs, and its signals extended up to 4000 bp, indicating a longer-distance pattern. Interestingly, H4K20me1, H3K27me3, H3K36me3 and H3K9me3, which were more diffuse and less enriched surrounding TFBSs, showed unimodal distributions around the enriched TFBSs, suggesting that some TFs may bind to nucleosomal DNA. In addition, asymmetrical distributions of H3K36me3 and H3K9me3 indicated that repressors might establish a repressive chromatin structure in one direction to repress gene expression. In conclusion, this study demonstrated the ranges of histone marks associated with TF binding, and the common features of these marks around the binding sites. These findings have epigenetic implications for future analysis of regulatory elements.

Won KJ, Zhang X, Wang T, Ding B, Raha D, Snyder M, Ren B, Wang W. Comparative annotation of functional regions in the human genome using epigenomic data. Nucleic Acids Res 2013;41:4423-32. [PMID: 23482391 PMCID: PMC3632130 DOI: 10.1093/nar/gkt143] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open

Kim H, Gelenbe E. Reconstruction of large-scale gene regulatory networks using Bayesian model averaging. IEEE Trans Nanobioscience 2013;11:259-65. [PMID: 22987132 DOI: 10.1109/tnb.2012.2214233] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]