1. Baumgarten N, Schmidt F, Schulz MH. Improved linking of motifs to their TFs using domain information. Bioinformatics 2020; 36:1655-1662. [PMID: 31742324 PMCID: PMC7703792 DOI: 10.1093/bioinformatics/btz855] [Received: 03/01/2019] [Revised: 11/08/2019] [Accepted: 11/16/2019] [Indexed: 11/23/2022]
Abstract
Motivation: A central aim of molecular biology is to identify mechanisms of transcriptional regulation. Transcription factors (TFs), which are DNA-binding proteins, are central to these processes, so it is crucial to know where TFs interact with DNA and what their DNA-binding motifs are. For that reason, computational tools exist that link DNA-binding motifs to TFs either without sequence information or based on TF-associated sequences, e.g. those identified via chromatin immunoprecipitation followed by sequencing (ChIP-seq). In this paper, we present MASSIF, a novel method to improve the performance of existing tools that link motifs to TFs using TF-associated sequences. MASSIF is based on the idea that a DNA-binding motif that is correctly linked to a TF should be assigned to a DNA-binding domain (DBD) similar to that of the mapped TF. Because DNA-binding motifs are in general not linked to DBDs, it is not possible to compare the DBD of a TF and the motif directly. Instead, we created a DBD collection, which consists of TFs with a known DBD and an associated motif. This collection enables us to evaluate how likely it is that a linked motif and a TF of interest are associated with the same DBD. We named this similarity measure the domain score and represent it as a P-value. We developed two ways to improve the performance of existing tools that link motifs to TFs based on TF-associated sequences: (i) using meta-analysis to combine P-values from one or several of these tools with the P-value of the domain score and (ii) filtering unlikely motifs based on the domain score.
Results: We demonstrate the functionality of MASSIF on several human ChIP-seq datasets, using either motifs from the HOCOMOCO database or de novo identified ones as input motifs. In addition, we show that both variants of our method significantly improve the performance of tools that link motifs to TFs based on TF-associated sequences, independent of the considered DBD type.
Availability and implementation: MASSIF is freely available online at https://github.com/SchulzLab/MASSIF.
Supplementary information: Supplementary data are available at Bioinformatics online.
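The two strategies named in the abstract, combining P-values by meta-analysis and filtering on the domain score, can be illustrated with a minimal sketch. The abstract does not say which meta-analysis method is used, so Fisher's method is an assumption here, and the function names, the tuple layout, and the threshold `alpha` are hypothetical rather than taken from the MASSIF code base.

```python
import math


def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method (assumed here;
    the abstract only says "meta-analysis")."""
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher statistic X = -2 * sum(ln p); we work with X / 2 directly.
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def filter_by_domain_score(candidates, alpha=0.05):
    """Keep motifs whose domain-score P-value is at most alpha.

    candidates: iterable of (motif_name, tool_p, domain_p) tuples
    (a hypothetical layout chosen for this sketch).
    """
    return [name for name, _tool_p, domain_p in candidates if domain_p <= alpha]


if __name__ == "__main__":
    # Hypothetical P-values for two candidate motifs.
    candidates = [("motif_A", 0.01, 0.20), ("motif_B", 0.03, 0.001)]
    for name, tool_p, domain_p in candidates:
        print(name, fisher_combine([tool_p, domain_p]))
    print(filter_by_domain_score(candidates))
```

Fisher's method assumes the combined P-values are independent; whether that holds for a tool's P-value and the domain score is not addressed in the abstract.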
Affiliation(s)
- Nina Baumgarten
  - Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main 60590, Germany
  - German Center for Cardiovascular Regeneration, Partner Site Rhein-Main, Frankfurt am Main 60590, Germany
- Florian Schmidt
  - High-throughput Genomics & Systems Biology, Cluster of Excellence MMCI, Saarland University
  - Research Group Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken 66123, Germany
- Marcel H Schulz
  - Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main 60590, Germany
  - German Center for Cardiovascular Regeneration, Partner Site Rhein-Main, Frankfurt am Main 60590, Germany
  - High-throughput Genomics & Systems Biology, Cluster of Excellence MMCI, Saarland University
  - Research Group Computational Biology, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken 66123, Germany

2. He D, Jiang Z, Tian Y, Han H, Xia M, Wei W, Zhang L, Chen J. Genetic variants in IL15 promoter affect transcription activity and intramuscular fat deposition in longissimus dorsi muscle of pigs. Anim Genet 2017; 49:19-28. [PMID: 29168191 DOI: 10.1111/age.12611] [Accepted: 09/04/2017] [Indexed: 01/11/2023]
Abstract
Intramuscular fat (IMF) content is a key aspect of pork quality. Elucidation of intramuscular adipocyte regulation mechanisms is important for improving IMF content. Intramuscular adipocytes are dispersed among muscle fibers, so they are likely to be affected by muscle-derived factors. Interleukin-15 is a major muscle-secreted factor. In this study, the genetic and physiological impacts of IL15 on adipogenesis were investigated. The promoter region of IL15 was scanned by comparative sequencing using two DNA pools of high- and low-IMF individuals. Two SNPs, c.-342C>T (ss2137497757) and c.-334G>A (ss2137497756) (the translation start site is designated as +1), were identified with reverse allele distribution in these two groups. Genotyping by allele-specific PCR revealed that the two SNPs were completely linked. The IMF content of T-A/T-A individuals was lower than that of C-G/C-G individuals, whereas the IL15 expression level was higher in T-A/T-A individuals. Luciferase assays also revealed that the T-A haplotype promoter had higher transcription activity. Meanwhile, the effect of interleukin-15 on adipocyte differentiation was further assessed in vitro. Results showed that interleukin-15 suppressed preadipocyte proliferation in a dose-dependent manner. The cell cycle of preadipocytes was arrested, and apoptosis was induced. Oil Red O staining and triglyceride quantification indicated that adipocyte differentiation was also inhibited by interleukin-15. The mRNA levels of PPARG and FABP4 decreased markedly upon interleukin-15 treatment. Taken together, we identified two completely linked SNPs in the porcine IL15 promoter region that could alter IL15 transcription activity. As interleukin-15 can inhibit porcine adipocyte differentiation, these promoter mutations could affect IMF deposition by producing differential levels of muscle-derived interleukin-15.
Affiliation(s)
- D He
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- Z Jiang
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- Y Tian
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- H Han
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- M Xia
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- W Wei
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- L Zhang
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China
- J Chen
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, 210095, China

3. Zamanighomi M, Lin Z, Wang Y, Jiang R, Wong WH. Predicting transcription factor binding motifs from DNA-binding domains, chromatin accessibility and gene expression data. Nucleic Acids Res 2017; 45:5666-5677. [PMID: 28472398 PMCID: PMC5449588 DOI: 10.1093/nar/gkx358] [Received: 08/25/2016] [Accepted: 04/20/2017] [Indexed: 01/08/2023]
Abstract
Transcription factors (TFs) play crucial roles in regulating gene expression through interactions with specific DNA sequences. Recently, the sequence motifs of almost 400 human TFs have been identified using high-throughput SELEX sequencing. However, there remain a large number of TFs (∼800) with no high-throughput-derived binding motifs. Computational methods capable of associating known motifs with such TFs will avoid tremendous experimental effort and enable a deeper understanding of transcriptional regulatory functions. We present a method to associate known motifs with TFs (MATLAB code is available in Supplementary Materials). Our method is based on a probabilistic framework that not only exploits DNA-binding domains and specificities, but also integrates open chromatin, gene expression and genomic data to accurately infer monomeric and homodimeric binding motifs. Our analysis resulted in the assignment of motifs to 200 TFs with no SELEX-derived motifs, roughly a 50% increase compared to the existing coverage.
Affiliation(s)
- Mahdi Zamanighomi
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Zhixiang Lin
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Yong Wang
  - Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100190, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA

4. An overview of the prediction of protein DNA-binding sites. Int J Mol Sci 2015; 16:5194-215. [PMID: 25756377 PMCID: PMC4394471 DOI: 10.3390/ijms16035194] [Received: 12/31/2014] [Revised: 02/21/2015] [Accepted: 02/27/2015] [Indexed: 02/06/2023]
Abstract
Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.

5. Dang XY, Chu WW, Shi HC, Yu SG, Han HY, Gu SH, Chen J. Genetic variants in ABCA1 promoter affect transcription activity and plasma HDL level in pigs. Gene 2014; 555:414-20. [PMID: 25445391 DOI: 10.1016/j.gene.2014.11.041] [Received: 09/24/2014] [Revised: 11/11/2014] [Accepted: 11/19/2014] [Indexed: 01/03/2023]
Abstract
Excess accumulation of cholesterol in plasma may result in coronary artery disease. Numerous studies have demonstrated that ATP-binding cassette protein A1 (ABCA1) mediates the efflux of cholesterol and phospholipids to apolipoproteins, a process necessary for plasma high density lipoprotein (HDL) formation. Higher plasma levels of HDL are associated with lower risk for cardiovascular disease. Studies of human disease and animal models have shown that increased hepatic ABCA1 activity is related to an enhanced plasma HDL level. In this study, we hypothesized that functional mutations in the ABCA1 promoter in pigs may affect gene transcription activity, and consequently the HDL level in plasma. The promoter region of ABCA1 was comparatively scanned by direct sequencing of pooled DNA from high- and low-HDL groups (n=30 for each group). Two polymorphisms, c.-608A>G and c.-418T>A, were revealed with reverse allele distribution in the two groups. The two polymorphisms were completely linked and formed only G-A or A-T haplotypes when genotyped in a larger population (n=526). Furthermore, we found that the G-A/G-A genotype was associated with higher HDL and ABCA1 mRNA levels than the A-T/A-T genotype. Luciferase assays also revealed that the G-A haplotype promoter had higher activity than the A-T haplotype. A single-nucleotide mutant assay showed that c.-418T>A was the causal mutation for the alteration in ABCA1 transcription activity. In conclusion, we identified two completely linked SNPs in the porcine ABCA1 promoter region that influence the plasma HDL level by altering ABCA1 gene transcriptional activity.
Affiliation(s)
- Xiao-yong Dang
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, PR China
- Wei-wei Chu
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, PR China
- Heng-chuan Shi
  - Laboratory Department, Jiangsu Province Official Hospital, Nanjing 210024, PR China
- Shi-gang Yu
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, PR China
- Hai-yin Han
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, PR China
- Shu-Hua Gu
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, PR China
- Jie Chen
  - College of Animal Science and Technology, Nanjing Agricultural University, Nanjing 210095, PR China

6. Eichner J, Topf F, Dräger A, Wrzodek C, Wanke D, Zell A. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One 2013; 8:e82238. [PMID: 24349230 PMCID: PMC3861411 DOI: 10.1371/journal.pone.0082238] [Received: 06/13/2013] [Accepted: 10/21/2013] [Indexed: 11/18/2022]
Abstract
One of the key mechanisms of transcriptional control is the specific connection between transcription factors (TFs) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.
Affiliation(s)
- Johannes Eichner
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Florian Topf
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Andreas Dräger
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
  - University of California San Diego, La Jolla, California, United States of America
- Clemens Wrzodek
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany
- Dierk Wanke
  - Center for Plant Physiology Tuebingen (ZMBP), University of Tuebingen, Tübingen, Germany
- Andreas Zell
  - Center of Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany

7. Brand LH, Henneges C, Schüssler A, Kolukisaoglu HÜ, Koch G, Wallmeroth N, Hecker A, Thurow K, Zell A, Harter K, Wanke D. Screening for protein-DNA interactions by automatable DNA-protein interaction ELISA. PLoS One 2013; 8:e75177. [PMID: 24146751 PMCID: PMC3795721 DOI: 10.1371/journal.pone.0075177] [Received: 02/14/2013] [Accepted: 08/12/2013] [Indexed: 12/22/2022]
Abstract
DNA-binding proteins (DBPs), such as transcription factors, constitute about 10% of the protein-coding genes in eukaryotic genomes and play pivotal roles in the regulation of chromatin structure and gene expression by binding to short stretches of DNA. Despite their number and importance, the binding sequence has been disclosed for only a minor portion of DBPs. Methods that allow the de novo identification of DNA-binding motifs of known DBPs, such as protein binding microarray technology or SELEX, are not yet suited for high-throughput use and automation. To close this gap, we report an automatable DNA-protein-interaction (DPI)-ELISA screen of an optimized double-stranded DNA (dsDNA) probe library that allows the high-throughput identification of hexanucleotide DNA-binding motifs. In contrast to other methods, this DPI-ELISA screen can be performed manually or with standard laboratory automation. Furthermore, output evaluation does not require extensive computational analysis to derive a binding consensus. We could show that the DPI-ELISA screen disclosed the full spectrum of binding preferences for a given DBP. As an example, AtWRKY11 was used to demonstrate that the automated DPI-ELISA screen revealed the entire range of in vitro binding preferences. In addition, protein extracts of AtbZIP63 and the DNA-binding domain of AtWRKY33 were analyzed, which led to a refinement of their known DNA-binding consensus sequences. Finally, we performed a DPI-ELISA screen to disclose the DNA-binding consensus of an as yet uncharacterized putative DBP, AtTIFY1. A palindromic TGATCA consensus was uncovered, and we could show that the GATC core is compulsory for AtTIFY1 binding. This specific interaction between AtTIFY1 and its DNA-binding motif was confirmed by in vivo plant one-hybrid assays in protoplasts. Thus, the DPI-ELISA screen, which is applicable to de novo binding site identification of DBPs even under automated conditions, is a promising approach for a deeper understanding of gene regulation in any organism of choice.
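The abstract calls the TGATCA consensus palindromic, meaning that the sequence equals its own reverse complement. A minimal sketch of that check follows; the function names are hypothetical and this is only an illustration of the term, not part of the authors' analysis.

```python
# Map each DNA base to its Watson-Crick partner. Characters outside ACGT
# are left unchanged by translate(), so only plain ACGT input is meaningful.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5'->3')."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for site in ("TGATCA", "GATC", "AAAAAA"):
        print(site, reverse_complement(site), is_palindromic(site))
```

Both the hexanucleotide TGATCA and its GATC core pass this check, whereas a homopolymer such as AAAAAA does not.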
Affiliation(s)
- Luise H. Brand
  - Plant Physiology, Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
- Carsten Henneges
  - Cognitive Systems, Center for Bioinformatics, University of Tuebingen, Tuebingen, Germany
- Axel Schüssler
  - Cognitive Systems, Center for Bioinformatics, University of Tuebingen, Tuebingen, Germany
- H. Üner Kolukisaoglu
  - Plant Physiology, Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
  - Center for Life Science Automation, Rostock, Germany
- Grit Koch
  - Center for Life Science Automation, Rostock, Germany
- Niklas Wallmeroth
  - Plant Physiology, Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
- Andreas Hecker
  - Plant Physiology, Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
- Andreas Zell
  - Cognitive Systems, Center for Bioinformatics, University of Tuebingen, Tuebingen, Germany
- Klaus Harter
  - Plant Physiology, Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany
- Dierk Wanke
  - Plant Physiology, Center for Plant Molecular Biology, University of Tuebingen, Tuebingen, Germany

8. Schröder A, Wollnik J, Wrzodek C, Dräger A, Bonin M, Burk O, Thomas M, Thasler WE, Zanger UM, Zell A. Inferring statin-induced gene regulatory relationships in primary human hepatocytes. Bioinformatics 2011; 27:2473-7. [DOI: 10.1093/bioinformatics/btr416] [Indexed: 12/31/2022]