1
|
Levitsky VG, Raditsa VV, Tsukanov AV, Mukhin AM, Zhimulev IF, Merkulova TI. Asymmetry of Motif Conservation Within Their Homotypic Pairs Distinguishes DNA-Binding Domains of Target Transcription Factors in ChIP-Seq Data. Int J Mol Sci 2025; 26:386. [PMID: 39796242 PMCID: PMC11720554 DOI: 10.3390/ijms26010386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Accepted: 01/03/2025] [Indexed: 01/13/2025] Open
Abstract
Transcription factors (TFs) are the main regulators of eukaryotic gene expression. The cooperative binding of at least two TFs to genomic DNA is a major mechanism of transcription regulation. Massive analysis of the co-occurrence of overrepresented pairs of motifs for different target TFs studied in ChIP-seq experiments can clarify the mechanisms of TF cooperation. We categorized the target TFs from M. musculus ChIP-seq and A. thaliana ChIP-seq/DAP-seq experiments into classes according to the structure of their DNA-binding domains (DBDs). We studied homotypic pairs of motifs, using the same recognition model for each motif. Asymmetric pairs consist of motifs with distant recognition scores, and symmetric pairs consist of motifs with close recognition scores. We found that asymmetric pairs of motifs predominate for all TF classes. TFs from the murine/plant 'Basic helix-loop-helix (bHLH)', 'Basic leucine zipper (bZIP)', and 'Tryptophan cluster' classes and murine 'p53 domain' and 'Rel homology region' classes showed the highest enrichment of asymmetric homotypic pairs of motifs. Pioneer TFs, regardless of their DBD types, have a higher significance of asymmetry within homotypic pairs of motifs compared to other TFs. Asymmetry within homotypic composite elements (CEs) is a promising new feature for deciphering the mechanisms of gene transcription regulation.
Affiliation(s)
- Victor G. Levitsky
- Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Vladimir V. Raditsa
- Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
| | - Anton V. Tsukanov
- Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Institute of Molecular and Cellular Biology, Novosibirsk 630090, Russia
| | - Aleksey M. Mukhin
- Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
| | - Igor F. Zhimulev
- Institute of Molecular and Cellular Biology, Novosibirsk 630090, Russia
| | - Tatyana I. Merkulova
- Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
|
2
|
Tsukanov AV, Mironova VV, Levitsky VG. Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis. Front Plant Sci 2022; 13:938545. [PMID: 35968123 PMCID: PMC9373801 DOI: 10.3389/fpls.2022.938545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2022] [Accepted: 07/05/2022] [Indexed: 05/15/2023]
Abstract
The position weight matrix (PWM) is the traditional motif model representing transcription factor (TF) binding sites. It assumes that the positions contribute independently to TF binding affinity, although this hypothesis does not fit the data perfectly. This explains why PWM hits are missing in a substantial fraction of ChIP-seq peaks. To study various modes of the direct binding of plant TFs, we compiled a benchmark collection of 111 ChIP-seq datasets for Arabidopsis thaliana and applied the traditional PWM and two alternative motif models, BaMM and SiteGA, which assume dependencies between positions. Varying the stringency of the recognition thresholds suggested that the hits of the PWM, BaMM, and SiteGA models are associated with sites of high/medium, any, and low affinity, respectively. At the medium recognition threshold, about 60% of ChIP-seq peaks contain PWM hits consisting of conserved core consensuses, while BaMM and SiteGA provide hits for an additional 15% of peaks in which a weaker core consensus is compensated through intra-motif dependencies. The presence of these dependencies in the motifs of the alternative models, and their absence in those of the traditional model, was confirmed by the dependency logo DepLogo, which visualizes the position-wise partitioning of the alignments of predicted sites. We exemplify the detailed analysis of ChIP-seq profiles for the plant TFs CCA1, MYC2, and SEP3. Gene ontology (GO) enrichment analysis revealed that, among the three motif models, SiteGA had the highest portion of genes with significantly enriched GO terms among all predicted genes. We showed that both alternative motif models extend the set of sites predicted by the traditional PWM to a greater degree for the TFs MYC2 and SEP3, which have condition- or tissue-specific functions, than for the TF CCA1, which has housekeeping functions. Overall, the combined application of standard and alternative motif models is beneficial for detecting various modes of direct TF-DNA interactions in the maximal portion of ChIP-seq loci.
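The independence assumption of the PWM described in this abstract can be illustrated with a minimal Python sketch. The count matrix, the uniform background, the pseudocount, and the threshold below are illustrative assumptions and are not taken from the paper or its datasets; the function names are hypothetical. The point is only that a PWM score is a plain sum of per-position terms, which is exactly what models with intra-motif dependencies relax.

```python
import math

# Toy count matrix for a hypothetical 4-bp motif (one dict per position).
# Values are illustrative only and do not come from any real dataset.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # uniform nucleotide background (assumption)
PSEUDOCOUNT = 0.5   # avoids log(0) for unseen nucleotides (assumption)


def build_pwm(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert per-position counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Score a site as the sum of independent per-position weights."""
    return sum(column[base] for column, base in zip(pwm, site))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    Expects an uppercase sequence over A, C, G, T only.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits
```

For example, `scan(build_pwm(COUNTS), "TTACGTTT", threshold=5.0)` reports a single hit at position 2, the window `ACGT`. Lowering the threshold admits weaker windows, which mirrors the trade-off between threshold stringency and the fraction of peaks with hits discussed in the abstract.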
Affiliation(s)
- Anton V. Tsukanov
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
| | - Victoria V. Mironova
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
- Department of Plant Systems Physiology, Radboud Institute for Biological and Environmental Sciences (RIBES), Radboud University, Nijmegen, Netherlands
| | - Victor G. Levitsky
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
- Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia
- *Correspondence: Victor G. Levitsky
|
3
|
Tsukanov AV, Levitsky VG, Merkulova TI. Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites. Vavilovskii Zhurnal Genet Selektsii 2021; 25:7. [PMID: 34547062 PMCID: PMC8408018 DOI: 10.18699/vj21.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/10/2020] [Revised: 01/10/2021] [Accepted: 01/12/2021] [Indexed: 11/24/2022] Open
Abstract
The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Two recently proposed models, BaMM and InMoDe, do take them into account. However, application of these models has usually been limited to comparing their recognition accuracies with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes stages of model training, assessment of recognition accuracy, scanning of ChIP-seq peaks, and classification of peaks based on the scan results. We applied our pipeline to 22 ChIP-seq datasets of the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase in the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3%. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the medians of the fraction of peaks containing the predictions of a single model only were 1.08, 0.49, 4.15 and 1.73% for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 binding sites were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 binding site in the ChIP-seq datasets under study.
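The peak classification step summarized in this abstract amounts to set arithmetic over the peaks that each model recognizes. The following Python sketch shows that bookkeeping under stated assumptions: the function name, the input layout (a dict mapping a model name to a set of peak identifiers), and the toy numbers are hypothetical and are not the MultiDeNA implementation.

```python
def peak_fractions(hits_by_model, n_peaks):
    """Summarize how many peaks each model recognizes.

    hits_by_model: dict mapping a model name to the set of peak IDs
                   in which that model has at least one hit.
    n_peaks:       total number of peaks in the dataset.
    Returns fractions for the union of models, for each model, and for
    peaks recognized by exactly one model and no other.
    """
    union = set().union(*hits_by_model.values())
    report = {"any_model": len(union) / n_peaks}
    for name, peaks in hits_by_model.items():
        # Peaks recognized by any of the remaining models.
        others = set().union(
            *(p for other, p in hits_by_model.items() if other != name)
        )
        report[name] = len(peaks) / n_peaks
        report[f"only_{name}"] = len(peaks - others) / n_peaks
    return report


# Toy input: 10 peaks and three models (illustrative values only).
toy_hits = {
    "PWM": {1, 2, 3, 4},
    "BaMM": {3, 4, 5, 6},
    "InMoDe": {4, 7},
}
toy_report = peak_fractions(toy_hits, n_peaks=10)
```

In this toy input the PWM alone covers 4 of 10 peaks while the union of models covers 7 of 10; the difference between those two fractions is the kind of gain the abstract reports for the combination of four models, and the `only_*` entries correspond to peaks described by a single model.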
Affiliation(s)
- A V Tsukanov
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - V G Levitsky
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
| | - T I Merkulova
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
|
4
|
Levitsky V, Zemlyanskaya E, Oshchepkov D, Podkolodnaya O, Ignatieva E, Grosse I, Mironova V, Merkulova T. A single ChIP-seq dataset is sufficient for comprehensive analysis of motifs co-occurrence with MCOT package. Nucleic Acids Res 2020; 47:e139. [PMID: 31750523 PMCID: PMC6868382 DOI: 10.1093/nar/gkz800] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/30/2019] [Revised: 08/12/2019] [Accepted: 09/09/2019] [Indexed: 01/20/2023] Open
Abstract
Recognition of composite elements consisting of two transcription factor binding sites gets behind the studies of tissue-, stage- and condition-specific transcription. Genome-wide data on transcription factor binding generated with the ChIP-seq method facilitate the identification of composite elements, but the existing bioinformatics tools either require ChIP-seq datasets for both partner transcription factors or omit composite elements with overlapping motifs. Here we present a universal Motifs Co-Occurrence Tool (MCOT) that retrieves maximum information about overrepresented composite elements from a single ChIP-seq dataset. This includes homo- and heterotypic composite elements in four mutual orientations of motifs, separated by a spacer or overlapping, even if recognition of the motifs within a composite element requires different stringencies. Analysis of 52 ChIP-seq datasets for 18 human transcription factors confirmed that, for over 60% of the analyzed datasets and transcription factors, the predicted co-occurrence of motifs implied an experimentally proven protein-protein interaction of the respective transcription factors. Analysis of 164 ChIP-seq datasets for 57 mammalian transcription factors showed that predicted composite elements with overlapping motifs were more than twice as abundant as those with a spacer, and that they had a 1.5-fold increase in asymmetrical pairs of motifs, with one more conserved 'leading' motif and another 'guided' one.
Affiliation(s)
- Victor Levitsky
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia; Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Elena Zemlyanskaya
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia; Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Dmitry Oshchepkov
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
| | - Olga Podkolodnaya
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
| | - Elena Ignatieva
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia; Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Ivo Grosse
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia; Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig, Germany
| | - Victoria Mironova
- Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia; Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia
| | - Tatyana Merkulova
- Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia; Department of Molecular Genetics, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
|
5
|
Spatial specificity of auxin responses coordinates wood formation. Nat Commun 2018; 9:875. [PMID: 29491423 PMCID: PMC5830446 DOI: 10.1038/s41467-018-03256-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 05/09/2017] [Accepted: 01/31/2018] [Indexed: 12/21/2022] Open
Abstract
Spatial organization of signalling events of the phytohormone auxin is fundamental for maintaining a dynamic transition from plant stem cells to differentiated descendants. The cambium, the stem cell niche mediating wood formation, fundamentally depends on auxin signalling but its exact role and spatial organization is obscure. Here we show that, while auxin signalling levels increase in differentiating cambium descendants, a moderate level of signalling in cambial stem cells is essential for cambium activity. We identify the auxin-dependent transcription factor ARF5/MONOPTEROS to cell-autonomously restrict the number of stem cells by directly attenuating the activity of the stem cell-promoting WOX4 gene. In contrast, ARF3 and ARF4 function as cambium activators in a redundant fashion from outside of WOX4-expressing cells. Our results reveal an influence of auxin signalling on distinct cambium features by specific signalling components and allow the conceptual integration of plant stem cell systems with distinct anatomies. Auxin activity controls plant stem cell function. Here the authors show that in the cambium, moderate auxin activity restricts cambial stem cell number via ARF5-dependent repression of the stem‐cell‐promoting factor WOX4, while ARF3 and ARF4 promote cambial activity outside of the WOX4‐expression domain.
|
6
|
Levitsky VG, Oshchepkov DY, Klimova NV, Ignatieva EV, Vasiliev GV, Merkulov VM, Merkulova TI. Hidden heterogeneity of transcription factor binding sites: A case study of SF-1. Comput Biol Chem 2016; 64:19-32. [PMID: 27235721 DOI: 10.1016/j.compbiolchem.2016.04.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/19/2015] [Revised: 04/19/2016] [Accepted: 04/19/2016] [Indexed: 01/15/2023]
Abstract
Steroidogenic factor 1 (SF-1) belongs to a small group of transcription factors that bind DNA only as a monomer. Three different approaches (Sitecon, SiteGA, and oPWM), constructed using the same training sample of experimentally confirmed SF-1 binding sites, have been used to recognize these sites. The appropriate prediction thresholds for the recognition models have been selected: thresholds concordant by false positive or false negative rates across the methods were used to optimize the discrimination of steroidogenic gene promoters from datasets of non-specific promoters. After experimental verification, the models were used to analyze the ChIP-seq data for SF-1. It has been shown that the sets of sites recognized by different models overlap only partially and that an integration of these models allows for identification of SF-1 sites in up to 80% of the ChIP-seq loci. The structures of the sites detected using the three recognition models in the ChIP-seq peaks falling within the [-5000, +5000] region relative to the transcription start sites (TSS) extracted from the FANTOM5 project have been analyzed. MATLIGN classified the frequency matrices for the sites predicted by oPWM, Sitecon, and SiteGA into two groups: the first is described by oPWM/Sitecon and the second by SiteGA. Gene ontology (GO) analysis has been used to clarify the differences between the sets of genes carrying different variants of SF-1 binding sites. Although this analysis in general revealed a considerable overlap in GO terms for the genes carrying the binding sites predicted by oPWM, Sitecon, or SiteGA, only the last method showed a notable trend toward terms related to negative regulation and apoptosis. The results suggest that the SF-1 binding sites predicted by oPWM+Sitecon and by SiteGA differ in both their structure and the functional annotation of their target genes. Further application of the Homer software for de novo identification of enriched motifs in the SF-1 ChIP-seq dataset gave results similar to those of oPWM+Sitecon.
Affiliation(s)
- V G Levitsky
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia.
| | - D Yu Oshchepkov
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - N V Klimova
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - E V Ignatieva
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
| | - G V Vasiliev
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - V M Merkulov
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
| | - T I Merkulova
- Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
|
7
|
Evtushenko EV, Levitsky VG, Elisafenko EA, Gunbin KV, Belousov AI, Šafář J, Doležel J, Vershinin AV. The expansion of heterochromatin blocks in rye reflects the co-amplification of tandem repeats and adjacent transposable elements. BMC Genomics 2016; 17:337. [PMID: 27146967 PMCID: PMC4857426 DOI: 10.1186/s12864-016-2667-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 12/17/2015] [Accepted: 04/25/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A prominent and distinctive feature of the rye (Secale cereale) chromosomes is the presence of massive blocks of subtelomeric heterochromatin, the size of which is correlated with the copy number of tandem arrays. The rapidity with which these regions have formed over the period of speciation remains unexplained. RESULTS Using a BAC library created from the short arm telosome of rye chromosome 1R we uncovered numerous arrays of the pSc200 and pSc250 tandem repeat families which are concentrated in subtelomeric heterochromatin and identified the adjacent DNA sequences. The arrays show significant heterogeneity in monomer organization. 454 reads were used to gain a representation of the expansion of these tandem repeats across the whole rye genome. The presence of multiple, relatively short monomer arrays, coupled with the mainly star-like topology of the monomer phylogenetic trees, was taken as indicative of a rapid expansion of the pSc200 and pSc250 arrays. The evolution of subtelomeric heterochromatin appears to have included a significant contribution of illegitimate recombination. The composition of transposable elements (TEs) within the regions flanking the pSc200 and pSc250 arrays differed markedly from that in the genome as a whole. Solo-LTRs were strongly enriched, suggestive of a history of active ectopic exchange. Several DNA motifs were over-represented within the LTR sequences. CONCLUSION The large blocks of subtelomeric heterochromatin have arisen from the combined activity of TEs and the expansion of the tandem repeats. The expansion was likely based on a highly complex network of recombination mechanisms.
Affiliation(s)
- E V Evtushenko
- Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
| | - V G Levitsky
- Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| | - E A Elisafenko
- Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
| | - K V Gunbin
- Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| | - A I Belousov
- Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
| | - J Šafář
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
| | - J Doležel
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
| | - A V Vershinin
- Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia.
|
8
|
Turnaev II, Rasskazov DA, Arkova OV, Ponomarenko MP, Ponomarenko PM, Savinkova LK, Kolchanov NA. Hypothetical SNP markers that significantly affect the affinity of the TATA-binding protein to VEGFA, ERBB2, IGF1R, FLT1, KDR, and MET oncogene promoters as chemotherapy targets. Mol Biol 2016. [DOI: 10.1134/s0026893316010209] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/23/2022]
|
9
|
Zemlyanskaya EV, Levitsky VG, Oshchepkov DY, Grosse I, Mironova VV. The Interplay of Chromatin Landscape and DNA-Binding Context Suggests Distinct Modes of EIN3 Regulation in Arabidopsis thaliana. Front Plant Sci 2016; 7:2044. [PMID: 28119721 PMCID: PMC5220190 DOI: 10.3389/fpls.2016.02044] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/20/2016] [Accepted: 12/21/2016] [Indexed: 05/08/2023]
Abstract
The plant hormone ethylene regulates numerous developmental processes and stress responses. Ethylene signaling proceeds via a linear pathway, which activates the transcription factor (TF) EIN3, a primary transcriptional regulator of the ethylene response. EIN3 influences gene expression upon binding to a specific sequence in gene promoters. This interaction, however, might be considerably affected by additional co-factors. In this work, we performed a whole-genome bioinformatics study to identify the impact of epigenetic factors on EIN3 functioning. The analysis of publicly available ChIP-Seq data on EIN3 binding in Arabidopsis thaliana showed a bimodal distribution of EIN3 binding regions (EBRs) in gene promoters. Besides a sharp peak in close proximity to the transcription start site, which is a common binding region for a wide variety of TFs, we found an additional extended peak in the distal promoter region. We characterized all EBRs with respect to their epigenetic status using a previously published genome-wide map of nine chromatin states in A. thaliana. We found that the implicit distal peak was associated with a specific chromatin state (referred to as chromatin state 4 in the primary source), which was only poorly represented in the pronounced proximal peak. Intriguingly, EBRs corresponding to chromatin state 4 were significantly associated with the ethylene response, unlike the others, which represent the overwhelming majority of EBRs and are related to the explicit proximal peak. Moreover, we found that specific EIN3 binding sequences predicted with a previously described model were enriched in the EBRs mapped to chromatin state 4, but not in the remaining ones. These results allow us to conclude that the interplay of genetic and epigenetic factors might cause the distinct modes of EIN3 regulation.
Affiliation(s)
- Elena V. Zemlyanskaya
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- *Correspondence: Elena V. Zemlyanskaya
| | - Victor G. Levitsky
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| | - Dmitry Y. Oshchepkov
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
| | - Ivo Grosse
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| | - Victoria V. Mironova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
|
10
|
Ponomarenko PM, Ponomarenko MP. Sequence-based prediction of transcription upregulation by auxin in plants. J Bioinform Comput Biol 2015; 13:1540009. [PMID: 25666655 DOI: 10.1142/s0219720015400090] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Auxin is one of the main regulators of growth and development in plants. Prediction of the auxin response based on gene sequence is of high importance. We derived the TGTCNC consensus from 111 known natural and artificially mutated auxin response elements (AuxREs) with a measured auxin-caused relative increase in gene transcription levels, referred to as either "a response to auxin" or "an auxin response." This consensus was identical to the most cited AuxRE motif. We also found several DNA sequence features that correlate with the auxin-caused increase in gene transcription levels, namely: the number of matches with TGTCNC, a homology score based on nucleotide frequencies at the consensus positions, and the abundances of five trinucleotides and five B-helical DNA features around these known AuxREs. We combined these correlations in an empirical model of the auxin response based on a gene's sequence, with four steps: (1) search for AuxREs with no auxin; (2) stop at the found AuxRE; (3) repression of the basal transcription of the gene having this AuxRE; and (4) manifold increase of this gene's transcription in response to auxin. Independently measured increases in transcription levels in response to auxin for 70 Arabidopsis genes were found to correlate significantly with the predictions of this equation (r = 0.44, p < 0.001), as well as with the affinity of the TATA-binding protein (TBP) to the promoters of these genes and with the nucleosome packing of these promoters (both p < 0.025). Finally, we improved our equation for predicting a gene's transcription increase in response to auxin by taking into account TBP binding and nucleosome packing (r = 0.53, p < 10^-6). Fisher's F-test validated the significant impact of both TBP/promoter affinity and promoter nucleosome packing on the auxin response in addition to that of the AuxRE (F = 4.07, p < 0.025). This means that both the TATA box and nucleosomes should be taken into account when recognizing transcription factor binding sites in DNA sequences: for TATA-less, nucleosome-rich promoters, recognition scores must be higher than for TATA-containing, nucleosome-free promoters at the same transcription activity.
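One of the sequence features named in this abstract, the number of matches with the TGTCNC consensus, can be computed with a short Python sketch. Only the consensus string comes from the abstract; scanning both strands, counting overlapping matches, and the function names are assumptions made for illustration and do not reproduce the authors' full model.

```python
import re

CONSENSUS = "TGTCNC"  # consensus from the abstract; N stands for any nucleotide
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_regex(consensus):
    """Compile the consensus into a regex; the lookahead allows overlapping matches."""
    body = "".join(IUPAC[ch] for ch in consensus)
    return re.compile(f"(?=({body}))")


def find_matches(sequence, consensus=CONSENSUS):
    """Return sorted (start, strand) pairs, with start in forward-strand coordinates."""
    sequence = sequence.upper()
    pattern = consensus_regex(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(sequence)]
    # Scan the reverse complement and map each start back to the forward strand.
    n, w = len(sequence), len(consensus)
    rc = reverse_complement(sequence)
    hits += [(n - m.start() - w, "-") for m in pattern.finditer(rc)]
    return sorted(hits)
```

For instance, `find_matches("AATGTCTCGG")` returns `[(2, "+")]`, and the reverse-complemented input `"CCGAGACATT"` returns `[(2, "-")]`. The length of the returned list is the match count; the other features in the abstract (homology score, trinucleotide abundances, B-helical DNA features) are not covered by this sketch.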
Affiliation(s)
- Petr M Ponomarenko
- Children's Hospital Los Angeles, 4640 Hollywood Blvd, Los Angeles, CA 90027, USA
|
11
|
Ignatieva EV, Podkolodnaya OA, Orlov YL, Vasiliev GV, Kolchanov NA. Regulatory genomics: Combined experimental and computational approaches. Russ J Genet 2015. [DOI: 10.1134/s1022795415040067] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
|
12
|
Mironova VV, Omelyanchuk NA, Wiebe DS, Levitsky VG. Computational analysis of auxin responsive elements in the Arabidopsis thaliana L. genome. BMC Genomics 2014; 15 Suppl 12:S4. [PMID: 25563792 PMCID: PMC4331925 DOI: 10.1186/1471-2164-15-s12-s4] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/27/2022] Open
Abstract
Auxin responsive elements (AuxREs) were found in upstream regions of target genes for ARFs (auxin response factors). While ChIP-seq data for most ARFs are still unavailable, prediction of potential AuxREs is restricted by consensus models that detect too many false positive sites. Using sequence analysis of experimentally proven AuxREs, we revealed both an extended nucleotide context pattern for the AuxRE itself and three distinct types of its coupling motifs (Y-patch, AuxRE-like, and ABRE-like), which together with the AuxRE may form composite elements. Computational analysis of the genome-wide distribution of the predicted AuxREs and their impact on auxin responsive gene expression allowed us to conclude that: (1) AuxREs are enriched around the transcription start site, with the maximum density in the 5'UTR; (2) AuxREs mediate auxin responsive up-regulation, not down-regulation; (3) directly oriented single AuxREs and reverse multiple AuxREs are mostly associated with auxin responsiveness. In the composite AuxRE elements associated with auxin response, the ABRE-like and Y-patch motifs are 5'-flanking or overlapping the AuxRE, whereas the AuxRE-like motif is 3'-flanking. The specificity in location and orientation of the coupling elements suggests that they are potential binding sites for ARF partners.
|
13
|
Martínez-Nava GA, Torres-Poveda K, Lagunas-Martínez A, Bahena-Román M, Zurita-Díaz MA, Ortíz-Flores E, García-Carrancá A, Madrid-Marina V, Burguete-García AI. Cervical cancer-associated promoter polymorphism affects akna expression levels. Genes Immun 2014; 16:43-53. [PMID: 25373726 DOI: 10.1038/gene.2014.60] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/27/2014] [Revised: 09/02/2014] [Accepted: 09/24/2014] [Indexed: 12/17/2022]
Abstract
Cervical cancer (CC) is responsible for >260,000 deaths worldwide each year. Efforts are being focused on identifying genetic susceptibility factors, especially in genes related to the immune response. Akna has been proposed to be one of them, but data regarding its functional role in the disease is scarce. Supporting the notion of akna as a CC susceptibility gene, we found two polymorphisms associated with squamous intraepithelial lesion (SIL) and CC; moreover, we identified an association between high akna expression levels and CC and SIL, but its direction differs in each disease stage. To show the potential existence of a cis-acting polymorphism, we assessed akna allelic expression imbalance for the alleles of the -1372C>A polymorphism. We found that, regardless of the study group, the number of transcripts derived from the A allele was significantly higher than those from the C allele. Our results support the hypothesis that akna is a CC susceptibility genetic factor and suggest that akna transcriptional regulation has a role in the disease. We anticipate our study to be a starting point for in vitro evaluation of akna transcriptional regulation and for the identification of transcription factors and cis-elements regulating AKNA function that are involved in carcinogenesis.
Affiliation(s)
- G A Martínez-Nava
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- K Torres-Poveda
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- A Lagunas-Martínez
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- M Bahena-Román
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- M A Zurita-Díaz
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- E Ortíz-Flores
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- A García-Carrancá
- Unidad de Investigación Biomédica en Cáncer, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México and Instituto Nacional de Cancerología, Secretaría de Salud, Distrito Federal, Mexico
- V Madrid-Marina
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
- A I Burguete-García
- Área de Infecciones Crónicas y Cáncer, Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Mexico
|
14
|
Kim CK, Kim JA, Choi JW, Jeong IS, Moon YS, Park DS, Seol YJ, Kim YK, Kim YH, Kim YK. A Multi-Layered Screening Method to Identify Plant Regulatory Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:293-303. [PMID: 26355777 DOI: 10.1109/tcbb.2013.2296308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
We used a seven-step process to identify genes involved in glucosinolate biosynthesis and metabolism in the Chinese cabbage (Brassica rapa). We constructed an annotated data set with 34,570 unigenes from B. rapa and predicted 11,526 glucosinolate-related candidate genes using expression profiles generated across nine stages of development on a 47k-gene microarray. Using our multi-layered screening method, we screened 392 transcription factors, 843 pathway genes, and 4,162 ortholog genes associated with glucosinolate-related biosynthesis. Finally, we identified five genes by comparison of the pathway-network genes including the transcription-factor genes and the ortholog-ontology genes. The five genes were anchored to the chromosomes of B. rapa to characterize their genetic-map positions, and phylogenetic reconstruction with homologous genes was performed. These anchored genes were verified by reverse-transcription polymerase chain reaction. While the five genes identified by our multi-layered screen require further characterization and validation, our study demonstrates the power of multi-layered screening after initial identification of genes on microarrays.
|
15
|
Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genomics 2014; 15:80. [PMID: 24472686 PMCID: PMC4234207 DOI: 10.1186/1471-2164-15-80] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 06/07/2013] [Accepted: 01/25/2014] [Indexed: 02/07/2023] Open
Abstract
Background: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold that avoids recognition of too many false-positive BSs and to compare the actual performance of different models. Results: Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells, we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated the affinity of 64 predicted FoxA BSs using EMSA, which reliably distinguishes sequences able to bind the TF. As a result, we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. The performance of conventional position weight matrix (PWM) models was inferior, with the highest false-positive rate. In contrast, the best recognition efficiency was achieved by the combination of SiteGA and diChIPMunk/ChIPMunk models, which properly identified FoxA BSs in up to 90% of loci for both mouse and human ChIP-Seq datasets. Conclusions: The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. A combination of models based on different principles allowed identification of proper TFBSs.
|
16
|
Disclosing the crosstalk among DNA methylation, transcription factors, and histone marks in human pluripotent cells through discovery of DNA methylation motifs. Genome Res 2013; 23:2013-29. [PMID: 24149073 PMCID: PMC3847772 DOI: 10.1101/gr.155960.113] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/17/2022]
Abstract
Gene expression regulation is gated by promoter methylation states modulating transcription factor binding. The known DNA methylation/unmethylation mechanisms are sequence unspecific, but different cells with the same genome have different methylomes. Thus, additional processes bringing specificity to the methylation/unmethylation mechanisms are required. Searching for such processes, we demonstrated that CpG methylation states are influenced by the sequence context surrounding the CpGs. We used such a property to develop a CpG methylation motif discovery algorithm. The newly discovered motifs reveal “methylation/unmethylation factors” that could recruit the “methylation/unmethylation machinery” to the loci specified by the motifs. Our methylation motif discovery algorithm provides a synergistic approach to the differently methylated region algorithms. Since our algorithm searches for commonly methylated regions inside the same sample, it requires only a single sample to operate. The motifs that were found discriminate between hypomethylated and hypermethylated regions. The hypomethylation-associated motifs have a high CG content, their targets appear in conserved regions near transcription start sites, they tend to co-occur within transcription factor binding sites, they are involved in breaking the H3K4me3/H3K27me3 bivalent balance, and they transit the enhancers from repressive H3K27me3 to active H3K27ac during ES cell differentiation. The new methylation motifs characterize the pluripotent state shared between ES and iPS cells. Additionally, we found a collection of motifs associated with the somatic memory inherited by the iPS from the initial fibroblast cells, thus revealing the existence of epigenetic somatic memory on a fine methylation scale.
|
17
|
Vorontsov IE, Kulakovskiy IV, Makeev VJ. Jaccard index based similarity measure to compare transcription factor binding site models. Algorithms Mol Biol 2013; 8:23. [PMID: 24074225 PMCID: PMC3851813 DOI: 10.1186/1748-7188-8-23] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 05/25/2012] [Accepted: 09/18/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBSs), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. The popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBSs and thus may differ without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds. RESULTS: We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, like PWMs with raw positional counts and log-odds. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS: MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query. AVAILABILITY AND IMPLEMENTATION: MACRO-APE is implemented in Ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.
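The idea of comparing two TFBS models through the sets of words they recognize can be illustrated with a minimal Python sketch. It is not MACRO-APE: it assumes two PWMs of equal length, a uniform background, no shifts or reverse-complement alignment, and brute-force enumeration of all words (feasible only for short motifs) in place of approximate P-value estimation. The function names and the toy matrix are hypothetical.

```python
from itertools import product

def pwm_score(pwm, word):
    """Sum the per-position weights of `word` under `pwm` (a list of dicts)."""
    return sum(column[base] for column, base in zip(pwm, word))

def recognized_words(pwm, threshold):
    """All words of the PWM's length scoring at or above the threshold."""
    words = ("".join(w) for w in product("ACGT", repeat=len(pwm)))
    return {w for w in words if pwm_score(pwm, w) >= threshold}

def jaccard(pwm_a, thr_a, pwm_b, thr_b):
    """Jaccard index of the word sets recognized by two (PWM, threshold) models.

    Assumption: both PWMs have the same length and are compared unshifted.
    """
    set_a = recognized_words(pwm_a, thr_a)
    set_b = recognized_words(pwm_b, thr_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)

# Hypothetical two-column PWM favouring the word "AC".
toy = [
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 0},
]

# Same matrix, different thresholds: 7 words pass threshold 1,
# only "AC" passes threshold 2, so the two models differ.
print(jaccard(toy, 1, toy, 2))
```

The toy call shows the point made in the abstract: the same matrix with two different thresholds yields two different TFBS models, so the similarity depends on the thresholds as well as on the matrix elements.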
|
18
|
Levitsky VG, Babenko VN, Vershinin AV. The roles of the monomer length and nucleotide context of plant tandem repeats in nucleosome positioning. J Biomol Struct Dyn 2013; 32:115-26. [PMID: 23384242 DOI: 10.1080/07391102.2012.755796] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
Similar to regularly spaced nucleosomes in chromatin, long tandem DNA arrays are composed of regularly alternating monomers that have almost identical primary DNA structures. Such a similarity in the structural organization makes these arrays especially interesting for studying the role of intrinsic DNA preferences in nucleosome positioning. We have studied the nucleosome formation potential of DNA tandem repeat families with different monomer lengths (ML). In total, 165 plant tandem repeat families from the PlantSat database (http://w3lamc.umbr.cas.cz/PlantSat/) were divided into two classes based on the number of nucleosome repeats in one DNA monomer. For predicting nucleosome formation potential, we developed the Phase method, which combines the advantages of multiple bioinformatics models. The Phase method was able to distinguish interfamily differences and intrafamily monomer variation and identify the influence of nucleotide context on nucleosome formation potential. Three main types of nucleosome arrangement in DNA tandem repeat arrays--regular, partially regular (partial), and flexible--were distinguished among a great variety of Phase profiles. The regular type, in which all nucleosomes of the monomer array are positioned in a context-dependent manner, is the most representative type of the class 1 families, with ML equal to or a multiple of the nucleosome repeat length (NRL). In the partially regular type, nucleotide context influences the positioning of only a subset of nucleosomes. The influence of the nucleotide context on nucleosome positioning has the least effect in the flexible type, which contains the greatest number of families (65). The majority of these families belong to class 2 and have nonmultiple ML to NRL ratios.
Affiliation(s)
- Victor G Levitsky
- Laboratory of Molecular Genetics Systems, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia
|
19
|
Kulakovskiy I, Levitsky V, Oshchepkov D, Bryzgalov L, Vorontsov I, Makeev V. From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites. J Bioinform Comput Biol 2013; 11:1340004. [PMID: 23427986 DOI: 10.1142/s0219720013400040] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) has become a method of choice for locating DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information for studying transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis, including construction of models of nucleotide sequences of transcription factor binding sites in DNA, which can be used to detect binding sites in ChIP-Seq data at single-base-pair resolution. The most popular TFBS model is the positional weight matrix (PWM), with statistically independent positional weights of nucleotides in different columns; such PWMs are constructed from a gapless multiple local alignment of sequences containing experimentally identified TFBSs. Modern high-throughput techniques, including ChIP-Seq, provide enough data for careful training of advanced models containing more parameters than a PWM. Yet, many suggested multiparametric models provide only incremental improvement of TFBS recognition quality compared to traditional PWMs trained on ChIP-Seq data. We present a novel computational tool, diChIPMunk, that constructs TFBS models as optimal dinucleotide PWMs, thus accounting for correlations between neighboring nucleotides in input sequences. diChIPMunk utilizes many advantages of ChIPMunk, its ancestor algorithm, accounting for ChIP-Seq base coverage profiles ("peak shape") and using an effective subsampling-based core procedure that allows processing of large datasets. We demonstrate that diPWMs constructed by diChIPMunk outperform traditional PWMs constructed by ChIPMunk from the same ChIP-Seq data. Software website: http://autosome.ru/dichipmunk/
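How a dinucleotide PWM scores a candidate site can be sketched in a few lines of Python. This is an illustration only, not diChIPMunk's code or file format: it assumes a diPWM stored as one weight table per pair of adjacent positions (so a site of width w has w - 1 columns), scans the forward strand only, and uses hypothetical names and made-up toy weights.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

def dipwm_column(**weights):
    """Build one diPWM column: 16 dinucleotide weights, 0.0 unless overridden."""
    column = {d: 0.0 for d in DINUCLEOTIDES}
    column.update(weights)
    return column

def dipwm_score(dipwm, site):
    """Score a site as the sum of weights of its overlapping dinucleotides."""
    if len(site) != len(dipwm) + 1:
        raise ValueError("site width must be number of columns + 1")
    return sum(column[site[i:i + 2]] for i, column in enumerate(dipwm))

def best_hit(dipwm, sequence):
    """Return (start, score) of the best forward-strand window, or None."""
    width = len(dipwm) + 1
    best = None
    for start in range(len(sequence) - width + 1):
        score = dipwm_score(dipwm, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Hypothetical diPWM for a 4-bp site that favours "TGTC".
toy_dipwm = [
    dipwm_column(TG=1.0),
    dipwm_column(GT=1.0),
    dipwm_column(TC=0.5),
]

print(best_hit(toy_dipwm, "AATGTCAA"))
```

Because each column is indexed by a pair of adjacent bases, the weight of a base depends on its neighbour, which is the dependency a mononucleotide PWM with independent columns cannot express.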
Affiliation(s)
- Ivan Kulakovskiy
- Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.
|
20
|
Weng K, Hu H, Xu AG, Khaitovich P, Somel M. Mechanisms of dietary response in mice and primates: a role for EGR1 in regulating the reaction to human-specific nutritional content. PLoS One 2012; 7:e43915. [PMID: 22937124 PMCID: PMC3427207 DOI: 10.1371/journal.pone.0043915] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/21/2012] [Accepted: 07/27/2012] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: Humans have a widely different diet from other primate species and are dependent on its high nutritional content. The molecular mechanisms responsible for adaptation to the human diet are currently unknown. Here, we addressed this question by investigating whether the gene expression response observed in mice fed human and chimpanzee diets involves the same regulatory mechanisms as expression differences between humans and chimpanzees. RESULTS: Using mouse and primate transcriptomic data, we identified the transcription factor EGR1 (early growth response 1) as a putative regulator of diet-related differential gene expression between human and chimpanzee livers. Specifically, we predict that EGR1 regulates the response to the high caloric content of human diets. However, we also show that close to 90% of the dietary response to the primate diet found in mice is not observed in primates. This might be explained by changes in tissue-specific gene expression between taxa. CONCLUSION: Our results suggest that the gene expression response to the nutritionally rich human diet is partially mediated by the transcription factor EGR1. While this EGR1-driven response is conserved between mice and primates, the bulk of the mouse response to human and chimpanzee dietary differences is not observed in primates. This result highlights the rapid evolution of diet-related expression regulation and underscores potential limitations of mouse models in dietary studies.
Affiliation(s)
- Kai Weng
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Haiyang Hu
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Augix Guohua Xu
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Philipp Khaitovich
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Mehmet Somel
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China
- Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
|
21
|
Levitsky VG, Oshchepkov DY, Ershov NI, Bryzgalov LO, Antontseva EV, Vasiliev GV, Merkulova TI, Kolchanov NA. Development of computational methods to search for FoxA transcription factor binding sites, their experimental verification and application to the analysis of ChIP-seq data. Dokl Biochem Biophys 2011; 436:12-5. [PMID: 21369894 DOI: 10.1134/s1607672911010054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/20/2010] [Indexed: 11/22/2022]
Affiliation(s)
- V G Levitsky
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, pr. Akademika Lavrent'eva 10, Novosibirsk 630090, Russia
|
22
|
Oshchepkov DY, Levitsky VG. In silico prediction of transcriptional factor-binding sites. Methods Mol Biol 2011; 760:251-67. [PMID: 21780002 DOI: 10.1007/978-1-61779-176-5_16] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/17/2023]
Abstract
The recognition of transcription factor binding sites (TFBSs) is the first step on the way to deciphering the DNA regulatory code. A large variety of computational approaches and corresponding in silico tools for TFBS recognition are available, each having their own advantages and shortcomings. This chapter provides a brief tutorial to assist end users in the application of these tools for functional characterization of genes.
Affiliation(s)
- Dmitry Y Oshchepkov
- Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
|
23
|
Evans KJ. Most transcription factor binding sites are in a few mosaic classes of the human genome. BMC Genomics 2010; 11:286. [PMID: 20459624 PMCID: PMC2881025 DOI: 10.1186/1471-2164-11-286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/15/2010] [Accepted: 05/06/2010] [Indexed: 12/02/2022] Open
Abstract
Background: Many algorithms for finding transcription factor binding sites have concentrated on the characterisation of the binding site itself, and these algorithms lead to a large number of false positive sites. The DNA sequence which does not bind has been modeled only to the extent necessary to complement this formulation. Results: We find that the human genome may be described by 19 pairs of mosaic classes, each defined by its base frequencies (or more precisely by the frequencies of doublets), so that typically a run of 10 to 100 bases belongs to the same class. Most experimentally verified binding sites are in the same four pairs of classes. In our sample of seventeen transcription factors, taken from different families of transcription factors, the average proportion of sites in this subset of classes was 75%, with values for individual factors ranging from 48% to 98%. By contrast, these same classes contain only 26% of the bases of the genome and only 31% of occurrences of the motifs of these factors, that is, places where one might expect the factors to bind. These results are not a consequence of the class composition in promoter regions. Conclusions: This method of analysis will help to find transcription factor binding sites and assist with the problem of false positives. These results also imply a profound difference between the mosaic classes.
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
|