1
Tsukanov AV, Levitsky VG, Merkulova TI. Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites. Vavilovskii Zhurnal Genet Selektsii 2021; 25:7. [PMID: 34547062 PMCID: PMC8408018 DOI: 10.18699/vj21.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/10/2020] [Revised: 01/10/2021] [Accepted: 01/12/2021] [Indexed: 11/24/2022] Open
Abstract
The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, application of these models has usually been limited to comparing their recognition accuracy with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. The pipeline includes the stages of model training, assessment of recognition accuracy, scanning of ChIP-seq peaks, and classification of peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase, of 26.3%, in the fraction of recognized peaks compared to that for the PWM model alone. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the medians of the fraction of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73% for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
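As an illustration of the PWM model the abstract takes as its baseline, the following is a minimal sketch of log-odds PWM scanning. The matrix values, the uniform background, the threshold, and the function names `to_pwm` and `scan` are all assumptions made for this example; they are not the FOXA2 motif or the MultiDeNA implementation.

```python
import math

# Toy position frequency matrix (hypothetical values, NOT the FOXA2 motif):
# one dict of base frequencies per motif position.
PFM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def to_pwm(pfm, background):
    """Convert base frequencies to log2-odds weights against the background."""
    return [{b: math.log2(col[b] / background[b]) for b in "ACGT"} for col in pfm]


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    The score is a plain sum over positions, so each position contributes
    independently of its neighbours.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = to_pwm(PFM, BACKGROUND)
hits = scan("GGTATGG", pwm, threshold=3.0)
```

Because the window score is additive over positions, a model of this form cannot express dependencies between positions, which is the limitation the abstract attributes to PWMs.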
Affiliation(s)
- A V Tsukanov: Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- V G Levitsky: Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- T I Merkulova: Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
2
Zykova TY, Levitsky VG, Zhimulev IF. Architecture of Promoters of House-Keeping Genes in Polytene Chromosome Interbands of Drosophila melanogaster. DOKL BIOCHEM BIOPHYS 2019; 485:95-100. [PMID: 31201623 DOI: 10.1134/s1607672919020029] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/25/2018] [Indexed: 12/22/2022]
Abstract
This is the first study to investigate the molecular-genetic organization of polytene chromosome interbands located on both the molecular and cytological maps of the Drosophila genome. The majority of the studied interbands contained one gene with a single transcription initiation site; the remaining interbands contained one gene with several alternative promoters, two or more unidirectional genes, or genes arranged "head-to-head". In addition, intricately arranged interbands containing three or more genes in both unidirectional and bidirectional orientation were found. Insulator proteins, ORC, P-insertions, DNase I hypersensitive sites, and other open chromatin structures were situated in the promoter region of the genes located in the interbands. This area is critical for the formation of the interband, an open chromatin region in which gene transcription and replication are combined.
Affiliation(s)
- T Yu Zykova: Institute of Molecular and Cellular Biology, Siberian Branch, Russian Academy of Sciences, 630090, Novosibirsk, Russia
- V G Levitsky: Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, 630090, Novosibirsk, Russia; Novosibirsk State University, 630090, Novosibirsk, Russia
- I F Zhimulev: Institute of Molecular and Cellular Biology, Siberian Branch, Russian Academy of Sciences, 630090, Novosibirsk, Russia; Novosibirsk State University, 630090, Novosibirsk, Russia
3
Zykova TY, Popova OO, Khoroshko VA, Levitsky VG, Lavrov SA, Zhimulev IF. Genetic Organization of Open Chromatin Domains Situated in Polytene Chromosome Interbands in Drosophila. DOKL BIOCHEM BIOPHYS 2019; 483:297-301. [PMID: 30607724 DOI: 10.1134/s1607672918060078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/16/2023]
Abstract
New data on the organization of genes entirely located in the open domains for chromatin transcription and occupying only one chromosome structure (interband) were obtained. The characteristic features of these genes are the small size (on average, 1-2 kb), depletion of the replicative complex proteins in the regulatory region, and the presence of specific motifs for binding transcription factors, as compared to the genes occupying two structures (interband and gray band). The biological function of these genes is associated primarily with the processes of gene expression and RNA metabolism.
Affiliation(s)
- T Yu Zykova: Institute of Molecular and Cell Biology, Siberian Branch, Russian Academy of Sciences, Novosibirsk, 630090, Russia
- O O Popova: Institute of Molecular and Cell Biology, Siberian Branch, Russian Academy of Sciences, Novosibirsk, 630090, Russia
- V A Khoroshko: Institute of Molecular and Cell Biology, Siberian Branch, Russian Academy of Sciences, Novosibirsk, 630090, Russia
- V G Levitsky: Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, 630090, Russia; Novosibirsk State University, Novosibirsk, 630090, Russia
- S A Lavrov: Institute of Molecular Genetics, Russian Academy of Sciences, pl. Akademika Kurchatova 46, Moscow, 123182, Russia
- I F Zhimulev: Institute of Molecular and Cell Biology, Siberian Branch, Russian Academy of Sciences, Novosibirsk, 630090, Russia; Novosibirsk State University, Novosibirsk, 630090, Russia
4
Levitsky VG, Oshchepkov DY, Klimova NV, Ignatieva EV, Vasiliev GV, Merkulov VM, Merkulova TI. Hidden heterogeneity of transcription factor binding sites: A case study of SF-1. Comput Biol Chem 2016; 64:19-32. [PMID: 27235721 DOI: 10.1016/j.compbiolchem.2016.04.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/19/2015] [Revised: 04/19/2016] [Accepted: 04/19/2016] [Indexed: 01/15/2023]
Abstract
Steroidogenic factor 1 (SF-1) belongs to a small group of transcription factors that bind DNA only as a monomer. Three different approaches (Sitecon, SiteGA, and oPWM), constructed using the same training sample of experimentally confirmed SF-1 binding sites, have been used to recognize these sites. Appropriate prediction thresholds for the recognition models have been selected: thresholds concordant by false positive or false negative rates across the methods were used to optimize the discrimination of steroidogenic gene promoters from datasets of non-specific promoters. After experimental verification, the models were used to analyze ChIP-seq data for SF-1. It has been shown that the sets of sites recognized by different models overlap only partially and that integrating these models allows identification of SF-1 sites in up to 80% of the ChIP-seq loci. The structures of the sites detected by the three recognition models in ChIP-seq peaks falling within the [-5000, +5000] region relative to the transcription start sites (TSS) extracted from the FANTOM5 project have been analyzed. MATLIGN classified the frequency matrices for the sites predicted by oPWM, Sitecon, and SiteGA into two groups: the first is described by oPWM/Sitecon and the second by SiteGA. Gene ontology (GO) analysis has been used to clarify the differences between the sets of genes carrying different variants of SF-1 binding sites. Although this analysis in general revealed a considerable overlap in GO terms for the genes carrying the binding sites predicted by oPWM, Sitecon, or SiteGA, only the last method showed a notable trend toward terms related to negative regulation and apoptosis. The results suggest that the SF-1 binding sites predicted by oPWM+Sitecon and by SiteGA differ in both their structure and the functional annotation of their sets of target genes. Further application of Homer software for de novo identification of enriched motifs in the SF-1 ChIP-seq dataset gave results similar to those of oPWM+Sitecon.
Affiliation(s)
- V G Levitsky: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- D Yu Oshchepkov: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- N V Klimova: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- E V Ignatieva: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- G V Vasiliev: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- V M Merkulov: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
- T I Merkulova: Federal State Research Center Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
5
Evtushenko EV, Levitsky VG, Elisafenko EA, Gunbin KV, Belousov AI, Šafář J, Doležel J, Vershinin AV. The expansion of heterochromatin blocks in rye reflects the co-amplification of tandem repeats and adjacent transposable elements. BMC Genomics 2016; 17:337. [PMID: 27146967 PMCID: PMC4857426 DOI: 10.1186/s12864-016-2667-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 12/17/2015] [Accepted: 04/25/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: A prominent and distinctive feature of the rye (Secale cereale) chromosomes is the presence of massive blocks of subtelomeric heterochromatin, the size of which is correlated with the copy number of tandem arrays. The rapidity with which these regions have formed over the period of speciation remains unexplained. RESULTS: Using a BAC library created from the short arm telosome of rye chromosome 1R, we uncovered numerous arrays of the pSc200 and pSc250 tandem repeat families, which are concentrated in subtelomeric heterochromatin, and identified the adjacent DNA sequences. The arrays show significant heterogeneity in monomer organization. 454 reads were used to gain a representation of the expansion of these tandem repeats across the whole rye genome. The presence of multiple, relatively short monomer arrays, coupled with the mainly star-like topology of the monomer phylogenetic trees, was taken as indicative of a rapid expansion of the pSc200 and pSc250 arrays. The evolution of subtelomeric heterochromatin appears to have included a significant contribution of illegitimate recombination. The composition of transposable elements (TEs) within the regions flanking the pSc200 and pSc250 arrays differed markedly from that in the genome as a whole. Solo-LTRs were strongly enriched, suggestive of a history of active ectopic exchange. Several DNA motifs were over-represented within the LTR sequences. CONCLUSION: The large blocks of subtelomeric heterochromatin have arisen from the combined activity of TEs and the expansion of the tandem repeats. The expansion was likely based on a highly complex network of recombination mechanisms.
Affiliation(s)
- E V Evtushenko: Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
- V G Levitsky: Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- E A Elisafenko: Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
- K V Gunbin: Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
- A I Belousov: Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
- J Šafář: Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- J Doležel: Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- A V Vershinin: Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
6
Aitnazarov RB, Ignatieva EV, Bazarova NE, Levitsky VG, Knyazev SP, Gon Y, Yudin NS. Dissecting the role of single nucleotide polymorphism of lymphotoxin beta gene during pig domestication using bioinformatic and experimental approaches. Vavilovskii Zhurnal Genet Selektsii 2016. [DOI: 10.18699/vj15.088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
7
Matushkin YG, Levitsky VG, Orlov YL, Likhoshvai VA, Kolchanov NA. Translation efficiency in yeasts correlates with nucleosome formation in promoters. J Biomol Struct Dyn 2012; 31:96-102. [PMID: 22803765 DOI: 10.1080/07391102.2012.691366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Elongation efficiency index (EEI) was suggested earlier to estimate gene expression efficiency from the nucleotide context of the coding sequence in unicellular organisms. We have analyzed the association between EEI and nucleosome formation potential (NFP) in 5' regulatory regions upstream of the translation initiation site (TIS) in two yeast species. Theoretical estimates of NFP based on DNA sequence were obtained by the Recon method. Experimental estimates of nucleosome occupancy were obtained from high-throughput sequencing data of nucleosomal DNA in Saccharomyces cerevisiae. For the sample of all genes, the correlation coefficient was calculated between two vectors: the vector of NFP values at a fixed position relative to the TIS and the vector of EEI values. Profiles of correlation coefficients between NFP and EEI were computed in the (-600; +600) regions relative to the TIS for gene sequences extracted from GenBank. We found regions of strong negative dependence between NFP and EEI for all genes as well as for highly expressed genes (10% of EEI-highest genes) in Schizosaccharomyces pombe. At the same time, we found positive dependence between NFP and EEI for all genes and for low expressed genes (10% of EEI-lowest genes) in S. cerevisiae. The association between NFP and EEI could be explained by evolutionary selection of context characteristics of nucleotide sequences for gene expression optimization.
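The per-position correlation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the correlation coefficient, so Pearson's r is assumed here, and the function names `pearson` and `correlation_profile` as well as the toy numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined if either vector is constant


def correlation_profile(nfp_by_gene, eei):
    """One coefficient per position relative to the TIS.

    nfp_by_gene[g][p] is the NFP of gene g at position p;
    eei[g] is the EEI of gene g.
    """
    n_positions = len(nfp_by_gene[0])
    return [
        pearson([row[p] for row in nfp_by_gene], eei)
        for p in range(n_positions)
    ]


# Toy data: three genes, two positions (made-up values).
nfp = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
eei = [1.0, 2.0, 3.0]
profile = correlation_profile(nfp, eei)
```

In this toy input the first position rises with EEI and the second falls, so the profile takes opposite signs at the two positions, analogous to the positive and negative dependence regions mentioned in the abstract.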
Affiliation(s)
- Yu G Matushkin: Institute of Cytology and Genetics SB RAS, Lavrentiev ave. 10, Novosibirsk, 630090, Russia
8
Levitsky VG, Oshchepkov DY, Ershov NI, Bryzgalov LO, Antontseva EV, Vasiliev GV, Merkulova TI, Kolchanov NA. Development of computational methods to search for FoxA transcription factor binding sites, their experimental verification and application to the analysis of ChIP-seq data. DOKL BIOCHEM BIOPHYS 2011; 436:12-5. [PMID: 21369894 DOI: 10.1134/s1607672911010054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/20/2010] [Indexed: 11/22/2022]
Affiliation(s)
- V G Levitsky: Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, pr. Akademika Lavrent'eva 10, Novosibirsk 630090, Russia
9
Merkulova TI, Oshchepkov DY, Ignatieva EV, Ananko EA, Levitsky VG, Vasiliev GV, Klimova NV, Merkulov VM, Kolchanov NA. Bioinformatical and experimental approaches to investigation of transcription factor binding sites in vertebrate genes. Biochemistry Moscow 2007; 72:1187-93. [DOI: 10.1134/s000629790711003x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
10
Klimova NV, Levitsky VG, Ignatieva EV, Vasiliev GV, Kobzev VF, Busygina TV, Merkulova TI, Kolchanov NA. Potential binding sites for SF-1: Recognition by the SiteGA method, experimental verification, and search for new target genes. Mol Biol 2006. [DOI: 10.1134/s0026893306030125] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
11
Khomicheva IV, Levitsky VG, Omelyanchuk NA, Savinskaya SA, Kolchanov NA. Pattern of locally positioned dinucleotides correlates with MicroRNA abundance in plants. Biophysics (Nagoya-shi) 2006. [DOI: 10.1134/s0006350906070025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
12
Abstract
A program for constructing nucleosome formation potential profile was applied for investigation of exons, introns, and repetitive sequences. The program is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/recon/. We have demonstrated that introns and repetitive sequences exhibit higher nucleosome formation potentials than exons. This fact may be explained by functional saturation of exons with genetic code, hindering the localization of efficient nucleosome positioning sites.
Affiliation(s)
- V G Levitsky: Laboratory of Theoretical Genetics, Institute of Cytology & Genetics, 630090, Lavrentiev ave. 10, Novosibirsk, Russia
13
Abstract
MOTIVATION: The rapid growth in the number of genes with known sequences calls for developing automated tools for their classification and analysis. It has become clear that nucleosome packaging of eukaryotic DNA is very important for gene functioning. Automated computer tools for characterization of nucleosome packaging density could be useful for studying gene regulation and for genome annotation. RESULTS: A program for constructing nucleosome formation potential profiles of eukaryotic DNA sequences was developed. Nucleosome packaging density was analyzed for different functional types of human promoters. It was found that in promoters of tissue-specific genes, the nucleosome formation potential was essentially higher than in genes expressed in many tissues, or housekeeping genes. Hence, the capability for nucleosome positioning in the promoter region may serve as a factor regulating gene expression. AVAILABILITY: The program for nucleosome site recognition is included in the GeneExpress system, section 'DNA Nucleosomal Organization', http://wwwmgs.bionet.nsc.ru/mgs/programs/recon/.
Affiliation(s)
- V G Levitsky: Laboratory of Theoretical Genetics, Institute of Cytology & Genetics, 630090, Lavrentiev Ave. 10, Novosibirsk, Russia
14
Podkolodnaya OA, Levitsky VG, Podkolodnyi NL. Mol Biol 2001; 35:802-809. [DOI: 10.1023/a:1013221915217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
15
16
Abstract
MOTIVATION: Despite the growing volume of data on primary nucleotide sequences, the regulatory regions remain a major puzzle with regard to their function. Numerous recognising programs considering a diversity of properties of regulatory regions have been developed. The system proposed here allows the specific contextual, conformational and physico-chemical properties to be revealed based on analysis of extended DNA regions. RESULTS: The Internet-accessible computer system RegScan, designed to analyse the extended regulatory regions of eukaryotic genes, has been developed. The computer system comprises the following software: (i) programs for classification dividing a set of promoters into TATA-containing and TATA-less promoters and promoters with and without CpG islands; (ii) programs for constructing (a) nucleotide frequency profiles, (b) sequence complexity profiles and (c) profiles of conformational and physico-chemical properties; (iii) the program for constructing the sets of degenerate oligonucleotide motifs of a specified length; and (iv) the program searching for and visualising repeats in nucleotide sequences. The system has allowed us to demonstrate the following characteristic patterns of vertebrate promoter regions: the TATA box region is flanked by regions with an increased G+C content and increased bending stiffness, the TATA box content is asymmetric and promoter regions are saturated with both direct and inverted repeats. AVAILABILITY: The computer system RegScan is available via the Internet at http://www.mgs.bionet.nsc.ru/Systems/RegScan, http://www.cbil.upenn.edu/mgs/systems/regscan/.
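To make the notion of a nucleotide frequency profile concrete, here is a minimal sketch that computes per-position base frequencies over a set of aligned, equal-length sequences and derives a G+C profile from them. It is not RegScan code: the function names `frequency_profile` and `gc_profile` and the toy sequences are assumptions made for this example.

```python
from collections import Counter


def frequency_profile(seqs):
    """Per-position frequencies of A, C, G, T over aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        # Counter returns 0 for bases that do not occur at this position.
        profile.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return profile


def gc_profile(profile):
    """G+C fraction at each position of a frequency profile."""
    return [col["G"] + col["C"] for col in profile]


# Toy aligned sequences (made-up, not real promoters).
seqs = ["GCTATA", "GGTATA", "CCTAAA", "GCTTTA"]
profile = frequency_profile(seqs)
gc = gc_profile(profile)
```

A profile of this kind, computed over promoters aligned at a common reference point, is the sort of summary from which a pattern such as increased G+C content in flanking regions could be read off.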
Affiliation(s)
- V N Babenko: Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Lavrentyev Avenue, 10, Novosibirsk, 630090, Russia
17
Abstract
MOTIVATION: Chromatin structure plays a crucial role in proper gene functioning. Therefore, it is very important to investigate nucleosomal DNA properties and to recognize nucleosome positioning sequences in the genome. Nevertheless, applying different sequence analysis methods separately is insufficient for a complete description of nucleosomal DNA. One of the most probable reasons for this is the weakness of nucleosome positioning signals. The present paper offers a set of methods to reveal the most important nucleosomal DNA characteristics and to show a common pattern of nucleosome site properties. RESULTS: A complex approach was used to determine the conformational and physicochemical properties that are most significant for the description of nucleosome binding sites. An integrated database of nucleosomal DNA properties has been compiled. This database comprises different sections for the description of DNA characteristics. Revealing significant DNA characteristics allows the classification of various samples of site sequences and the generation of programs for site recognition. AVAILABILITY: The current version of the database is available at http://wwwmgs.bionet.nsc.ru/system/BDNAvideo/. C-code of the recognition program may be found in the section FEATURE. WWW-available programs for testing arbitrary sequences are accessible at http://wwwmgs.bionet.nsc.ru/Programs/bDNA/NA_bDNA.htm/. The links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.
Affiliation(s)
- V G Levitsky: Laboratory of Theoretical Genetics, Institute of Cytology & Genetics, 630090, Lavrentieva 10, Novosibirsk, Russia