1
Pop RT, Pisante A, Nagy D, Martin PCN, Mikheeva L, Hayat A, Ficz G, Zabet NR. Identification of mammalian transcription factors that bind to inaccessible chromatin. Nucleic Acids Res 2023; 51:8480-8495. [PMID: 37486787 PMCID: PMC10484684 DOI: 10.1093/nar/gkad614] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/18/2023] [Accepted: 07/11/2023] [Indexed: 07/26/2023]
Abstract
Transcription factors (TFs) are proteins that affect gene expression by binding to regulatory regions of DNA in a sequence-specific manner. The binding of TFs to DNA is controlled by many factors, including the DNA sequence, concentration of TF, chromatin accessibility and co-factors. Here, we systematically investigated the binding mechanism of hundreds of TFs by analysing ChIP-seq data with our explainable statistical model, ChIPanalyser. This tool uses as inputs the DNA sequence binding motif, the capacity to distinguish between strong and weak binding sites, the concentration of TF, and chromatin accessibility. We found that approximately one third of TFs are predicted to bind the genome in a DNA accessibility-independent fashion; these include TFs that can open the chromatin, their co-factors and TFs with similar motifs. Our model predicted this to be the case when the TF binds to its strongest binding regions in the genome, and only a small number of TFs, such as CTCF, USF2 and CEBPB, have the capacity to bind dense chromatin at their weakest binding regions. Our study demonstrated that ChIPanalyser predicts the binding of hundreds of human and mouse TFs with high accuracy and showed that many TFs can bind dense chromatin.
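The four inputs listed above can be illustrated with a minimal two-state (bound/unbound) equilibrium sketch. This is an assumption for illustration and not ChIPanalyser's actual formula or code: each site's relative affinity is taken as exp(lam * (s - s_max)) from its PWM score s, the hypothetical parameter `lam` stands in for the capacity to separate strong from weak sites, `n_molecules` stands in for TF concentration, and a per-site accessibility weight between 0 (closed) and 1 (open) scales the affinity.

```python
import math


def site_occupancy(scores, accessibility, n_molecules, lam):
    """Fractional occupancy of each candidate site in a toy two-state
    (bound/unbound) equilibrium model.

    scores        -- PWM log-odds score per site (higher = stronger site)
    accessibility -- weight per site in [0, 1]; 0 = fully closed chromatin
    n_molecules   -- effective TF concentration (hypothetical units)
    lam           -- hypothetical scaling: larger values sharpen the
                     difference between strong and weak sites
    """
    best = max(scores)
    occupancy = []
    for s, a in zip(scores, accessibility):
        # Relative affinity: at most 1 (best open site), smaller for weaker
        # or less accessible sites.
        k = a * math.exp(lam * (s - best))
        # Langmuir-style saturation: N*K / (1 + N*K).
        occupancy.append(n_molecules * k / (1.0 + n_molecules * k))
    return occupancy


scores = [10.0, 8.0, 4.0]        # hypothetical PWM scores
closed_third = [1.0, 1.0, 0.0]   # third site lies in closed chromatin
print(site_occupancy(scores, closed_third, n_molecules=5.0, lam=1.0))
```

In this toy model, setting every accessibility weight to 1 mimics an accessibility-independent TF, while a weight of 0 means the closed-chromatin site is never occupied.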
Affiliation(s)
- Romana T Pop
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, UK
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo, Norway
- Alessandra Pisante
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, UK
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Dorka Nagy
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, UK
- National Heart and Lung Institute, Imperial College London, London SW3 6LY, UK
- Ateequllah Hayat
- Institute of Medical and Biomedical Education, St George's, University of London, Cranmer Terrace, Tooting SW17 0RE, London
- Gabriella Ficz
- Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Nicolae Radu Zabet
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, UK
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
2
Fioresi R, Demurtas P, Perini G. Deep learning for MYC binding site recognition. Front Bioinform 2022; 2:1015993. [PMID: 36544623 PMCID: PMC9760990 DOI: 10.3389/fbinf.2022.1015993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 11/24/2022] [Indexed: 12/07/2022]
Abstract
Motivation: The definition of the genome distribution of the Myc transcription factor is extremely important since it may help predict its transcriptional activity, particularly in the context of cancer. Myc is among the most powerful oncogenes, involved in the occurrence and development of more than 80% of different types of pediatric and adult cancers. Myc regulates thousands of genes, which can differ in part depending on the type of tissue and tumour. Myc distribution along the genome has been determined experimentally through chromatin immunoprecipitation. This approach, although powerful, is very time consuming and cannot be routinely applied to tumours of individual patients. Thus, it becomes of paramount importance to develop in silico tools that can effectively and rapidly predict its distribution on a given cell genome. New advanced computational tools (DeeperBind) can then be successfully employed to determine the function of Myc in a specific tumour, and may help to devise new experimental directions first and, later on, personalized and more effective therapeutic treatments for a single patient. Results: The use of DeeperBind with DeepRAM on the Colab platform (Google) can effectively predict the binding sites for the MYC factor with an accuracy above 0.96 AUC, when trained with multiple cell lines. The analysis of the filters in DeeperBind trained models shows, besides the consensus sequence CACGTG classically associated with the MYC factor, the consensus sequences G/C box and TGGGA, respectively bound by the SP1 and MIZ-1 transcription factors, which are known to mediate the MYC repressive response. Overall, our findings suggest a stronger synergy between machine learning tools such as DeeperBind and biological experiments, which may reduce time-consuming experiments by providing a direction to guide them.
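An AUC such as the 0.96 reported above summarizes a ranking: it is the probability that a randomly chosen bound sequence receives a higher score than a randomly chosen unbound one. A minimal sketch of that definition (ties counted as one half, which equals the Mann-Whitney U statistic divided by the number of positive/negative pairs), using hypothetical classifier scores rather than the authors' data or pipeline:

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(score of random positive > score of random negative),
    counting ties as 1/2. O(n_pos * n_neg); fine for illustration."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


bound = [0.9, 0.8, 0.4]     # hypothetical scores for ChIP-positive regions
unbound = [0.7, 0.3, 0.2]   # hypothetical scores for background regions
print(roc_auc(bound, unbound))
```

A sort-based rank computation gives the same value in O(n log n) and is preferable for genome-scale test sets.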
3
Chathoth KT, Mikheeva LA, Crevel G, Wolfe JC, Hunter I, Beckett-Doyle S, Cotterill S, Dai H, Harrison A, Zabet NR. The role of insulators and transcription in 3D chromatin organization of flies. Genome Res 2022; 32:682-698. [PMID: 35354608 PMCID: PMC8997359 DOI: 10.1101/gr.275809.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/26/2021] [Accepted: 02/17/2022] [Indexed: 11/25/2022]
Abstract
The DNA in many organisms, including humans, has been shown to be organized in topologically associating domains (TADs). In Drosophila, several architectural proteins are enriched at TAD borders, but it is still unclear whether these proteins play a functional role in the formation and maintenance of TADs. Here, we show that depletion of BEAF-32, Cp190, Chro, and Dref leads to changes in TAD organization and chromatin loops. Their depletion predominantly affects TAD borders located in regions moderately enriched in repressive modifications and depleted in active ones, whereas TAD borders located in euchromatin are resilient to these knockdowns. Furthermore, transcriptomic data revealed hundreds of genes displaying differential expression in these knockdowns and showed that the majority of differentially expressed genes are located within reorganized TADs. Our work identifies a novel and functional role for architectural proteins at TAD borders in Drosophila and a link between TAD reorganization and subsequent changes in gene expression.
Affiliation(s)
- Keerthi T Chathoth
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Liudmila A Mikheeva
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
- Department of Mathematical Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Gilles Crevel
- Department Basic Medical Sciences, St. Georges University London, London SW17 0RE, United Kingdom
- Jareth C Wolfe
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
- School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, United Kingdom
- Ioni Hunter
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Saskia Beckett-Doyle
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Sue Cotterill
- Department Basic Medical Sciences, St. Georges University London, London SW17 0RE, United Kingdom
- Hongsheng Dai
- Department of Mathematical Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Andrew Harrison
- Department of Mathematical Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Nicolae Radu Zabet
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, United Kingdom
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
4
Garbuzov FE, Gursky VV. Nonequilibrium model of short-range repression in gene transcription regulation. Phys Rev E 2021; 104:014407. [PMID: 34412298 DOI: 10.1103/physreve.104.014407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Accepted: 06/24/2021] [Indexed: 11/07/2022]
Abstract
Transcription factors are proteins that regulate gene activity by activating or repressing gene transcription. A special class of transcriptional repressors operates via a short-range mechanism, making local DNA regions inaccessible to binding by activators, and thus providing an indirect repressive action on the target gene. This mechanism is commonly modeled assuming that repressors interact with DNA under thermodynamic equilibrium and neglecting some configurations of the gene regulatory region. We elaborate on a more general nonequilibrium model of short-range repression using the graph formalism for transitions between gene states, and we apply analytical calculations to compare it with the equilibrium model in terms of the repression strength and expression noise. In contrast to the equilibrium approach, the new model allows us to separate two basic mechanisms of short-range repression. The first mechanism is associated with the recruitment of factors that mediate chromatin condensation, and the second one concerns the blocking of factors that mediate chromatin loosening. The nonequilibrium model demonstrates better performance on previously published gene expression data obtained for transcription factors controlling Drosophila development, and furthermore it predicts that the first repression mechanism is the most favorable in this system. The presented approach can be scaled to larger gene networks and can be used to infer specific modes and parameters of transcriptional regulation from gene expression data.
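A graph formalism for gene-state transitions can be illustrated, assuming it follows the common Markov chain tree theorem approach, with the sketch below: the steady-state probability of each gene state is proportional to the summed rate products of all spanning trees directed into that state. The three promoter states and all rates are hypothetical and are not the model or parameters of the paper; brute-force tree enumeration is only practical for a handful of states.

```python
from itertools import product


def _reaches_root(v, parent, root, n):
    """Follow the chosen out-edges from v; True if they lead to root."""
    for _ in range(n):
        if v == root:
            return True
        v = parent[v]
    return v == root


def steady_state(rates, n):
    """Stationary distribution of a continuous-time Markov chain on states
    0..n-1 via the Markov chain tree theorem (brute-force enumeration).

    rates -- dict mapping (i, j) to the transition rate from state i to j;
             the graph is assumed to be strongly connected.
    """
    out_edges = {i: [j for (a, j) in rates if a == i] for i in range(n)}
    weights = []
    for root in range(n):
        others = [v for v in range(n) if v != root]
        total = 0.0
        # Every non-root state picks one outgoing edge; keep only the
        # choices that form a tree directed into the root (no cycles).
        for choice in product(*(out_edges[v] for v in others)):
            parent = dict(zip(others, choice))
            if all(_reaches_root(v, parent, root, n) for v in others):
                weight = 1.0
                for v, p in parent.items():
                    weight *= rates[(v, p)]
                total += weight
        weights.append(total)
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical promoter: 0 = free, 1 = activator-bound (active),
# 2 = repressed/condensed. Rates are illustrative only.
rates = {(0, 1): 2.0, (1, 0): 1.0, (0, 2): 0.5, (2, 0): 0.25, (1, 2): 0.1}
pi = steady_state(rates, 3)
print("fraction of time active:", pi[1])
```

Because the hypothetical edge 1 -> 2 has no reverse edge, detailed balance cannot hold in this example, which is the kind of cycle an equilibrium (detailed-balance) model cannot capture.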
Affiliation(s)
- F E Garbuzov
- Ioffe Institute, 26 Polytekhnicheskaya, St. Petersburg 194021, Russia
- V V Gursky
- Ioffe Institute, 26 Polytekhnicheskaya, St. Petersburg 194021, Russia
5
Martin PC, Zabet NR. Dissecting the binding mechanisms of transcription factors to DNA using a statistical thermodynamics framework. Comput Struct Biotechnol J 2020; 18:3590-3605. [PMID: 33304457 PMCID: PMC7708957 DOI: 10.1016/j.csbj.2020.11.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/18/2020] [Revised: 11/02/2020] [Accepted: 11/04/2020] [Indexed: 01/22/2023]
Abstract
Transcription Factors (TFs) bind to DNA and control the activity of target genes. Here, we present ChIPanalyser, a user-friendly, versatile and powerful R/Bioconductor package for predicting and modelling the binding of TFs to DNA. ChIPanalyser performs similarly to state-of-the-art tools, but is an explainable model and provides biological insights into the binding mechanisms of TFs. We focused on investigating the binding mechanisms of three known architectural proteins, CTCF, BEAF-32 and su(Hw), in three Drosophila cell lines (BG3, Kc167 and S2). While CTCF preferentially binds only to a subset of high-affinity sites located mainly in open chromatin, BEAF-32 binds to most of its high-affinity binding sites available in open chromatin. In contrast, su(Hw) binds to both open chromatin and partially closed chromatin. Most importantly, differences in TF binding profiles between cell lines for these TFs are mainly driven by differences in DNA accessibility and not by differences in TF concentrations between cell lines. Finally, we investigated binding of Hox TFs in Drosophila and found that Ubx binds only in open chromatin, while Abd-B and Dfd are capable of binding in both open and partially closed chromatin. Overall, our results show that TFs display different binding mechanisms and that our model is able to recapitulate their specific binding behaviour.
Affiliation(s)
- Patrick C.N. Martin
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, UK
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, DK-2200 Copenhagen, Denmark
- Nicolae Radu Zabet
- School of Life Sciences, University of Essex, Colchester CO4 3SQ, UK
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
6
Santana-Garcia W, Rocha-Acevedo M, Ramirez-Navarro L, Mbouamboua Y, Thieffry D, Thomas-Chollier M, Contreras-Moreira B, van Helden J, Medina-Rivera A. RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding. Comput Struct Biotechnol J 2019; 17:1415-1428. [PMID: 31871587 PMCID: PMC6906655 DOI: 10.1016/j.csbj.2019.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/27/2019] [Revised: 09/22/2019] [Accepted: 09/25/2019] [Indexed: 02/06/2023]
Abstract
Gene regulatory regions contain short, degenerate DNA binding sites recognized by transcription factors (TFBSs). When TFBSs harbor SNPs, the DNA binding site may be affected, thereby altering the transcriptional regulation of the target genes. Such regulatory SNPs have been implicated as causal variants in Genome-Wide Association Studies (GWAS). In this study, we describe improved versions of the Variation-tools programs, designed to predict regulatory variants, and present four case studies to illustrate their usage and applications. In brief, Variation-tools facilitate i) obtaining variation information, ii) interconversion of variation file formats, iii) retrieval of sequences surrounding variants, and iv) calculating the change in predicted transcription factor affinity scores between alleles, using motif scanning approaches. Notably, the tools support the analysis of haplotypes. The tools are included within the well-maintained suite Regulatory Sequence Analysis Tools (RSAT, http://rsat.eu) and are accessible through a web interface that currently enables analysis of five metazoan and ten plant genomes. Variation-tools can also be used from the command line with any locally installed Ensembl genome. Users can input personal collections of variants and motifs, providing flexibility in the analysis.
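Step iv) above, scoring how a variant changes predicted affinity, can be sketched in a few lines: build a log-odds PSSM from counts against a uniform background, take the best-scoring window in the reference and in the alternative sequence, and report the difference. The count matrix, sequences and SNP are hypothetical, only the forward strand is scanned (a real scan would also score the reverse complement), and this is not RSAT's implementation or scoring scheme.

```python
import math

BASES = "ACGT"


def log_odds_pssm(counts, pseudocount=1.0, background=0.25):
    """Convert a count matrix (list of {base: count}, one per position)
    into log2 odds weights against a uniform background."""
    pssm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pssm.append({b: math.log2((col[b] + pseudocount) / total / background)
                     for b in BASES})
    return pssm


def best_score(pssm, seq):
    """Highest window score of the PSSM over seq (forward strand only)."""
    w = len(pssm)
    return max(sum(pssm[i][seq[start + i]] for i in range(w))
               for start in range(len(seq) - w + 1))


# Hypothetical 4-position count matrix preferring "CACG".
counts = [
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]
pssm = log_odds_pssm(counts)
ref = "TTCACGTT"
alt = "TTCATGTT"  # hypothetical C>T SNP at the third motif position
delta = best_score(pssm, ref) - best_score(pssm, alt)
print(f"score change (ref - alt): {delta:.2f}")
```

A positive difference means the alternative allele is predicted to weaken the best match to the motif.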
Key Words
- Binding motifs
- CEU, Northern Europeans from Utah
- CRM, Cis-Regulatory Module
- GWAS, Genome Wide Association Studies
- LD, Linkage Disequilibrium
- MPRA, Massively Parallel Reporter Assays
- PSSM, Position Specific Scoring Matrix
- Position specific scoring matrix
- ROC, Receiver Operating Characteristic
- RSAT, Regulatory Sequence Analysis Tools
- Regulatory variants
- SNP, Single Nucleotide Polymorphism
- SNPs
- SOIs, SNPs of Interest
- TF, Transcription Factor
- TFBS, Transcription Factor Binding Site
- Transcription factors
- eQTL, Expression Quantitative Trait Loci
- rsID, Reference SNP Identifier
Affiliation(s)
- Walter Santana-Garcia
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Maria Rocha-Acevedo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Lucia Ramirez-Navarro
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Yvon Mbouamboua
- Fondation Congolaise pour la Recherche Médicale, Brazzaville, People’s Republic of Congo
- Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- Denis Thieffry
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Morgane Thomas-Chollier
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Jacques van Helden
- Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
- Corresponding authors at: Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México (Medina-Rivera). Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France (J. van Helden).
- Alejandra Medina-Rivera
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Corresponding authors at: Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México (Medina-Rivera). Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France (J. van Helden).
7
Barr K, Reinitz J, Radulescu O. An in silico analysis of robust but fragile gene regulation links enhancer length to robustness. PLoS Comput Biol 2019; 15:e1007497. [PMID: 31730659 PMCID: PMC6881076 DOI: 10.1371/journal.pcbi.1007497] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Revised: 11/27/2019] [Accepted: 10/22/2019] [Indexed: 12/31/2022]
Abstract
Organisms must ensure that expression of genes is directed to the appropriate tissues at the correct times, while simultaneously ensuring that these gene regulatory systems are robust to perturbation. This idea is captured by a mathematical concept called r-robustness, which says that a system is robust to a perturbation in up to r - 1 randomly chosen parameters. r-robustness implies that the biological system has a small number of sensitive parameters and that this number can be used as a robustness measure. In this work we use this idea to investigate the robustness of gene regulation using a sequence level model of the Drosophila melanogaster gene even-skipped. We consider robustness with respect to mutations of the enhancer sequence and with respect to changes of the transcription factor concentrations. We find that gene regulation is r-robust with respect to mutations in the enhancer sequence and identify a number of sensitive nucleotides. In both natural and in silico predicted enhancers, the number of nucleotides that are sensitive to mutation correlates negatively with the length of the sequence, meaning that longer sequences are more robust. The exact degree of robustness obtained is dependent not only on DNA sequence, but also on the local concentration of regulatory factors. We find that gene regulation can be remarkably sensitive to changes in transcription factor concentrations at the boundaries of expression features, while it is robust to perturbation elsewhere.
Affiliation(s)
- Kenneth Barr
- Department of Genetic Medicine, University of Chicago, Chicago, Illinois, United States of America
- John Reinitz
- Departments of Statistics, Ecology & Evolution, Molecular Genetics & Cell Biology, University of Chicago, Chicago, Illinois, United States of America
- Ovidiu Radulescu
- LPHI UMR CNRS 5235, University of Montpellier, Montpellier, France
8
Naseri G, Behrend J, Rieper L, Mueller-Roeber B. COMPASS for rapid combinatorial optimization of biochemical pathways based on artificial transcription factors. Nat Commun 2019; 10:2615. [PMID: 31197154 PMCID: PMC6565718 DOI: 10.1038/s41467-019-10224-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 11/26/2018] [Accepted: 04/26/2019] [Indexed: 02/08/2023]
Abstract
Balanced expression of multiple genes is central to establishing new biosynthetic pathways or multiprotein cellular complexes. Methods for efficient combinatorial assembly of regulatory sequences (promoters) and protein coding sequences are therefore in high demand. Here, we report a high-throughput cloning method, called COMPASS for COMbinatorial Pathway ASSembly, for the balanced expression of multiple genes in Saccharomyces cerevisiae. COMPASS employs orthogonal, plant-derived artificial transcription factors (ATFs) and homologous recombination-based cloning for the generation of thousands of individual DNA constructs in parallel. The method relies on a positive selection of correctly assembled pathway variants from both in vivo and in vitro cloning procedures. To decrease the turnaround time in genomic engineering, COMPASS is equipped with multi-locus CRISPR/Cas9-mediated modification capacity. We demonstrate the application of COMPASS by generating cell libraries producing β-carotene and co-producing β-ionone and biosensor-responsive naringenin. COMPASS will have many applications in synthetic biology projects that require gene expression balancing. Metabolic engineering requires the balancing of gene expression to obtain optimal output. Here the authors present COMPASS (COMbinatorial Pathway ASSembly), which uses plant-derived artificial transcription factors and cloning of thousands of DNA constructs in parallel to rapidly optimise pathways.
Affiliation(s)
- Gita Naseri
- University of Potsdam, Cell2Fab Research Unit, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- University of Potsdam, Department Molecular Biology, Karl-Liebknecht-Str. 24-25, House 20, 14476, Potsdam, Germany
- Jessica Behrend
- University of Potsdam, Cell2Fab Research Unit, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Lisa Rieper
- University of Potsdam, Cell2Fab Research Unit, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Bernd Mueller-Roeber
- University of Potsdam, Department Molecular Biology, Karl-Liebknecht-Str. 24-25, House 20, 14476, Potsdam, Germany
- Max-Planck Institute of Molecular Plant Physiology, Plant Signalling Group, Am Mühlenberg 1, D-14476, Potsdam-Golm, Germany
- Center of Plant Systems Biology and Biotechnology (CPSBB), Department Plant Development, Ruski Blvd. 139, 4000, Plovdiv, Bulgaria
9
Chathoth KT, Zabet NR. Chromatin architecture reorganization during neuronal cell differentiation in Drosophila genome. Genome Res 2019; 29:613-625. [PMID: 30709849 PMCID: PMC6442379 DOI: 10.1101/gr.246710.118] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 11/23/2018] [Accepted: 01/23/2019] [Indexed: 12/14/2022]
Abstract
The organization of the genome into topologically associating domains (TADs) was shown to have a regulatory role in development and cellular function, but the mechanism involved in TAD establishment is still unclear. Here, we present the first high-resolution contact map of Drosophila neuronal cells (BG3) and identify different classes of TADs by comparing this to genome organization in embryonic cells (Kc167). We find that only some TADs are conserved in both cell lines, whereas the rest are cell-type–specific. This is supported by a change in the enrichment of architectural proteins at TAD borders, with BEAF-32 present in embryonic cells and CTCF in neuronal cells. Furthermore, we observe strong divergent transcription, together with RNA Polymerase II occupancy and an increase in DNA accessibility at the TAD borders. TAD borders that are specific to neuronal cells are enriched in enhancers controlled by neuronal-specific transcription factors. Our results suggest that TADs are dynamic across developmental stages and reflect the interplay between insulators, transcriptional states, and enhancer activities.
Affiliation(s)
- Keerthi T Chathoth
- School of Biological Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Nicolae Radu Zabet
- School of Biological Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
10
Lu R, Rogan PK. Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations. F1000Res 2018; 7:1933. [PMID: 31001412 PMCID: PMC6464064 DOI: 10.12688/f1000research.17363.2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Accepted: 03/28/2019] [Indexed: 12/20/2022]
Abstract
Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using Machine Learning (ML). Methods: Bray-Curtis Similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers which predict these gene targets along with their accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and predict changes in gene regulation. Results: The glucocorticoid receptor gene ( NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on Area Under the Receiver Operating Characteristic curve (ROC). TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/CAS9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least 1 information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. 
Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
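Bray-Curtis similarity, used above to find genes with correlated expression across tissues, is one minus the Bray-Curtis dissimilarity, sum|x_i - y_i| / sum(x_i + y_i). A minimal sketch that ranks candidate genes by similarity to a query profile; the gene names and expression values are hypothetical and this is not the authors' pipeline:

```python
def bray_curtis_similarity(x, y):
    """1 - Bray-Curtis dissimilarity between two non-negative profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    denom = sum(a + b for a, b in zip(x, y))
    if denom == 0:
        return 1.0  # both profiles all zero: treat as identical
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / denom


query = [5.0, 0.5, 3.0, 8.0]  # hypothetical expression in 4 tissues
candidates = {
    "GENE_A": [4.5, 0.7, 2.8, 7.5],
    "GENE_B": [0.2, 9.0, 0.1, 1.0],
}
ranked = sorted(candidates,
                key=lambda g: bray_curtis_similarity(query, candidates[g]),
                reverse=True)
print(ranked)
```

Unlike a correlation coefficient, this measure is sensitive to absolute expression levels, so profiles with the same shape but different magnitudes score below 1.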
Affiliation(s)
- Ruipeng Lu
- Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Peter K. Rogan
- Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
- Biochemistry, University of Western Ontario, London, Ontario, N6A 5C1, Canada
- Cytognomix, London, Ontario, N5X 3X5, Canada
11
Ma X, Ezer D, Adryan B, Stevens TJ. Canonical and single-cell Hi-C reveal distinct chromatin interaction sub-networks of mammalian transcription factors. Genome Biol 2018; 19:174. [PMID: 30359306 PMCID: PMC6203279 DOI: 10.1186/s13059-018-1558-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/10/2018] [Accepted: 10/04/2018] [Indexed: 12/20/2022]
Abstract
Background: Transcription factor (TF) binding to regulatory DNA sites is a key determinant of cell identity within multi-cellular organisms and has been studied extensively in relation to site affinity and chromatin modifications. There has been a strong focus on the inference of TF-gene regulatory networks and TF-TF physical interaction networks. Here, we present a third type of TF network, the spatial network of co-localized TF binding sites within the three-dimensional genome. Results: Using published canonical Hi-C data and single-cell genome structures, we assess the spatial proximity of a genome-wide array of potential TF-TF co-localizations in human and mouse cell lines. For individual TFs, the abundance of occupied binding sites shows a positive correspondence with their clustering in three dimensions, and this is especially apparent for weak TF binding sites and at enhancer regions. An analysis between different TF proteins identifies significantly proximal pairs, which are enriched in reported physical interactions. Furthermore, clustering of different TFs based on proximity enrichment identifies two partially segregated co-localization sub-networks, involving different TFs in different cell types. Using data from both human lymphoblastoid cells and mouse embryonic stem cells, we find that these sub-networks are enriched within, but not exclusive to, different chromosome sub-compartments that have been identified previously in Hi-C data. Conclusions: This suggests that the association of TFs within spatial networks is closely coupled to gene regulatory networks. This applies to both differentiated and undifferentiated cells and is a potential causal link between lineage-specific TF binding and chromosome sub-compartment segregation.
Affiliation(s)
- Xiaoyan Ma
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Daphne Ezer
- The Alan Turing Institute for Data Science, British Library, 96 Euston Rd, Kings Cross, London, NW1 2DB, UK
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK
- Boris Adryan
- Merck KGaA, Chief Digital Office, 64293, Darmstadt, Germany
- Tim J Stevens
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge, CB2 0QH, UK
12
Khamis AM, Motwalli O, Oliva R, Jankovic BR, Medvedeva YA, Ashoor H, Essack M, Gao X, Bajic VB. A novel method for improved accuracy of transcription factor binding site prediction. Nucleic Acids Res 2018; 46:e72. [PMID: 29617876 PMCID: PMC6037060 DOI: 10.1093/nar/gky237] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/19/2017] [Revised: 03/01/2018] [Accepted: 03/20/2018] [Indexed: 12/12/2022]
Abstract
Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs due to the large number of TFs involved. To overcome these problems we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, 1.54-, 1.96- and 5.19-fold reductions of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that one can efficiently replace the PWM models for TFBS prediction with a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and in stand-alone software freely available at http://cbrc.kaust.edu.sa/DRAF.
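The comparison above reports the fold reduction of false positives at matched sensitivity. Below is a minimal sketch of that metric, not DRAF's own evaluation code. It assumes each candidate site carries a model score and a binary label (1 = supported by ChIP-seq). The threshold is taken as the lowest score that still recovers the requested fraction of positives. The function names are hypothetical.

```python
import math


def false_positives_at_sensitivity(scores, labels, sensitivity):
    """Count negatives scoring at or above the threshold that recovers
    at least `sensitivity` (a fraction in (0, 1]) of the positive sites."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("at least one positive site is required")
    k = max(1, math.ceil(sensitivity * len(positives)))
    threshold = positives[k - 1]
    return sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)


def fold_reduction(fp_baseline, fp_model):
    """How many times fewer false positives the new model makes."""
    return math.inf if fp_model == 0 else fp_baseline / fp_model


labels = [1, 0, 1, 0, 1, 0]
fp_pwm = false_positives_at_sensitivity([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], labels, 1.0)
fp_new = false_positives_at_sensitivity([0.9, 0.75, 0.8, 0.2, 0.7, 0.1], labels, 1.0)
print(fold_reduction(fp_pwm, fp_new))
```

At full sensitivity, the toy baseline admits two negatives and the toy alternative admits one, which gives a 2.0-fold reduction. Averaging such ratios over many datasets corresponds to the kind of summary quoted in the abstract.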
Affiliation(s)
- Abdullah M Khamis
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Olaa Motwalli
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Romina Oliva
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Department of Sciences and Technologies, University ‘Parthenope’ of Naples, Centro Direzionale Isola C4 80143, Naples, Italy
- Boris R Jankovic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Yulia A Medvedeva
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Institute of Bioengineering, Research Centre of Biotechnology, Russian Academy of Science, 117312 Moscow, Russia
- Department of Computational Biology, Vavilov Institute of General Genetics, Russian Academy of Science, 119991 Moscow, Russia
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Moscow Region, Russia
- Haitham Ashoor
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Magbubah Essack
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
13
Hettich J, Gebhardt JCM. Transcription factor target site search and gene regulation in a background of unspecific binding sites. J Theor Biol 2018; 454:91-101. [PMID: 29870697 PMCID: PMC6103292 DOI: 10.1016/j.jtbi.2018.05.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/12/2018] [Revised: 05/29/2018] [Accepted: 05/30/2018] [Indexed: 11/02/2022]
Abstract
Response time and transcription level are vital parameters of gene regulation. They depend on how fast transcription factors (TFs) find, and how efficiently they occupy, their specific target sites. It is well known that target site search is accelerated by TF binding to and sliding along unspecific DNA, and that unspecific associations alter the occupation frequency of a gene. However, whether target site search time and occupation frequency can be optimized simultaneously is mostly unclear. We developed a transparent and intuitively accessible state-based formalism to calculate search times to target sites on, and occupation frequencies of, promoters of arbitrary state structure. Our formalism is based on dissociation rate constants that are experimentally accessible in live cell experiments. To demonstrate our approach, we consider promoters activated by a single TF, by two coactivators or in the presence of a competitive inhibitor. We find that target site search time and promoter occupancy vary differently with the unspecific dissociation rate constant. Both parameters can be harmonized by adjusting the specific dissociation rate constant of the TF. However, while measured DNA residence times of various eukaryotic TFs correspond to a fast search time, the occupation frequencies of target sites are generally low. Cells might tolerate low target site occupancies as they enable timely gene regulation in response to a changing environment.
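The simplest case of a state-based description is a single target site that switches between a free and a bound state. This minimal sketch assumes a pseudo-first-order association rate k_on and a specific dissociation rate k_off; both are hypothetical parameters. The paper's formalism also treats unspecific binding and promoters with more states, and this sketch does not. The occupation frequency is k_on / (k_on + k_off). A short stochastic simulation with exponential dwell times checks that value.

```python
import random


def analytic_occupancy(k_on, k_off):
    """Fraction of time a two-state (free/bound) site is occupied."""
    return k_on / (k_on + k_off)


def simulated_occupancy(k_on, k_off, t_total, seed=0):
    """Time-averaged occupancy from exponential dwell times in each state."""
    rng = random.Random(seed)
    t, bound, time_bound = 0.0, False, 0.0
    while t < t_total:
        rate = k_off if bound else k_on
        # Truncate the final dwell so the total observed time is exactly t_total.
        dwell = min(rng.expovariate(rate), t_total - t)
        if bound:
            time_bound += dwell
        t += dwell
        bound = not bound
    return time_bound / t_total


print(analytic_occupancy(1.0, 3.0))
print(simulated_occupancy(1.0, 3.0, 20000.0))
```

Raising k_off, which shortens the residence time, lowers the occupancy. In this two-state picture the mean time to the next binding event is 1 / k_on.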
Affiliation(s)
- J Hettich
- Institute of Biophysics, Ulm University, Albert-Einstein-Allee 11, Ulm 89081, Germany
- J C M Gebhardt
- Institute of Biophysics, Ulm University, Albert-Einstein-Allee 11, Ulm 89081, Germany.
14
Bottani S, Zabet NR, Wendel JF, Veitia RA. Gene Expression Dominance in Allopolyploids: Hypotheses and Models. Trends Plant Sci 2018; 23:393-402. [PMID: 29433919 DOI: 10.1016/j.tplants.2018.01.002] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 11/10/2017] [Revised: 01/11/2018] [Accepted: 01/15/2018] [Indexed: 05/23/2023]
Abstract
The classical example of nonadditive contributions of the two parents to allopolyploids is nucleolar dominance, which entails silencing of one parental set of ribosomal RNA genes. This has been observed for many other loci. The prevailing explanation for this genome-wide expression disparity is that the two merged genomes differ in their transposable element (TE) complement and in their level of TE-mediated repression of gene expression. Alternatively, and not exclusively, gene expression dominance may arise from mismatches between trans effectors and their targets. Here, we explore quantitative models of regulatory mismatches leading to gene expression dominance. We also suggest that, when pairs of merged genomes are similar from one allopolyploidization event to another, gene-level and genome dominance patterns should also be similar.
Affiliation(s)
- Samuel Bottani
- Matière et Systèmes Complexes, UMR 7057, Paris 75013, France; Université Paris Diderot-Paris VII, 75205 Paris Cedex 13, France; These authors contributed equally to this work
- Nicolae Radu Zabet
- School of Biological Sciences, University of Essex, Colchester CO4 3SQ, UK; These authors contributed equally to this work
- Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Reiner A Veitia
- Université Paris Diderot-Paris VII, 75205 Paris Cedex 13, France; Institut Jacques Monod, Université Paris Diderot, CNRS UMR7592, Paris 75013, France.
15
Li J, Sagendorf JM, Chiu TP, Pasi M, Perez A, Rohs R. Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding. Nucleic Acids Res 2018; 45:12877-12887. [PMID: 29165643 PMCID: PMC5728407 DOI: 10.1093/nar/gkx1145] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 03/21/2017] [Accepted: 10/30/2017] [Indexed: 12/18/2022]
Abstract
Uncovering the mechanisms that affect the binding specificity of transcription factors (TFs) is critical for understanding the principles of gene regulation. Although sequence-based models have been used successfully to predict TF binding specificities, we found that including DNA shape information in these models improved their accuracy and interpretability. Previously, we developed a method for modeling DNA binding specificities based on DNA shape features extracted from Monte Carlo (MC) simulations. Prediction accuracies of our models, however, have not yet been compared to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations. Here, we integrated DNA shape information extracted from MC or MD simulations and XRC data into predictive models of TF binding and compared their performance. Models that incorporated structural information consistently showed improved performance over sequence-based models regardless of data source. Furthermore, we derived and validated nine additional DNA shape features beyond our original set of four features. The expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width, is available in our R/Bioconductor package DNAshapeR and enables a comprehensive structural description of the double helix on a genome-wide scale.
Affiliation(s)
- Jinsen Li
- Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Jared M Sagendorf
- Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Tsu-Pei Chiu
- Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Marco Pasi
- Centre for Biomolecular Sciences and School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK
- Alberto Perez
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
- Remo Rohs
- Computational Biology and Bioinformatics Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
16
Abstract
Most biological mechanisms involve more than one type of biomolecule, and hence do not operate solely at the level of the genome, transcriptome, proteome, metabolome or ionome. Datasets resulting from single-omic analysis are rapidly increasing in throughput and quality, rendering multi-omic studies feasible. These should offer a comprehensive, structured and interactive overview of a biological mechanism. However, combining single-omic datasets in a meaningful manner has so far proved challenging, and the discovery of new biological information lags behind expectation. One reason is that experiments conducted in different laboratories typically cannot be combined without restriction. A second reason is that the interpretation of multi-omic datasets is inherently challenging, as the biological datasets are heterogeneous not only for technical, but also for biological, chemical, and physical reasons. Here, multi-layer network theory and methods of artificial intelligence might help to solve these problems. For machine learning to be applied efficiently, however, biological datasets need to become more systematic, more precise and much larger. We conclude our review with basic guidelines for the successful set-up of a multi-omic experiment.
17
Li Q, Wang Y, Lai Y, Xu P, Yang Z. HspB5 correlates with poor prognosis in colorectal cancer and prompts epithelial-mesenchymal transition through ERK signaling. PLoS One 2017; 12:e0182588. [PMID: 28796798 PMCID: PMC5552184 DOI: 10.1371/journal.pone.0182588] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/04/2017] [Accepted: 07/20/2017] [Indexed: 01/18/2023]
Abstract
Alpha B-crystallin (HspB5) is abnormally expressed in tumor tissues and portends a poor prognosis in cancer patients. However, the role of HspB5 in colorectal cancer (CRC) is still unclear. Seventy CRC patients and 40 healthy volunteers were sampled from August 2012 to March 2015 in order to determine the clinical significance of HspB5. In vitro cellular studies were used to validate its molecular mechanisms in CRC. Our clinical data indicated that HspB5 was up-regulated in CRC patients and positively associated with TNM stage. The expression level of HspB5 in CRC patients was closely correlated with MMP7 and E-cadherin, two core epithelial–mesenchymal transition (EMT) gene products. The in vitro studies revealed that high HspB5 expression could promote tumor cell proliferation and invasion, as well as EMT. Gene-microarray analysis suggested that three signaling pathways (PI3K, p38 and ERK) were involved in HspB5-induced EMT. Experiments with signal transduction pathway inhibitors and HspB5 gene knockdown models suggested that HspB5 promotes CRC tumorigenesis and EMT progression through the ERK signaling pathway. In summary, HspB5 may trigger EMT in CRC by activating the ERK signaling pathway. It is a potential tumor biomarker for CRC diagnosis and prognosis.
Affiliation(s)
- Qinghua Li
- Songjiang Hospital Affiliated Shanghai First People’s Hospital, Shanghai Jiao Tong University, Shanghai, China
- Yanlan Wang
- Songjiang Hospital Affiliated Shanghai First People’s Hospital, Shanghai Jiao Tong University, Shanghai, China
- Yuexing Lai
- Songjiang Hospital Affiliated Shanghai First People’s Hospital, Shanghai Jiao Tong University, Shanghai, China
- Ping Xu
- Songjiang Hospital Affiliated Shanghai First People’s Hospital, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Songjiang Hospital Affiliated to Nanjing Medical University, Nanjing, China
- Zhiwen Yang
- Songjiang Hospital Affiliated Shanghai First People’s Hospital, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Songjiang Hospital Affiliated to Nanjing Medical University, Nanjing, China
18
Vainshtein Y, Rippe K, Teif VB. NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data. BMC Genomics 2017; 18:158. [PMID: 28196481 PMCID: PMC5309995 DOI: 10.1186/s12864-017-3580-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/26/2016] [Accepted: 02/10/2017] [Indexed: 12/21/2022]
Abstract
Background: Biomedical applications of high-throughput sequencing methods generate a vast amount of data in which numerous chromatin features are mapped along the genome. The results are frequently analysed by creating binary data sets that link the presence/absence of a given feature to specific genomic loci. However, the nucleosome occupancy or chromatin accessibility landscape is essentially continuous. It is currently a challenge in the field to cope with continuous distributions of deep sequencing chromatin readouts and to integrate the different types of discrete chromatin features to reveal linkages between them.
Results: Here we introduce the NucTools suite of Perl scripts as well as MATLAB- and R-based visualization programs for a nucleosome-centred downstream analysis of deep sequencing data. NucTools accounts for the continuous distribution of nucleosome occupancy. It allows calculations of nucleosome occupancy profiles averaged over several replicates, comparisons of nucleosome occupancy landscapes between different experimental conditions, and the estimation of changes in integral chromatin properties such as the nucleosome repeat length. Furthermore, NucTools facilitates the annotation of nucleosome occupancy with other chromatin features like binding of transcription factors or architectural proteins, and epigenetic marks like histone modifications or DNA methylation. The applications of NucTools are demonstrated for the comparison of several datasets for nucleosome occupancy in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs).
Conclusions: The typical workflows of data processing and integrative analysis with NucTools reveal information on the interplay of nucleosome positioning with other features, for example binding of the transcription factor CTCF, regions with stable and unstable nucleosomes, and domains of large organized chromatin K9me2 modifications (LOCKs). As potential limitations and problems, we discuss how inter-replicate variability of MNase-seq experiments can be addressed.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3580-2) contains supplementary material, which is available to authorized users.
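One common way to estimate the nucleosome repeat length mentioned above is to fit a straight line to the positions of successive peaks in the distribution of nucleosome-to-nucleosome distances. The slope of that line is the repeat length in base pairs. This minimal sketch assumes the peak positions have already been identified. It is not taken from the NucTools scripts, and the function name is hypothetical.

```python
def nucleosome_repeat_length(peak_positions):
    """Least-squares slope of peak position (bp) against peak index 1..n."""
    n = len(peak_positions)
    if n < 2:
        raise ValueError("need at least two peaks")
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(peak_positions) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, peak_positions))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


print(nucleosome_repeat_length([185, 380, 575, 760]))
```

A regression over several peaks tolerates small errors in any single peak position. Using only the first peak would carry its error straight into the repeat length.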
Affiliation(s)
- Yevhen Vainshtein
- Functional Genomics Group, Fraunhofer Institute for Interfacial Engineering and Biotechnology IGB, Nobelstraße 12, 70569, Stuttgart, Germany.
- Karsten Rippe
- Research Group Genome Organization & Function, German Cancer Research Center (DKFZ) and Bioquant, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Vladimir B Teif
- School of Biological Sciences, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK.
19
Kuznetsov VA. Mathematical Modeling of Avidity Distribution and Estimating General Binding Properties of Transcription Factors from Genome-Wide Binding Profiles. Methods Mol Biol 2017; 1613:193-276. [PMID: 28849563 DOI: 10.1007/978-1-4939-7027-8_9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
The shape of the experimental frequency distributions (EFD) of diverse molecular interaction events quantifying genome-wide binding is often skewed toward rare but abundant quantities. Such distributions deviate systematically from the standard power-law functions proposed by scale-free network models, suggesting that more explanatory and predictive probabilistic models are needed. The goal of this analytical survey is to identify mechanism-based, data-driven statistical distributions that allow estimation and prediction of the binding properties of transcription factors from genome-wide binding profiles. Here, we review and develop an analytical framework for modeling, analysis, and prediction of transcription factor (TF) DNA binding properties detected at the genome scale. We introduce a mixture probabilistic model of the binding avidity function that includes nonspecific and specific binding events. A method for decomposition of specific and nonspecific TF-DNA binding events is proposed. We show that the Kolmogorov-Waring (KW) probability function (PF), modeling the steady state TF binding-dissociation stochastic process, fits well with the EFD for diverse TF-DNA binding datasets. Furthermore, this distribution predicts the total number of TF-DNA binding sites (BSs) and allows estimation of specificity and sensitivity, as well as other basic statistical features of DNA-TF binding, when the experimental datasets are noise-rich and essentially incomplete. The KW distribution fits equally well to TF-DNA binding activity for different TFs including ERE, CREB, STAT1, Nanog, and Oct4. Our analysis reveals that the KW distribution and its generalized form provide a family of power-law-like distributions given in terms of hypergeometric series functions, including standard and generalized Pareto and Waring distributions, providing flexible and common skewed forms of the transcription factor binding site (TFBS) avidity distribution function.
We suggest that the skewed binding events may be due to a wide range of evolutionary processes that create weak-avidity TFBSs through random mutations, while high-avidity binding sites (i.e., high-avidity evolutionarily conserved canonical E-boxes) occur rarely. These, however, may be positively selected in microevolution.
Affiliation(s)
- Vladimir A Kuznetsov
- Bioinformatics Institute, Agency of Science, Technology and Research, 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore.
20
Li Y, Wang L, Zhou J, Li F. Transcription factor organic cation transporter 1 (OCT-1) affects the expression of porcine Klotho (KL) gene. PeerJ 2016; 4:e2186. [PMID: 27478698 PMCID: PMC4950547 DOI: 10.7717/peerj.2186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/01/2016] [Accepted: 06/07/2016] [Indexed: 01/22/2023]
Abstract
Klotho (KL), originally discovered as an aging suppressor, is a membrane protein that shares sequence similarity with the β-glucosidase enzymes. Recent reports showed that Klotho might play a role in adipocyte maturation and systemic glucose metabolism. However, little is known about the transcription factors involved in regulating the expression of the porcine KL gene. Deletion fragment analysis identified KL-D2 (−418 bp to −3 bp) as the porcine KL core promoter. The SNP MARC0022311 (A or G) in KL intron 1 was detected in Landrace × DIV pigs using the Porcine SNP60 BeadChip. The pGL-D2-A and pGL-D2-G constructs were built from KL-D2 and the intron fragment carrying each allele, and the relative luciferase activity of pGL3-D2-G was significantly higher than that of pGL3-D2-A in PK and ST cells. This was possibly the result of a change in KL binding ability with the transcription factor organic cation transporter 1 (OCT-1), which was confirmed using electrophoretic mobility shift assays (EMSA) and chromatin immunoprecipitation (ChIP). Moreover, RNA interference experiments showed that OCT-1 regulates endogenous KL expression. Our study indicates that SNP MARC0022311 affects porcine KL expression by regulating its promoter activity via OCT-1.
Affiliation(s)
- Yan Li
- Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture & Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- Lei Wang
- Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture & Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- Jiawei Zhou
- Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture & Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- Fenge Li
- Key Laboratory of Pig Genetics and Breeding of Ministry of Agriculture & Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
21
Bottani S, Veitia RA. Hill function-based models of transcriptional switches: impact of specific, nonspecific, functional and nonfunctional binding. Biol Rev Camb Philos Soc 2016; 92:953-963. [PMID: 27061969 DOI: 10.1111/brv.12262] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/23/2015] [Revised: 02/12/2016] [Accepted: 02/16/2016] [Indexed: 12/25/2022]
Abstract
We explore minimalist models of transcription in which we take into account that a cis-regulatory sequence is embedded in, and interacts with, a complex genome. The classical Hill equation is the simplest way to represent a transcriptional response. However, it may overlook the fact that a transcription factor (TF) establishes specific and nonspecific nonfunctional interactions with chromatin. Classical papers have shown that nonfunctional binding (not leading to transcription) may influence gene expression. We examine how the presence of additional binding sites for a TF, besides those on the gene(s) of interest, affects the shape and parameters of the transcriptional response. We consider two conditions: equilibrium and steady state. In many cases the TF level is determined by the position of the cell within a spatial or temporal gradient. We show that such gradients can be adjusted by evolutionary selection to compensate for the alteration of the gene transcription response by the presence of nonfunctional binding sites. Finally, we analyse how the transcriptional response is affected by a decrease in TF concentration, as in cases of haploinsufficiency. We show that the nonlinearity of the transcriptional response as a function of [TF] exacerbates the effect of a decrease in the latter, at least for weakly expressed TFs. Although decades of work on TFs have led to the impression that almost everything is known about the control of gene expression, we show that even the simplest models of transcription control have not yet delivered all their secrets.
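The effect of nonfunctional binding sites on a Hill-type response can be illustrated with an equilibrium mass balance. This minimal sketch is not the authors' model. It assumes the TF binds `n_decoys` independent, identical nonfunctional sites with dissociation constant `k_decoy`, and that TF bound at the target gene itself is negligible. The target responds to the remaining free TF through a classical Hill function. All names and units are hypothetical.

```python
def free_tf(total_tf, n_decoys, k_decoy, iterations=200):
    """Free TF at equilibrium when nonfunctional decoy sites sequester part of it.

    Solves total = free + n_decoys * free / (free + k_decoy) for free by
    bisection; the left-hand side increases monotonically in free.
    """
    lo, hi = 0.0, total_tf
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid + n_decoys * mid / (mid + k_decoy) < total_tf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hill_response(free, k_half, n):
    """Classical Hill function of the free TF concentration."""
    return free ** n / (k_half ** n + free ** n)


without_decoys = hill_response(free_tf(10.0, 0, 1.0), k_half=10.0, n=2)
with_decoys = hill_response(free_tf(10.0, 100, 1.0), k_half=10.0, n=2)
print(without_decoys, with_decoys)
```

With no decoys, the same total TF gives a half-maximal response. A hundred high-affinity decoy sites soak up most of the TF, which shifts the response curve to the right. A gradient would have to supply more TF to compensate.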
Affiliation(s)
- Samuel Bottani
- Matière et Systèmes Complexes CNRS UMR 7057, 75013 Paris, France; Université Paris Diderot, Sorbonne Paris Cité, 75013 Paris, France
- Reiner A Veitia
- Université Paris Diderot, Sorbonne Paris Cité, 75013 Paris, France; Institut Jacques Monod, CNRS UMR 7592, 75013 Paris, France
22
Ma X, Ezer D, Navarro C, Adryan B. Reliable scaling of position weight matrices for binding strength comparisons between transcription factors. BMC Bioinformatics 2015; 16:265. [PMID: 26289072 PMCID: PMC4545934 DOI: 10.1186/s12859-015-0666-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/17/2015] [Accepted: 07/08/2015] [Indexed: 01/05/2023]
Abstract
Background: Scoring DNA sequences against Position Weight Matrices (PWMs) is a widely adopted method to identify putative transcription factor binding sites. While common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and the DNA, these scores are not directly comparable between different transcription factors. Other methods, including p-value associated approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2:15), provide more rigorous ways to identify potential binding sites, but their results are difficult to interpret in terms of binding energy, which is essential for the modeling of transcription factor binding dynamics and enhancer activities.
Results: Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor; we applied it to show that λ distributions for different transcription factor families correspond with their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs that were generated by different approaches.
Conclusion: These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. Their results are consistent with each other and with previous reports in most cases.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0666-1) contains supplementary material, which is available to authorized users.
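To make the role of λ concrete, this minimal sketch scores sites against a log-odds PWM and converts the score into a relative Boltzmann-like weight. It assumes the convention that binding energy in units of kT equals -score / λ, so a larger λ flattens the difference between strong and weak sites. The sketch does not reproduce either of the paper's procedures for estimating λ. The pseudocount, uniform background and function names are all hypothetical choices.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (a list of dicts) into a log-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def pwm_score(pwm, site):
    """Sum of log-odds scores for a site of the same length as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, site.upper()))


def relative_weight(score, best_score, lam):
    """exp((S - S_max) / lambda): weight of a site relative to the best site,
    assuming binding energy (in kT) is -score / lambda."""
    return math.exp((score - best_score) / lam)


pwm = log_odds_pwm([{"A": 10}, {"C": 10}, {"G": 10}])
best = pwm_score(pwm, "ACG")
weak = pwm_score(pwm, "ACT")
print(relative_weight(weak, best, 1.0), relative_weight(weak, best, 5.0))
```

Because scores from different PWMs live on different scales, the same score difference maps to different energy differences unless each PWM's λ is known. That is why an estimate of λ is needed before comparing binding strengths between transcription factors.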
Affiliation(s)
- Xiaoyan Ma
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.
- Daphne Ezer
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.
- Carmen Navarro
- Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK; Department of Computer Science and Artificial Intelligence, University of Granada, Periodista Daniel Saucedo Aranda, Granada, Spain.
- Boris Adryan
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.