1
Choi J, Lee EA. Analysis of REST binding sites with canonical and non-canonical motifs in human cell lines. BMC Med Genomics 2024; 17:92. [PMID: 38632583 PMCID: PMC11025195 DOI: 10.1186/s12920-024-01860-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 03/28/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND: Repressor element 1 (RE1) silencing transcription factor (REST) is a transcriptional repressor abundantly expressed in aging human brains. It is known to regulate genes associated with oxidative stress, inflammation, and neurological disorders by binding to a canonical form of sequence motif and its non-canonical variations. Although analysis of genomic sequence motifs is crucial to understand transcriptional regulation by transcription factors (TFs), a comprehensive characterization of various forms of RE1 motifs in human cell lines has not been performed.
RESULTS: Here, we analyzed 23 ENCODE REST ChIP-seq datasets from diverse human cell lines and identified a non-redundant set of 68,975 loci with ChIP-seq peaks. Our systematic characterization of these binding sites revealed that the canonical form of REST binding motif was found primarily in ChIP-seq peaks shared across multiple cell lines, while non-canonical forms of motifs were identified in both cell-line-specific binding sites and those shared across cell lines. Remarkably, we observed a notable prevalence of non-canonical motifs that corresponded to half segments of the canonical motif. Furthermore, our analysis unveiled the presence of cell-line-specific REST binding patterns, as evidenced by the clustering of ChIP-seq experiments according to their respective cell lines. This observation underscores the cell-line specificity of REST binding at certain genomic loci, implying intricate cell-line-specific regulatory mechanisms.
CONCLUSIONS: Overall, our study provides a comprehensive characterization of REST binding motifs in human cell lines and genome-wide RE1 motif profiles. These findings contribute to a deeper understanding of REST-mediated transcriptional regulation and highlight the importance of considering cell-line-specific effects in future investigations.
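The abstract does not say how the non-redundant set of loci was assembled. One common approach, shown here purely as an assumption and not as the authors' method, is to pool peaks from all experiments and merge overlapping intervals per chromosome while recording which experiments support each merged locus. The function name `merge_peaks`, the tuple layout, and the toy coordinates and labels below are hypothetical.

```python
from collections import defaultdict

def merge_peaks(peaks):
    """Merge overlapping peaks into non-redundant loci.

    peaks: iterable of (chrom, start, end, experiment), half-open coordinates.
    Returns a list of (chrom, start, end, set_of_experiments).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, exp in peaks:
        by_chrom[chrom].append((start, end, exp))

    loci = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_exps = set()
        for start, end, exp in sorted(by_chrom[chrom]):
            if cur_end is not None and start < cur_end:
                # Overlaps the locus being built: extend it.
                cur_end = max(cur_end, end)
                cur_exps.add(exp)
            else:
                # No overlap: close the previous locus and start a new one.
                if cur_end is not None:
                    loci.append((chrom, cur_start, cur_end, cur_exps))
                cur_start, cur_end, cur_exps = start, end, {exp}
        if cur_end is not None:
            loci.append((chrom, cur_start, cur_end, cur_exps))
    return loci

# Toy input (illustrative coordinates and experiment labels only)
toy_peaks = [
    ("chr1", 100, 200, "expA"),
    ("chr1", 150, 250, "expB"),
    ("chr1", 300, 400, "expA"),
    ("chr2", 10, 20, "expB"),
]
toy_loci = merge_peaks(toy_peaks)
# Loci supported by more than one experiment count as "shared" in this sketch.
shared = [locus for locus in toy_loci if len(locus[3]) > 1]
```

Under this sketch, the size of each locus's experiment set gives a simple way to separate shared loci from those seen in a single experiment.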
Affiliation(s)
- Jaejoon Choi
  - Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
  - The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Eunjung Alice Lee
  - Division of Genetics and Genomics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
  - Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
  - The Broad Institute of Harvard and MIT, Cambridge, MA, USA
2
Tendolkar A, Mazo-Vargas A, Livraghi L, Hanly JJ, Van Horne KC, Gilbert LE, Martin A. Cis-regulatory modes of Ultrabithorax inactivation in butterfly forewings. eLife 2024; 12:RP90846. [PMID: 38261357 PMCID: PMC10945631 DOI: 10.7554/elife.90846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024] Open
Abstract
Hox gene clusters encode transcription factors that drive regional specialization during animal development: for example, the Hox factor Ubx is expressed in the insect metathoracic (T3) wing appendages and differentiates them from T2 mesothoracic identities. Hox transcriptional regulation requires silencing activities that prevent spurious activation and regulatory crosstalk in the wrong tissues, but this has seldom been studied in insects other than Drosophila, which shows a derived Hox dislocation into two genomic clusters that disjoined Antennapedia (Antp) and Ultrabithorax (Ubx). Here, we investigated how Ubx is restricted to the hindwing in butterflies, amidst a contiguous Hox cluster. By analysing Hi-C and ATAC-seq data in the butterfly Junonia coenia, we show that a Topologically Associated Domain (TAD) maintains a hindwing-enriched profile of chromatin opening around Ubx. This TAD is bordered by a Boundary Element (BE) that separates it from a region of joined wing activity around the Antp locus. CRISPR mutational perturbation of this BE releases ectopic Ubx expression in forewings, inducing homeotic clones with hindwing identities. Further mutational interrogation of two non-coding RNA encoding regions and one putative cis-regulatory module within the Ubx TAD causes rare homeotic transformations in both directions, indicating the presence of both activating and repressing chromatin features. We also describe a series of spontaneous forewing homeotic phenotypes obtained in Heliconius butterflies, and discuss their possible mutational basis. By leveraging the extensive wing specialization found in butterflies, our initial exploration of Ubx regulation demonstrates the existence of silencing and insulating sequences that prevent its spurious expression in forewings.
Affiliation(s)
- Amruta Tendolkar
  - Department of Biological Sciences, The George Washington University, Washington, DC, United States
- Anyi Mazo-Vargas
  - Department of Biological Sciences, The George Washington University, Washington, DC, United States
- Luca Livraghi
  - Department of Biological Sciences, The George Washington University, Washington, DC, United States
- Joseph J Hanly
  - Department of Biological Sciences, The George Washington University, Washington, DC, United States
  - Smithsonian Tropical Research Institute, Panama City, Panama
- Kelsey C Van Horne
  - Department of Biological Sciences, The George Washington University, Washington, DC, United States
- Lawrence E Gilbert
  - Department of Integrative Biology, University of Texas – Austin, Austin, United States
- Arnaud Martin
  - Department of Biological Sciences, The George Washington University, Washington, DC, United States
3
Tsukanov AV, Mironova VV, Levitsky VG. Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis. Frontiers in Plant Science 2022; 13:938545. [PMID: 35968123 PMCID: PMC9373801 DOI: 10.3389/fpls.2022.938545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/07/2022] [Accepted: 07/05/2022] [Indexed: 05/15/2023]
Abstract
The position weight matrix (PWM) is the traditional motif model representing transcription factor (TF) binding sites. It assumes that positions contribute independently to TF binding affinity, although this hypothesis does not fit the data perfectly. This explains why PWM hits are missing in a substantial fraction of ChIP-seq peaks. To study various modes of the direct binding of plant TFs, we compiled a benchmark collection of 111 ChIP-seq datasets for Arabidopsis thaliana, and applied the traditional PWM and two alternative motif models, BaMM and SiteGA, which model dependencies between positions. Varying the stringency of the recognition thresholds for the models suggested that the hits of the PWM, BaMM, and SiteGA models are associated with sites of high/medium, any, and low affinity, respectively. At the medium recognition threshold, about 60% of ChIP-seq peaks contain PWM hits consisting of conserved core consensuses, while BaMM and SiteGA provide hits for an additional 15% of peaks in which a weaker core consensus is compensated through intra-motif dependencies. The presence/absence of these dependencies in the motifs of alternative/traditional models was confirmed by the dependency logo DepLogo, which visualizes the position-wise partitioning of the alignments of predicted sites. We exemplify the detailed analysis of ChIP-seq profiles for plant TFs CCA1, MYC2, and SEP3. Gene ontology (GO) enrichment analysis revealed that, among the three motif models, SiteGA had the highest proportion of genes with significantly enriched GO terms among all predicted genes. We showed that both alternative motif models extend the sites predicted by the traditional PWM more for TFs MYC2/SEP3, which have condition/tissue-specific functions, than for TF CCA1, which has housekeeping functions. Overall, the combined application of standard and alternative motif models is beneficial to detect various modes of direct TF-DNA interactions in the maximal portion of ChIP-seq loci.
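The independence assumption of the PWM can be made concrete with a short Python sketch: the score of a candidate site is simply the sum of per-position log-odds terms, so no term depends on the base at any other position. The aligned sites, pseudocount, uniform background and threshold below are illustrative assumptions, not values from the study, and the function names are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum per-position log-odds; each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (offset, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score(pwm, sequence[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits

# Toy aligned sites (hypothetical, for illustration only)
toy_sites = ["CACGTG", "CACGTG", "CACGTT", "CATGTG"]
toy_pwm = build_pwm(toy_sites)
pwm_hits = scan(toy_pwm, "TTCACGTGAA", threshold=5.0)
```

Raising or lowering `threshold` mimics the stringency variation described in the abstract. Models with dependencies, such as BaMM or SiteGA, would replace the per-position sum with terms that condition on neighbouring or paired positions.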
Affiliation(s)
- Anton V. Tsukanov
  - Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
- Victoria V. Mironova
  - Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
  - Department of Plant Systems Physiology, Radboud Institute for Biological and Environmental Sciences (RIBES), Radboud University, Nijmegen, Netherlands
- Victor G. Levitsky
  - Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
  - Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia
  - *Correspondence: Victor G. Levitsky
4
Tsukanov AV, Levitsky VG, Merkulova TI. Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites. Vavilovskii Zhurnal Genet Selektsii 2021; 25:7. [PMID: 34547062 PMCID: PMC8408018 DOI: 10.18699/vj21.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/10/2020] [Revised: 01/10/2021] [Accepted: 01/12/2021] [Indexed: 11/24/2022] Open
Abstract
The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Two recently proposed models, BaMM and InMoDe, do account for them. However, application of these models has usually been limited to comparing their recognition accuracies with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes stages of model training, assessment of recognition accuracy, scanning of ChIP-seq peaks, and classification of peaks based on the scan results. We applied our pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase in the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3%. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the medians of the fraction of peaks containing predictions of only one model were 1.08, 0.49, 4.15 and 1.73% for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
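The peak-classification idea (which peaks are recognized by any model, and which by only one) can be illustrated with set operations. This is a minimal sketch and not MultiDeNA's code: the function name `classify_peaks`, the input layout (model name mapped to the set of peak IDs with at least one hit) and the toy numbers are all assumptions made for illustration.

```python
def classify_peaks(hits_by_model, all_peaks):
    """Summarise which peaks are recognised by which models.

    hits_by_model: dict mapping model name -> set of peak IDs with a hit.
    all_peaks: set of all peak IDs in the dataset.
    """
    n = len(all_peaks)
    recognised = set().union(*hits_by_model.values()) & all_peaks
    summary = {"union_fraction": len(recognised) / n, "sole_fraction": {}}
    for model, peaks in hits_by_model.items():
        # Peaks hit by any of the other models.
        others = set().union(
            *(p for m, p in hits_by_model.items() if m != model)
        )
        # Peaks recognised by this model and by no other model.
        sole = (peaks & all_peaks) - others
        summary["sole_fraction"][model] = len(sole) / n
    return summary

# Toy data: 10 peaks and four models (numbers are made up)
toy_all = set(range(10))
toy_hits = {
    "PWM": {0, 1, 2, 3},
    "diPWM": {1, 2, 3},
    "BaMM": {0, 1, 2, 3, 4, 5},
    "InMoDe": {2, 3, 6},
}
toy_summary = classify_peaks(toy_hits, toy_all)
# Gain in recognised peaks from combining models, relative to PWM alone.
toy_gain = toy_summary["union_fraction"] - len(toy_hits["PWM"]) / len(toy_all)
```

In this toy case the union of all four models covers 70% of peaks against 40% for the PWM alone, which is the kind of comparison the abstract reports for FOXA2.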
Affiliation(s)
- A V Tsukanov
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- V G Levitsky
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
- T I Merkulova
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
5
Biswas A, Narlikar L. A universal framework for detecting cis-regulatory diversity in DNA regulatory regions. Genome Res 2021; 31:1646-1662. [PMID: 34285090 PMCID: PMC8415372 DOI: 10.1101/gr.274563.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/22/2020] [Accepted: 07/09/2021] [Indexed: 12/02/2022]
Abstract
High-throughput sequencing-based assays measure different biochemical activities pertaining to gene regulation, genome-wide. These activities include transcription factor (TF)–DNA binding, enhancer activity, open chromatin, and more. A major goal is to understand underlying sequence components, or motifs, that can explain the measured activity. It is usually not one motif but a combination of motifs bound by cooperatively acting proteins that confers activity to such regions. Furthermore, regions can be diverse, governed by different combinations of TFs/motifs. Current approaches do not take into account this issue of combinatorial diversity. We present a new statistical framework, cisDIVERSITY, which models regions as diverse modules characterized by combinations of motifs while simultaneously learning the motifs themselves. Because cisDIVERSITY does not rely on knowledge of motifs, modules, cell type, or organism, it is general enough to be applied to regions reported by most high-throughput assays. For example, in enhancer predictions resulting from different assays—GRO-cap, STARR-seq, and those measuring chromatin structure—cisDIVERSITY discovers distinct modules and combinations of TF binding sites, some specific to the assay. From protein–DNA binding data, cisDIVERSITY identifies potential cofactors of the profiled TF, whereas from ATAC-seq data, it identifies tissue-specific regulatory modules. Finally, analysis of single-cell ATAC-seq data suggests that regions open in one cell-state encode information about future states, with certain modules staying open and others closing down in the next time point.
Affiliation(s)
- Anushua Biswas
  - CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
- Leelavati Narlikar
  - CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
6
Biswas A, Narlikar L. Resolving diverse protein-DNA footprints from exonuclease-based ChIP experiments. Bioinformatics 2021; 37:i367-i375. [PMID: 34252930 PMCID: PMC8275329 DOI: 10.1093/bioinformatics/btab274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023] Open
Abstract
MOTIVATION: High-throughput chromatin immunoprecipitation (ChIP) sequencing-based assays capture genomic regions associated with the profiled transcription factor (TF). ChIP-exo is a modified protocol, which uses lambda exonuclease to digest DNA close to the TF-DNA complex, in order to improve on the positional resolution of the TF-DNA contact. Because the digestion occurs in the 5'-3' orientation, the protocol produces directional footprints close to the complex, on both sides of the double stranded DNA. Like all ChIP-based methods, ChIP-exo reports a mixture of different regions associated with the TF: those bound directly to the TF as well as via intermediaries. However, the distribution of footprints is likely to be indicative of the complex forming at the DNA.
RESULTS: We present ExoDiversity, which uses a model-based framework to learn a joint distribution over footprints and motifs, thus resolving the mixture of ChIP-exo footprints into diverse binding modes. It uses no prior motif or TF information and automatically learns the number of different modes from the data. We show its application on a wide range of TFs and organisms/cell-types. Because its goal is to explain the complete set of reported regions, it is able to identify co-factor TF motifs that appear in a small fraction of the dataset. Further, ExoDiversity discovers small nucleotide variations within and outside canonical motifs, which co-occur with variations in footprints, suggesting that the TF-DNA structural configuration at those regions is likely to be different. Finally, we show that detected modes have specific DNA shape features and conservation signals, giving insights into the structure and function of the putative TF-DNA complexes.
AVAILABILITY AND IMPLEMENTATION: The code for ExoDiversity is available on https://github.com/NarlikarLab/exoDIVERSITY.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anushua Biswas
  - Department of Chemical Engineering, CSIR-National Chemical Laboratory, Pune 411008, India
  - Academy of Scientific and Innovative Research, Ghaziabad 201002, India
- Leelavati Narlikar
  - Department of Chemical Engineering, CSIR-National Chemical Laboratory, Pune 411008, India
  - Academy of Scientific and Innovative Research, Ghaziabad 201002, India
7
The Genome-Wide Binding Profile for Human RE1 Silencing Transcription Factor Unveils a Unique Genetic Circuitry in Hippocampus. J Neurosci 2021; 41:6582-6595. [PMID: 34210779 DOI: 10.1523/jneurosci.2059-20.2021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/05/2020] [Revised: 05/12/2021] [Accepted: 06/16/2021] [Indexed: 12/18/2022] Open
Abstract
Early studies in mouse neurodevelopment led to the discovery of the RE1 Silencing Transcription Factor (REST) and its role as a master repressor of neuronal gene expression. Recently, REST was reported to also repress neuronal genes in the human adult brain. These genes were found to be involved in pro-apoptotic pathways; and their repression, associated with increased REST levels during aging, was found to be neuroprotective and conserved across species. However, direct genome-wide REST binding profiles in adult brain have not been identified for any species. Here, we apply this approach to mouse and human hippocampus. We find an expansion of REST binding sites in the human hippocampus that are lacking in both mouse hippocampus and other human non-neuronal cell types. The unique human REST binding sites are associated with genes involved in innate immunity processes and inflammation signaling which, on the basis of histology and recent public transcriptomic analyses, suggest that these new target genes are repressed in glia. We propose that the increases in REST expression in mid-adulthood presage the beginning of brain aging, and that human REST function has evolved to protect the longevity and function of both neurons and glia in human brain.
SIGNIFICANCE STATEMENT: The RE1 Silencing Transcription Factor (REST) repressor has served historically as a model for gene regulation during mouse neurogenesis. Recent studies of REST have also suggested a conserved role for REST repressor function across lower species during aging. However, direct genome-wide studies for REST have been lacking for human brain. Here, we perform the first genome-wide analysis of REST binding in both human and mouse hippocampus. The majority of REST-occupied genes in human hippocampus are distinct from those in mouse. Further, the REST-associated genes unique to human hippocampus represent a new set related to innate immunity and inflammation, where their gene dysregulation has been implicated in aging-related neuropathology, such as Alzheimer's disease.
8
Sreekumar L, Kumari K, Guin K, Bakshi A, Varshney N, Thimmappa BC, Narlikar L, Padinhateeri R, Siddharthan R, Sanyal K. Orc4 spatiotemporally stabilizes centromeric chromatin. Genome Res 2021; 31:607-621. [PMID: 33514624 PMCID: PMC8015856 DOI: 10.1101/gr.265900.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/12/2020] [Accepted: 01/27/2021] [Indexed: 11/24/2022]
Abstract
The establishment of centromeric chromatin and its propagation by the centromere-specific histone CENPA is mediated by epigenetic mechanisms in most eukaryotes. DNA replication origins, origin binding proteins, and replication timing of centromere DNA are important determinants of centromere function. The epigenetically regulated regional centromeres in the budding yeast Candida albicans have unique DNA sequences that replicate earliest in every chromosome and are clustered throughout the cell cycle. In this study, the genome-wide occupancy of the replication initiation protein Orc4 reveals its abundance at all centromeres in C. albicans. Orc4 is associated with four different DNA sequence motifs, one of which coincides with tRNA genes (tDNA) that replicate early and cluster together in space. Hi-C combined with genome-wide replication timing analyses identify that early replicating Orc4-bound regions interact more strongly with themselves than with late replicating Orc4-bound regions. We simulate a polymer model of chromosomes of C. albicans and propose that the early replicating and highly enriched Orc4-bound sites preferentially localize around the clustered kinetochores. We also observe that Orc4 is constitutively localized to centromeres, and both Orc4 and the helicase Mcm2 are essential for cell viability and CENPA stability in C. albicans. Finally, we show that new molecules of CENPA are recruited to centromeres during late anaphase/telophase, which coincides with the stage at which the CENPA-specific chaperone Scm3 localizes to the kinetochore. We propose that the spatiotemporal localization of Orc4 within the nucleus, in collaboration with Mcm2 and Scm3, maintains centromeric chromatin stability and CENPA recruitment in C. albicans.
Affiliation(s)
- Lakshmi Sreekumar
  - Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Kiran Kumari
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
  - IITB-Monash Research Academy, Mumbai 400076, India
  - Department of Chemical Engineering, Monash University, Melbourne 3800, Australia
- Krishnendu Guin
  - Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Asif Bakshi
  - Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Neha Varshney
  - Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Bhagya C Thimmappa
  - Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
- Leelavati Narlikar
  - Department of Chemical Engineering, CSIR-National Chemical Laboratory, Pune 411008, India
- Ranjith Padinhateeri
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
- Rahul Siddharthan
  - The Institute of Mathematical Sciences/HBNI, Taramani, Chennai 600113, India
- Kaustuv Sanyal
  - Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
  - Graduate School of Frontier Biosciences, Osaka University, Suita, Osaka 565-0871, Japan
9
Eggeling R. Disentangling transcription factor binding site complexity. Nucleic Acids Res 2019; 46:e121. [PMID: 30085218 PMCID: PMC6237759 DOI: 10.1093/nar/gky683] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/04/2018] [Accepted: 07/17/2018] [Indexed: 12/15/2022] Open
Abstract
The binding motifs of many transcription factors (TFs) comprise a higher degree of complexity than a single position weight matrix model permits. Additional complexity is typically taken into account either as intra-motif dependencies via more sophisticated probabilistic models or as heterogeneities via multiple weight matrices. However, both orthogonal approaches have limitations when learning from in vivo data where binding sites of other factors in close proximity can interfere with motif discovery for the protein of interest. In this work, we demonstrate how intra-motif complexity can, purely by analyzing the statistical properties of a given set of TF-binding sites, be distinguished from complexity arising from an intermix with motifs of co-binding TFs or other artifacts. In addition, we study the related question whether intra-motif complexity is represented more effectively by dependencies, heterogeneities or variants in between. Benchmarks demonstrate the effectiveness of both methods for their respective tasks and applications on motif discovery output from recent tools detect and correct many undesirable artifacts. These results further suggest that the prevalence of intra-motif dependencies may have been overestimated in previous studies on in vivo data and should thus be reassessed.
Affiliation(s)
- Ralf Eggeling
  - Department of Computer Science, University of Helsinki, Gustaf-Hällströmin katu 2b, FIN-00140 Helsinki, Finland