1
Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y, Weinberger A, Segal E. Systematic interrogation of human promoters. Genome Res 2019; 29:171-183. [PMID: 30622120 PMCID: PMC6360817 DOI: 10.1101/gr.236075.118] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 02/14/2018] [Accepted: 12/05/2018] [Indexed: 12/19/2022]
Abstract
Despite much research, our understanding of the architecture and cis-regulatory elements of human promoters is still lacking. Here, we devised a high-throughput assay to quantify the activity of approximately 15,000 fully designed sequences that we integrated and expressed from a fixed location within the human genome. We used this method to investigate thousands of native promoters and preinitiation complex (PIC) binding regions followed by in-depth characterization of the sequence motifs underlying promoter activity, including core promoter elements and TF binding sites. We find that core promoters drive transcription mostly unidirectionally and that sequences originating from promoters exhibit stronger activity than those originating from enhancers. By testing multiple synthetic configurations of core promoter elements, we dissect the motifs that positively and negatively regulate transcription as well as the effect of their combinations and distances, including a 10-bp periodicity in the optimal distance between the TATA and the initiator. By comprehensively screening 133 TF binding sites, we find that in contrast to core promoters, TF binding sites maintain similar activity levels in both orientations, supporting a model by which divergent transcription is driven by two distinct unidirectional core promoters sharing bidirectional TF binding sites. Finally, we find a striking agreement between the effect of binding site multiplicity of individual TFs in our assay and their tendency to appear in homotypic clusters throughout the genome. Overall, our study systematically assays the elements that drive expression in core and proximal promoter regions and sheds light on organization principles of regulatory regions in the human genome.
Affiliation(s)
- Shira Weingarten-Gabbay
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Ronit Nir
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Shai Lubliner
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Eilon Sharon
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Yael Kalma
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Adina Weinberger
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Eran Segal
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
2
Chromatin-enriched lncRNAs can act as cell-type specific activators of proximal gene transcription. Nat Struct Mol Biol 2017. [PMID: 28628087 PMCID: PMC5682930 DOI: 10.1038/nsmb.3424] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Indexed: 12/30/2022]
Abstract
We recently described a new class of long noncoding RNA defined by especially tight chromatin association, whose presence is strongly correlated with expression of nearby genes in HEK293 cells. Here we critically examine the generality and cis-enhancer mechanism of this class of chromatin-enriched RNA (cheRNA). CheRNAs are largely cell-type specific, and remain the most effective chromatin signature for predicting cis-gene transcription in all cell types examined. Targeted depletion of three cheRNAs decreases gene expression of their neighbors, indicating potential co-activator function. Single-molecule FISH of one cheRNA-distal target gene pair suggests spatial overlap consistent with a role in chromosome looping. In another example, the cheRNA HIDALGO stimulates the fetal hemoglobin HBG1 gene during erythroid differentiation by promoting contacts to a downstream enhancer. Our results suggest that many cheRNAs activate proximal, lineage-specific gene transcription.
3
Westermark PO. Linking Core Promoter Classes to Circadian Transcription. PLoS Genet 2016; 12:e1006231. [PMID: 27504829 PMCID: PMC4978467 DOI: 10.1371/journal.pgen.1006231] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/27/2016] [Accepted: 07/08/2016] [Indexed: 01/09/2023] Open
Abstract
Circadian rhythms in transcription are generated by rhythmic abundances and DNA binding activities of transcription factors. Propagation of rhythms to transcriptional initiation involves the core promoter, its chromatin state, and the basal transcription machinery. Here, I characterize core promoters and chromatin states of genes transcribed in a circadian manner in mouse liver and in Drosophila. It is shown that the core promoter is a critical determinant of circadian mRNA expression in both species. A distinct core promoter class, strong circadian promoters (SCPs), is identified in mouse liver but not Drosophila. SCPs are defined by specific core promoter features, and are shown to drive circadian transcriptional activities with both high averages and high amplitudes. Data analysis and mathematical modeling further provided evidence for rhythmic regulation of both polymerase II recruitment and pause release at SCPs. The analysis provides a comprehensive and systematic view of core promoters and their link to circadian mRNA expression in mouse and Drosophila, and thus reveals a crucial role for the core promoter in regulated, dynamic transcription.
Affiliation(s)
- Pål O. Westermark
  - Institute for Theoretical Biology, Charité – Universitätsmedizin Berlin, Berlin, Germany
4
Kim YC, Cui J, Luo J, Xiao F, Downs B, Wang SM. Exome-based Variant Detection in Core Promoters. Sci Rep 2016; 6:30716. [PMID: 27464681 PMCID: PMC4964598 DOI: 10.1038/srep30716] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/11/2014] [Accepted: 07/06/2016] [Indexed: 01/10/2023] Open
Abstract
The core promoter controls the initiation of transcription. Changes in core promoter sequence can disrupt transcriptional regulation, impair gene expression and ultimately lead to disease. Therefore, comprehensive characterization of core promoters is essential to understand normal and abnormal gene expression in biomedical studies. Here we report the development of the EVDC (Exome-based Variant Detection in Core promoters) method for genome-scale analysis of core-promoter sequence variation. This method is based on the fact that exome sequences contain sequences not only from coding exons but also from non-coding regions, including core promoters, generated by random fragmentation in the exome sequencing process. Using exome data from three cell types (CD4+ T cells, CD19+ B cells and neutrophils) of a single individual, we characterized the features of core promoter-mapped exome sequences, and analysed core-promoter variation in this individual genome. We also compared the core promoters between the YRI (Yoruba in Ibadan, Nigeria) and the CEU (Utah residents of European descent) populations using the exome data generated by the 1000 Genomes Project, and observed much higher variation in the YRI population than in the CEU population. Our study demonstrates that the EVDC method provides a simple but powerful means for genome-wide de novo characterization of core promoter sequence variation.
Affiliation(s)
- Yeong C Kim
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, Omaha, NE 68198, USA
- Jian Cui
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, Omaha, NE 68198, USA
- Jiangtao Luo
  - Department of Biostatistics, College of Public Health, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Fengxia Xiao
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, Omaha, NE 68198, USA
- Bradley Downs
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, Omaha, NE 68198, USA
- San Ming Wang
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, Omaha, NE 68198, USA
5
Hashimoto T, Sherwood RI, Kang DD, Rajagopal N, Barkal AA, Zeng H, Emons BJM, Srinivasan S, Jaakkola T, Gifford DK. A synergistic DNA logic predicts genome-wide chromatin accessibility. Genome Res 2016; 26:1430-1440. [PMID: 27456004 PMCID: PMC5052050 DOI: 10.1101/gr.199778.115] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/05/2015] [Accepted: 07/20/2016] [Indexed: 01/27/2023]
Abstract
Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which, when trained with DNase-seq data for a cell type, is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that an SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.
Affiliation(s)
- Tatsunori Hashimoto
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
- Richard I Sherwood
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Daniel D Kang
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
- Nisha Rajagopal
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
- Amira A Barkal
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA; Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Haoyang Zeng
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
- Bart J M Emons
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Sharanya Srinivasan
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA; Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Tommi Jaakkola
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
- David K Gifford
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
6
Fernandez-Valverde SL, Degnan BM. Bilaterian-like promoters in the highly compact Amphimedon queenslandica genome. Sci Rep 2016; 6:22496. [PMID: 26931148 PMCID: PMC4773876 DOI: 10.1038/srep22496] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 09/23/2015] [Accepted: 02/15/2016] [Indexed: 12/13/2022] Open
Abstract
The regulatory systems underlying animal development must have evolved prior to the emergence of eumetazoans (cnidarians and bilaterians). Although representatives of earlier-branching animals (sponges, ctenophores and placozoans) possess most of the developmental transcription factor families present in eumetazoans, the DNA regulatory elements that these transcription factors target remain uncharted. Here we characterise the core promoter sequences, U1 snRNP-binding sites (5' splice sites; 5'SSs) and polyadenylation sites (PASs) in the sponge Amphimedon queenslandica. Similar to unicellular opisthokonts, Amphimedon's genes are tightly packed in the genome and have small introns. In contrast, its genes possess metazoan-like core promoters populated with binding motifs previously deemed to be specific to vertebrates, including Nrf-1 and Krüppel-like elements. Also as in vertebrates, Amphimedon's PASs and 5'SSs are depleted downstream and upstream of transcription start sites, respectively, consistent with non-elongating transcripts being short-lived; PASs and 5'SSs are more evenly distributed in bidirectional promoters in Amphimedon. The presence of bilaterian-like regulatory DNAs in sponges is consistent with these being early and essential innovations of the metazoan gene regulatory repertoire.
Affiliation(s)
- Bernard M Degnan
  - School of Biological Sciences, The University of Queensland, Brisbane 4072, Australia