1
Khetan S, Bulyk ML. Overlapping binding sites underlie TF genomic occupancy. bioRxiv 2024:2024.03.05.583629. [PMID: 38496549 PMCID: PMC10942454 DOI: 10.1101/2024.03.05.583629] [Indexed: 03/19/2024]
Abstract
Sequence-specific DNA binding by transcription factors (TFs) is a crucial step in gene regulation. However, current high-throughput in vitro approaches cannot reliably detect lower-affinity TF-DNA interactions, which play key roles in gene regulation. Here, we developed PADIT-seq (protein affinity to DNA by in vitro transcription and RNA sequencing) to assay TF binding preferences to all 10-bp DNA sequences at far greater sensitivity than prior approaches. The expanded catalogs of low-affinity DNA binding sites for the human TFs HOXD13 and EGR1 revealed that nucleotides flanking high-affinity DNA binding sites create overlapping lower-affinity sites that together modulate TF genomic occupancy in vivo. The formation of such extended recognition sequences stems from an inherent property of TF binding sites to interweave with each other, and it expands the genomic sequence space for identifying noncoding variants that directly alter TF binding.
One-Sentence Summary
Overlapping DNA binding sites underlie TF genomic occupancy through their inherent propensity to interweave with each other.
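PADIT-seq assays every 10-bp sequence. To give a sense of how large that space is, and of how a longer stretch of DNA contains many overlapping 10-bp windows, here is a minimal Python sketch. Collapsing each k-mer with its reverse complement (so that both strands count as one double-stranded site) is an assumption made for illustration. It does not describe the PADIT-seq library design.

```python
from itertools import product

# Translation table for complementing A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmer_space(k: int) -> tuple[int, int]:
    """Enumerate all k-mers; return (total, distinct after strand collapsing).

    Brute force, so only practical for small k.
    """
    total = 0
    canonical = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        total += 1
        canonical.add(min(kmer, revcomp(kmer)))
    return total, len(canonical)


def closed_form(k: int) -> tuple[int, int]:
    """Same counts without enumeration.

    Reverse-complement palindromes exist only for even k (4 ** (k // 2) of them);
    every other k-mer pairs with a distinct partner on the opposite strand.
    """
    total = 4 ** k
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return total, (total + palindromes) // 2


def overlapping_windows(seq: str, k: int) -> list[str]:
    """Every k-long window of seq, stepping one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(closed_form(10))  # (1048576, 524800)
```

A 10-bp site plus only a few flanking bases already yields several overlapping 10-bp windows. This gives one simple way to picture how flanking nucleotides can create additional overlapping lower-affinity sites.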
2
Bishop TR, Onal P, Xu Z, Zheng M, Gunasinghe H, Nien CY, Small S, Datta RR. Multi-level regulation of even-skipped stripes by the ubiquitous factor Zelda. Development 2023; 150:dev201860. [PMID: 37934130 PMCID: PMC10730019 DOI: 10.1242/dev.201860] [Received: 04/08/2023] [Accepted: 10/26/2023] [Indexed: 11/08/2023]
Abstract
The zinc-finger protein Zelda (Zld) is a key activator of zygotic transcription in early Drosophila embryos. Here, we study Zld-dependent regulation of the seven-striped pattern of the pair-rule gene even-skipped (eve). Individual stripes are regulated by discrete enhancers that respond to broadly distributed activators; stripe boundaries are formed by localized repressors encoded by the gap genes. The strongest effects of Zld are on stripes 2, 3 and 7, which are regulated by two enhancers in a 3.8 kb genomic fragment that includes the eve basal promoter. We show that Zld facilitates binding of the activator Bicoid and the gap repressors to this fragment, consistent with its proposed role as a pioneer protein. To test whether the effects of Zld are direct, we mutated all canonical Zld sites in the 3.8 kb fragment, which reduced expression but failed to phenocopy the abolishment of stripes caused by removing Zld in trans. We show that Zld also indirectly regulates the eve stripes by establishing specific gap gene expression boundaries, which provides the embryonic spacing required for proper stripe activation.
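The mutagenesis step described above, removing every canonical Zld site from a fragment, can be sketched as a scan of both strands followed by substitution. This is a minimal Python illustration and does not reproduce the authors' procedure. The heptamer `CAGGTAG` is assumed to be the canonical Zld site, and the replacement `CATTTAG` is a hypothetical mutant chosen only so that the output no longer matches. A real design would also check that the substitutions do not create new sites for other factors.

```python
import re

# Assumption: "CAGGTAG" stands in for the canonical Zld (TAGteam) heptamer;
# the abstract does not state which sequences were treated as canonical.
ZLD_SITE = "CAGGTAG"
# Hypothetical replacement heptamer, NOT the mutation used by the authors.
MUTANT_SITE = "CATTTAG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Return sorted (start, strand) pairs for the site on either strand."""
    hits = []
    for strand, motif in (("+", ZLD_SITE), ("-", revcomp(ZLD_SITE))):
        # The zero-width lookahead also reports overlapping matches.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


def mutate_sites(seq):
    """Replace every detected site with the mutant heptamer, keeping orientation."""
    bases = list(seq)
    for start, strand in find_sites(seq):
        new = MUTANT_SITE if strand == "+" else revcomp(MUTANT_SITE)
        bases[start:start + len(new)] = new
    return "".join(bases)
```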
Affiliation(s)
- Timothy R. Bishop
- Department of Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
- Pinar Onal
- Department of Molecular Biology and Genetics, Ihsan Dogramaci Bilkent University, Universiteler Mahallesi, 06800 Ankara, Turkey
- Zhe Xu
- Department of Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
- Michael Zheng
- Department of Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
- Himari Gunasinghe
- Department of Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
- Chung-Yi Nien
- Department of Life Sciences, National Central University, Taoyuan 32001, Taiwan
- Stephen Small
- Department of Biology, New York University, 100 Washington Square East, New York, NY 10003, USA
- Rhea R. Datta
- Department of Biology, Hamilton College, 198 College Hill Rd., Clinton, NY 13323, USA
3
Martin V, Zhuang F, Zhang Y, Pinheiro K, Gordân R. High-throughput data and modeling reveal insights into the mechanisms of cooperative DNA-binding by transcription factor proteins. Nucleic Acids Res 2023; 51:11600-11612. [PMID: 37889068 PMCID: PMC10681739 DOI: 10.1093/nar/gkad872] [Received: 05/02/2023] [Revised: 09/21/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023]
Abstract
Cooperative DNA-binding by transcription factor (TF) proteins is critical for eukaryotic gene regulation. In the human genome, many regulatory regions contain TF-binding sites in close proximity to each other, which can facilitate cooperative interactions. However, binding site proximity does not necessarily imply cooperative binding, as TFs can also bind independently to each of their neighboring target sites. Currently, the rules that drive cooperative TF binding are not well understood. In addition, it is oftentimes difficult to infer direct TF-TF cooperativity from existing DNA-binding data. Here, we show that in vitro binding assays using DNA libraries of a few thousand genomic sequences with putative cooperative TF-binding events can be used to develop accurate models of cooperativity and to gain insights into cooperative binding mechanisms. Using factors ETS1 and RUNX1 as our case study, we show that the distance and orientation between ETS1 sites are critical determinants of cooperative ETS1-ETS1 binding, while cooperative ETS1-RUNX1 interactions show more flexibility in distance and orientation and can be accurately predicted based on the affinity and sequence/shape features of the binding sites. The approach described here, combining custom experimental design with machine-learning modeling, can be easily applied to study the cooperative DNA-binding patterns of any TFs.
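The distinction drawn above between proximity and cooperativity can be made concrete with a standard two-site thermodynamic model. This is a textbook formulation and is not the machine-learning approach used in the study. Each site i has a dimensionless weight x_i = [TF]/K_d,i. A hypothetical cooperativity factor `omega` multiplies the weight of the doubly bound state, and `omega = 1` reproduces independent binding to neighboring sites. Concentrations and K_d values are arbitrary numbers in the same units.

```python
def state_weights(conc, kd1, kd2, omega=1.0):
    """Statistical weights of the four occupancy states of a two-site element."""
    x1, x2 = conc / kd1, conc / kd2
    return {"empty": 1.0, "site1": x1, "site2": x2, "both": omega * x1 * x2}


def p_both(conc, kd1, kd2, omega=1.0):
    """Probability that both sites are occupied at the same time."""
    w = state_weights(conc, kd1, kd2, omega)
    return w["both"] / sum(w.values())


def infer_omega(p_obs, conc, kd1, kd2):
    """Solve p_both(...) == p_obs for omega, given the single-site affinities."""
    x1, x2 = conc / kd1, conc / kd2
    return p_obs * (1.0 + x1 + x2) / (x1 * x2 * (1.0 - p_obs))
```

With `omega = 1.0`, `p_both` equals the product of the two single-site occupancies, which is what independent binding to adjacent sites predicts. An `infer_omega` estimate well above 1 would instead point to cooperative binding.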
Affiliation(s)
- Vincentius Martin
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Farica Zhuang
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Yuning Zhang
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Program in Computational Biology & Bioinformatics, Durham, NC 27708, USA
- Kyle Pinheiro
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Raluca Gordân
- Department of Computer Science, Durham, NC 27708, USA
- Center for Genomic & Computational Biology, Durham, NC 27708, USA
- Department of Biostatistics & Bioinformatics, Department of Molecular Genetics and Microbiology, Department of Cell Biology, Duke University, Durham, NC 27708, USA
4
Jimmy JL, Karn R, Kumari S, Sruthilaxmi CB, Pooja S, Emerson IA, Babu S. Rice WRKY13 TF protein binds to motifs in the promoter region to regulate downstream disease resistance-related genes. Funct Integr Genomics 2023; 23:249. [PMID: 37474674 DOI: 10.1007/s10142-023-01167-0] [Received: 04/28/2023] [Revised: 06/22/2023] [Accepted: 07/03/2023] [Indexed: 07/22/2023]
Abstract
In plants, pathogen resistance is brought about by the binding of certain transcription factor (TF) proteins to the cis-elements of specific target genes. These cis-elements are motifs located upstream of each gene, in its promoter, and they ensure that a specific TF binds to a specific promoter, thereby regulating the expression of that gene. Therefore, studying the promoter sequences of all rice genes would help identify the target genes of a specific TF. The 1 kb upstream promoter sequences of 55,986 annotated rice genes were analyzed with a Perl program to detect WRKY13 binding motifs (bm). The resulting genes were grouped using Gene Ontology and gene set enrichment analysis. Genes with more than four TF bm in their promoters were selected, and ten genes reported to have a role in rice disease resistance were chosen for further analysis. Cis-acting regulatory element analysis was carried out to find the cis-elements and confirm the presence of the corresponding motifs in the promoter sequences of these genes. The 3D structures of the WRKY13 TF and of the corresponding ten genes were built, and the interacting residues were determined. The binding capacity of WRKY13 to the promoters of these selected genes was analyzed using docking studies. WRKY13 was considered for docking analysis based on prior reports of its autoregulation. Molecular dynamics simulations provided more details about the interactions. Expression data for these genes helped elucidate the mechanism of interaction, and a co-expression network further characterized the interaction of these selected disease resistance-related genes with the WRKY13 TF protein. This study suggests downstream target genes that are regulated by the WRKY13 TF and explores the molecular mechanism involving the gene network regulated by WRKY13 in disease resistance against rice fungal pathogens.
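The motif-counting and selection step described above can be sketched as follows. This is a Python illustration and not the Perl program used in the study. It assumes the W-box core `TTGAC[CT]` as the WRKY binding motif, because the abstract does not list the exact motifs searched. It counts matches on both strands and keeps genes with more than four matches. Gene identifiers and sequences are hypothetical.

```python
import re

# Assumption: W-box core (T)TGAC(C/T), written here as TTGAC[CT];
# its reverse-strand form is [AG]GTCAA. Lookaheads count overlapping hits.
W_BOX_FWD = re.compile(r"(?=TTGAC[CT])")
W_BOX_REV = re.compile(r"(?=[AG]GTCAA)")


def count_w_boxes(promoter):
    """Number of W-box matches on both strands of a promoter sequence."""
    seq = promoter.upper()
    return len(W_BOX_FWD.findall(seq)) + len(W_BOX_REV.findall(seq))


def select_candidates(promoters, more_than=4):
    """Genes whose promoter carries more than `more_than` W-box matches."""
    return sorted(gene for gene, seq in promoters.items()
                  if count_w_boxes(seq) > more_than)


# Hypothetical promoter fragments (real inputs would be 1 kb upstream regions).
promoters = {
    "geneA": "TTGACC" * 5,
    "geneB": "TTGACT" + "A" * 20,
    "geneC": "AGTCAA" * 3 + "TTGACC" * 2,
}
print(select_candidates(promoters))  # ['geneA', 'geneC']
```

`geneC` passes only because reverse-strand matches are counted. A single-strand scan would find two sites there and miss the other three.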
Affiliation(s)
- John Lilly Jimmy
- School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Rohit Karn
- School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Sweta Kumari
- School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Singh Pooja
- School of Science, Monash University Malaysia, Bandar Sunway, Selangor, Malaysia
- Isaac Arnold Emerson
- School of Bio Science and Technology, Vellore Institute of Technology, Vellore, 632014, India
- Subramanian Babu
- VIT School of Agricultural Innovations and Advanced Learning, Vellore Institute of Technology, Vellore, 632014, India
5
Shahein A, López-Malo M, Istomin I, Olson EJ, Cheng S, Maerkl SJ. Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance. Nat Commun 2022; 13:5273. [PMID: 36071116 PMCID: PMC9452512 DOI: 10.1038/s41467-022-32971-0] [Received: 06/22/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022]
Abstract
Transcription factor binding to binding site clusters has yet to be characterized in depth, and the functional relevance of low-affinity clusters remains uncertain. We characterized transcription factor binding to low-affinity clusters in vitro and found that transcription factors can bind concurrently to overlapping sites, challenging the notion of binding exclusivity. Furthermore, small clusters with binding sites an order of magnitude lower in affinity give rise to high mean occupancies at physiologically relevant transcription factor concentrations. To assess whether the observed in vitro occupancies translate to transcriptional activation in vivo, we tested low-affinity binding site clusters in a synthetic and a native gene regulatory network in S. cerevisiae. In both systems, clusters of low-affinity binding sites generated transcriptional output comparable to that of single or even multiple consensus sites. This systematic characterization demonstrates that clusters of low-affinity binding sites achieve substantial occupancies, and that this occupancy can drive expression in eukaryotic promoters.
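The occupancy argument above can be illustrated with a simple equilibrium model. Each site is bound with probability x/(1 + x), where x = [TF]/K_d, and the mean occupancy of a cluster is the sum over its sites. The model assumes independent, non-overlapping sites. The study itself qualifies that simplification by reporting concurrent binding to overlapping sites. Concentrations and K_d values are arbitrary numbers in the same units.

```python
import math


def site_occupancy(conc, kd):
    """Equilibrium probability that one site is bound: x / (1 + x), x = conc / kd."""
    x = conc / kd
    return x / (1.0 + x)


def mean_cluster_occupancy(conc, kds):
    """Expected number of bound TF molecules, assuming independent sites."""
    return sum(site_occupancy(conc, kd) for kd in kds)


def sites_needed_to_match(conc, kd_low, kd_high):
    """Smallest n such that n low-affinity sites reach one high-affinity site's occupancy."""
    return math.ceil(site_occupancy(conc, kd_high) / site_occupancy(conc, kd_low))
```

For example, when `conc` equals the high-affinity K_d, a single consensus site is bound half the time, and `sites_needed_to_match(1.0, 10.0, 1.0)` returns 6. Under these assumptions, six sites with tenfold weaker affinity together give a comparable expected number of bound molecules.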
Affiliation(s)
- Amir Shahein
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Maria López-Malo
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Ivan Istomin
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Evan J Olson
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Shiyu Cheng
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Sebastian J Maerkl
- Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
6
Rodriguez K, Do A, Senay-Aras B, Perales M, Alber M, Chen W, Reddy GV. Concentration-dependent transcriptional switching through a collective action of cis-elements. Sci Adv 2022; 8:eabo6157. [PMID: 35947668 PMCID: PMC9365274 DOI: 10.1126/sciadv.abo6157] [Received: 02/16/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]
Abstract
Gene expression specificity of homeobox transcription factors has remained paradoxical. WUSCHEL activates and represses CLAVATA3 transcription at lower and higher concentrations, respectively. We use computational modeling and experimental analysis to investigate the properties of the cis-regulatory module. We find that intrinsically each cis-element can only activate CLAVATA3 at a higher WUSCHEL concentration. However, together, they repress CLAVATA3 at higher WUSCHEL and activate only at lower WUSCHEL, showing that the concentration-dependent interactions among cis-elements regulate both activation and repression. Biochemical experiments show that two adjacent functional cis-elements bind WUSCHEL with higher affinity and dimerize at relatively lower levels. Moreover, increasing the distance between cis-elements prolongs WUSCHEL monomer binding window, resulting in higher CLAVATA3 activation. Our work showing a constellation of optimally spaced cis-elements of defined affinities determining activation and repression thresholds in regulating CLAVATA3 transcription provides a previously unknown mechanism of cofactor-independent regulation of transcription factor binding in mediating gene expression specificity.
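The concentration-dependent switch described above can be pictured with a toy three-state model: unbound, monomer-bound, and dimer-bound. As an assumption for illustration only, the monomer-bound state activates and the dimer-bound state represses. Let x = [WUS]/K and let `omega` be a hypothetical dimerization factor. The monomer-bound fraction x/(1 + x + omega x^2) then peaks at x = 1/sqrt(omega), so weaker dimerization shifts the monomer window to higher concentrations and widens it. This is not the computational model used by the authors.

```python
import math


def state_fractions(x, omega):
    """Fractions (unbound, monomer-bound, dimer-bound) for x = [WUS]/K.

    omega is a hypothetical dimerization/cooperativity factor.
    """
    weights = (1.0, x, omega * x * x)
    z = sum(weights)
    return tuple(w / z for w in weights)


def monomer_peak(omega):
    """x maximizing the monomer-bound fraction (derivative zero where 1 - omega*x**2 = 0)."""
    return 1.0 / math.sqrt(omega)


def monomer_window(omega, threshold, xs):
    """Concentrations in xs at which the monomer-bound fraction exceeds threshold."""
    return [x for x in xs if state_fractions(x, omega)[1] > threshold]


xs = [i / 100 for i in range(1, 1001)]  # x from 0.01 to 10
for omega in (1.0, 10.0):
    window = monomer_window(omega, 0.1, xs)
    print(omega, monomer_peak(omega), window[0], window[-1])
```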
Affiliation(s)
- Kevin Rodriguez
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- Albert Do
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- Betul Senay-Aras
- Department of Mathematics, University of California Riverside, Riverside, CA 92521, USA
- Interdisciplinary Center for Quantitative Modeling in Biology, University of California Riverside, Riverside, CA 92521, USA
- Mariano Perales
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- Mark Alber
- Department of Mathematics, University of California Riverside, Riverside, CA 92521, USA
- Interdisciplinary Center for Quantitative Modeling in Biology, University of California Riverside, Riverside, CA 92521, USA
- Weitao Chen
- Department of Mathematics, University of California Riverside, Riverside, CA 92521, USA
- Interdisciplinary Center for Quantitative Modeling in Biology, University of California Riverside, Riverside, CA 92521, USA
- G. Venugopala Reddy
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- Interdisciplinary Center for Quantitative Modeling in Biology, University of California Riverside, Riverside, CA 92521, USA
7
REDfly: An Integrated Knowledgebase for Insect Regulatory Genomics. Insects 2022; 13:insects13070618. [PMID: 35886794 PMCID: PMC9323752 DOI: 10.3390/insects13070618] [Received: 06/16/2022] [Revised: 07/01/2022] [Accepted: 07/06/2022] [Indexed: 11/29/2022]
Abstract
Simple Summary: Understanding how genes are regulated is a vital area of current biological research and a crucial adjunct to ongoing efforts to sequence entire genomes. Knowing the DNA sequences responsible for gene regulation, namely transcriptional cis-regulatory modules (CRMs, e.g., “enhancers”) and transcription factor binding sites (TFBSs), is important for many areas of research including interpretation and validation of data developed by large-scale genomics projects, providing training data for machine-learning CRM-discovery methods, genome annotation, modeling gene-regulatory networks, studying the evolution of gene regulation, and numerous aspects of the basic biology of transcriptional regulation. Knowledge of insect CRMs is also an important step in developing biotechnology methods for control of insect disease vectors and for eliminating pathogen transmission. The REDfly (Regulatory Element Database for Fly) database integrates all of the available insect cis-regulatory information from multiple sources to provide a comprehensive collection of known regulatory elements. In this paper, we describe REDfly’s basic contents and data model, emphasizing recently added features, and provide illustrated walk-throughs of some common search scenarios.
Abstract: We provide here an updated description of the REDfly (Regulatory Element Database for Fly) database of transcriptional regulatory elements, a unique resource that provides regulatory annotation for the genome of Drosophila and other insects. The genomic sequences regulating insect gene expression, namely transcriptional cis-regulatory modules (CRMs, e.g., “enhancers”) and transcription factor binding sites (TFBSs), are not currently curated by any other major database resources. However, knowledge of such sequences is important, as CRMs play critical roles with respect to disease as well as normal development, phenotypic variation, and evolution. Characterized CRMs also provide useful tools for both basic and applied research, including developing methods for insect control. REDfly, which is the most detailed existing platform for metazoan regulatory-element annotation, includes over 40,000 experimentally verified CRMs and TFBSs along with their DNA sequences, their associated genes, and the expression patterns they direct. Here, we briefly describe REDfly’s contents and data model, with an emphasis on the new features implemented since 2020. We then provide an illustrated walk-through of several common REDfly search use cases.
8
Boytsov A, Abramov S, Makeev VJ, Kulakovskiy IV. Positional weight matrices have sufficient prediction power for analysis of noncoding variants. F1000Res 2022; 11:33. [PMID: 35811788 PMCID: PMC9237556 DOI: 10.12688/f1000research.75471.3] [Accepted: 06/30/2022] [Indexed: 11/23/2022]
Abstract
The position weight matrix, also called the position-specific scoring matrix, is the commonly accepted model to quantify the specificity of transcription factor binding to DNA. Position weight matrices are used in thousands of projects and software tools in regulatory genomics, including computational prediction of the regulatory impact of single-nucleotide variants. Yet, recently Yan et al. reported that "the position weight matrices of most transcription factors lack sufficient predictive power" if applied to the analysis of regulatory variants studied with a newly developed experimental method, SNP-SELEX. Here, we re-analyze the rich experimental dataset obtained by Yan et al. and show that appropriately selected position weight matrices in fact can adequately quantify transcription factor binding to alternative alleles.
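For readers less familiar with the model under discussion, here is how a position weight matrix can score two alleles. Counts are converted to log2-odds against a uniform background. Each allele is scored by its best window on either strand, and the difference between the two scores is the predicted change in binding. The count matrix, the pseudocount (0.25), and the uniform background are assumptions for illustration. This is not the matrix selection procedure used by Boytsov et al.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def counts_to_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into a log2-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm, site):
    """Sum of per-position log-odds for a site exactly as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_score(pwm, seq):
    """Highest PWM score over every window on both strands of seq."""
    width = len(pwm)
    rc = seq.translate(COMPLEMENT)[::-1]
    return max(score(pwm, s[i:i + width])
               for s in (seq, rc) for i in range(len(s) - width + 1))


def allele_delta(pwm, ref_seq, alt_seq):
    """Predicted binding change: best alt score minus best ref score."""
    return best_score(pwm, alt_seq) - best_score(pwm, ref_seq)


# Hypothetical 4-position count matrix favoring G-A-T-A (illustration only).
counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = counts_to_pwm(counts)
print(allele_delta(pwm, "CCGATACC", "CCGCTACC"))
```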
Affiliation(s)
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- Vsevolod J. Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Moscow Institute of Physics and Technology, Dolgoprudny, 141700, Russian Federation
- Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russian Federation
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, 142290, Russian Federation
9
Cognate DNA Recognition by Engrailed Homeodomain Involves a Conformational Change Controlled via an Electrostatic-Spring-Loaded Latch. Int J Mol Sci 2022; 23:ijms23052412. [PMID: 35269555 PMCID: PMC8910618 DOI: 10.3390/ijms23052412] [Received: 01/06/2022] [Revised: 02/11/2022] [Accepted: 02/11/2022] [Indexed: 02/01/2023]
Abstract
Transcription factors must scan genomic DNA, recognize the cognate sequence of their control element(s), and bind tightly to them. The DNA recognition process is primarily carried out by their DNA binding domains (DBDs), which interact with the cognate site with high affinity and more weakly with any other DNA sequence. DBDs are generally thought to bind to their cognate DNA without changing conformation (lock-and-key). Here, we used nuclear magnetic resonance and circular dichroism to investigate the interplay between DNA recognition and DBD conformation in the engrailed homeodomain (enHD), as a model case for the homeodomain family of eukaryotic DBDs. We found that the conformational ensemble of enHD is rather flexible and becomes gradually more disordered as ionic strength decreases, following a Debye–Hückel dependence. Our analysis indicates that enHD’s response to ionic strength is mediated by a built-in electrostatic spring-loaded latch that operates as a conformational transducer. We also found that, at moderate ionic strengths, enHD changes conformation upon binding to cognate DNA. This change is of larger amplitude than, and somewhat orthogonal to, the response to ionic strength. As a consequence, very high ionic strengths (e.g., 700 mM) block the electrostatic-spring-loaded latch, and binding to cognate DNA becomes lock-and-key. However, the interplay between enHD conformation and cognate DNA binding is robust across a range of ionic strengths (i.e., 45 to 300 mM) that covers physiologically relevant conditions. Therefore, our results demonstrate the presence of a mechanism for the conformational control of cognate DNA recognition on a eukaryotic DBD. This mechanism can function as a signal transducer that locks the DBD in place upon encountering the cognate site during active DNA scanning. The electrostatic-spring-loaded latch of enHD can also enable fine control of DNA recognition in response to transient changes in local ionic strength induced by various physiological processes.
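The Debye–Hückel dependence mentioned above reflects electrostatic screening, and the range of that screening is set by the Debye length. A common approximation for a 1:1 salt in water at 25 °C gives a Debye length of about 0.304/sqrt(I) nm, with I in mol/L. The following Python snippet applies that approximation to the ionic strengths quoted in the abstract. The prefactor is a textbook approximation and is not a value taken from this study.

```python
import math


def ionic_strength(ions):
    """I = 1/2 * sum(c_i * z_i**2) for (concentration in mol/L, charge) pairs."""
    return 0.5 * sum(c * z * z for c, z in ions)


def debye_length_nm(ionic_strength_molar):
    """Approximate Debye length (nm) for a 1:1 salt in water at 25 C.

    Uses the common approximation 0.304 / sqrt(I), with I in mol/L.
    """
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    return 0.304 / math.sqrt(ionic_strength_molar)


# Ionic strengths quoted in the abstract (45, 300, and 700 mM).
for millimolar in (45, 300, 700):
    print(f"{millimolar} mM: ~{debye_length_nm(millimolar / 1000):.2f} nm")
```

The snippet only quantifies how the screening range changes across these conditions. It does not model enHD itself.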
10
Wu X, Liang Y, Gao H, Wang J, Zhao Y, Hua L, Yuan Y, Wang A, Zhang X, Liu J, Zhou J, Meng X, Zhang D, Lin S, Huang X, Han B, Li J, Wang Y. Enhancing rice grain production by manipulating the naturally evolved cis-regulatory element-containing inverted repeat sequence of OsREM20. Mol Plant 2021; 14:997-1011. [PMID: 33741527 DOI: 10.1016/j.molp.2021.03.016] [Received: 11/16/2020] [Revised: 01/19/2021] [Accepted: 03/14/2021] [Indexed: 05/05/2023]
Abstract
Grain number per panicle (GNP) is an important agronomic trait that contributes to rice grain yield. Despite its importance in rice breeding, the molecular mechanism underlying GNP regulation remains largely unknown. In this study, we identified a previously unrecognized regulatory gene that controls GNP in rice, Oryza sativa REPRODUCTIVE MERISTEM 20 (OsREM20), which encodes a B3 domain transcription factor. Through genetic analysis and transgenic validation, we found that genetic variation in the CArG box-containing inverted repeat (IR) sequence of the OsREM20 promoter alters its expression level and contributes to GNP variation among rice varieties. Furthermore, we revealed that the IR sequence regulates OsREM20 expression by affecting the direct binding of OsMADS34 to the CArG box within the IR sequence. Interestingly, the divergent pOsREM20IR and pOsREM20ΔIR alleles were found to originate from different Oryza rufipogon accessions and were independently inherited into the japonica and indica subspecies, respectively, during domestication. Importantly, we demonstrated that IR sequence variations in the OsREM20 promoter can be utilized for germplasm improvement through either genome editing or traditional breeding. Taken together, our study characterizes novel genetic variations responsible for GNP diversity in rice, reveals the underlying molecular mechanism in the regulation of agronomically important gene expression, and provides a promising strategy for improving rice production by manipulating the cis-regulatory element-containing IR sequence.
Affiliation(s)
- Xiaowei Wu
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yan Liang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Hengbin Gao
- College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
- Jiyao Wang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yan Zhao
- National Center for Gene Research, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
- Lekai Hua
- College of Resources and Environment, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yundong Yuan
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Ahong Wang
- National Center for Gene Research, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
- Xiaohui Zhang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiafan Liu
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Jie Zhou
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Xiangbing Meng
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Dahan Zhang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shaoyang Lin
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Xuehui Huang
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Bin Han
- CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China; National Center for Gene Research, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
- Jiayang Li
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yonghong Wang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Life Sciences, Shandong Agricultural University, Taian, Shandong 271018, China
11
Carbon Catabolite Repression Governs Diverse Physiological Processes and Development in Aspergillus nidulans. mBio 2021; 13:e0373421. [PMID: 35164551 PMCID: PMC8844935 DOI: 10.1128/mbio.03734-21] [Indexed: 01/08/2023]
Abstract
Carbon catabolite repression (CCR) is a common phenomenon in microorganisms that enables efficient utilization of carbon nutrients, which is critical for the fitness of microorganisms in the wild and for pathogenic species to cause infection. In most filamentous fungal species, the conserved transcription factor CreA/Cre1 mediates CCR. Previous studies demonstrated a primary function for CreA/Cre1 in carbon metabolism; however, the phenotype of creA/cre1 mutants indicated broader roles. The global function and regulatory mechanism of this wide-domain transcription factor have remained elusive. Here, we applied two powerful genomics methods (transcriptome sequencing and chromatin immunoprecipitation sequencing) to delineate the direct and indirect roles of Aspergillus nidulans CreA across diverse physiological processes, including secondary metabolism, iron homeostasis, oxidative stress response, development, N-glycan biosynthesis, unfolded protein response, and nutrient and ion transport. The results indicate intricate connections between the regulation of carbon metabolism and diverse cellular functions. Moreover, our work also provides key mechanistic insights into CreA regulation and identifies CreA as a master regulator controlling many transcription factors of different regulatory networks. The discoveries for this highly conserved transcriptional regulator in a model fungus have important implications for CCR in related pathogenic and industrial species.
IMPORTANCE
The ability to scavenge and use a wide range of nutrients for growth is crucial for microorganisms' survival in the wild. Carbon catabolite repression (CCR) is a transcriptional regulatory phenomenon in both bacteria and fungi that coordinates the expression of genes required for preferential utilization of carbon sources. Since carbon metabolism is essential for growth, CCR is central to the fitness of microorganisms. In filamentous fungi, CCR is mediated by the conserved transcription factor CreA/Cre1, whose function in carbon metabolism has been well established. However, the global roles and regulatory mechanism of CreA/Cre1 are poorly defined. This study uncovers the direct and indirect functions of CreA in the model organism Aspergillus nidulans across diverse physiological processes and development and provides mechanistic insights into how CreA controls different regulatory networks. The work also reveals an interesting functional divergence between filamentous fungal and yeast CreA/Cre1 orthologues.
12
Peng PC, Khoueiry P, Girardot C, Reddington JP, Garfield DA, Furlong EEM, Sinha S. The Role of Chromatin Accessibility in cis-Regulatory Evolution. Genome Biol Evol 2019; 11:1813-1828. [PMID: 31114856 PMCID: PMC6601868 DOI: 10.1093/gbe/evz103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 05/13/2019] [Indexed: 02/07/2023]
Abstract
Transcription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well documented, its role in evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined its relationship to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that accessibility changes can in some cases, for example, for the master regulator Twist and for earlier developmental stages, more accurately predict binding change than is possible using TF-binding motif changes between orthologous enhancers. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign; Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA
- Pierre Khoueiry
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; American University of Beirut (AUB), Department of Biochemistry and Molecular Genetics, Beirut, Lebanon
- Charles Girardot
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- James P Reddington
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- David A Garfield
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; IRI-Life Sciences, Humboldt Universität zu Berlin, Berlin, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign
13
Ross J, Kuzin A, Brody T, Odenwald WF. Mutational analysis of a Drosophila neuroblast enhancer governing nubbin expression during CNS development. Genesis 2018; 56:e23237. [PMID: 30005136 PMCID: PMC6175444 DOI: 10.1002/dvg.23237] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/23/2018] [Revised: 06/07/2018] [Accepted: 06/22/2018] [Indexed: 11/17/2022]
Abstract
While developmental studies of Drosophila neural stem cell lineages have identified transcription factors (TFs) important to cell identity decisions, our understanding of the cis‐regulatory elements that control the dynamic expression of these TFs remains incomplete. Our previous studies identified multiple enhancers that regulate the genes encoding the POU‐domain TF paralogs nubbin and pdm‐2. Evolutionary comparative analysis of these enhancers reveals that each contains multiple conserved sequence blocks (CSBs) that span TF DNA‐binding sites for known regulators of neuroblast (NB) gene expression in addition to novel sequences. This study functionally analyzes the conserved DNA sequence elements within an NB enhancer located within the nubbin gene and highlights a high level of complexity underlying enhancer structure. Mutational analysis has revealed CSBs that are important for enhancer activation and silencing in the developing CNS. We have also observed that adjusting the number and relative positions of the TF binding sites within these CSBs alters enhancer function.
Affiliation(s)
- Jermaine Ross
- Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland
- Alexander Kuzin
- Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland
- Thomas Brody
- Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland
- Ward F Odenwald
- Neural Cell-Fate Determinants Section, NINDS, NIH, Bethesda, Maryland
14
Lifanov AP, Kravatskaya GI, Esipova NG. Large-Scale Periodicities in the Nucleotide Sequences of Drosophila Early Developmental Gene Loci. Biophysics 2017. [DOI: 10.1134/s0006350917060124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
15
Li L, Wunderlich Z. An Enhancer's Length and Composition Are Shaped by Its Regulatory Task. Front Genet 2017; 8:63. [PMID: 28588608 PMCID: PMC5440464 DOI: 10.3389/fgene.2017.00063] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 03/17/2017] [Accepted: 05/08/2017] [Indexed: 12/02/2022]
Abstract
Enhancers drive the gene expression patterns required for virtually every process in metazoans. We propose that enhancer length and transcription factor (TF) binding site composition (the number and identity of TF binding sites) reflect the complexity of the enhancer's regulatory task. In development, we define regulatory task complexity as the number of fates specified in a set of cells at once. We hypothesize that enhancers with more complex regulatory tasks will be longer, with more, but less specific, TF binding sites. Larger numbers of binding sites can be arranged in more ways, allowing enhancers to drive many distinct expression patterns, and therefore cell fates, using a finite number of TF inputs. We compare ~100 enhancers patterning the more complex anterior-posterior (AP) axis and the simpler dorsal-ventral (DV) axis in Drosophila and find that the AP enhancers are longer and contain more, but less specific, binding sites than the DV enhancers. Using a set of ~3,500 enhancers, we find that enhancer length and TF binding site number again increase with increasing regulatory task complexity. Therefore, to be broadly applicable, computational tools to study enhancers must account for differences in regulatory task.
Affiliation(s)
- Lily Li
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, United States
- Zeba Wunderlich
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, United States
16
Lengyel IM, Morelli LG. Multiple binding sites for transcriptional repressors can produce regular bursting and enhance noise suppression. Phys Rev E 2017; 95:042412. [PMID: 28505727 DOI: 10.1103/physreve.95.042412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/24/2016] [Indexed: 06/07/2023]
Abstract
Cells may control fluctuations in protein levels by means of negative autoregulation, where transcription factors bind DNA sites to repress their own production. Theoretical studies have assumed a single binding site for the repressor, whereas in most species multiple binding sites are found arranged in clusters. We study a stochastic description of negative autoregulation with multiple binding sites for the repressor. We find that increasing the number of binding sites induces regular bursting of gene products. By tuning the threshold for repression, we show that multiple binding sites can also suppress fluctuations. Our results highlight possible roles for the presence of multiple binding sites of negative autoregulators.
Affiliation(s)
- Iván M Lengyel
- Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA)-CONICET-Partner Institute of the Max Planck Society, Polo Científico Tecnológico, Godoy Cruz 2390, C1425FQD, Buenos Aires, Argentina
- Departamento de Física, FCEyN UBA, Ciudad Universitaria, 1428 Buenos Aires, Argentina
- Luis G Morelli
- Instituto de Investigación en Biomedicina de Buenos Aires (IBioBA)-CONICET-Partner Institute of the Max Planck Society, Polo Científico Tecnológico, Godoy Cruz 2390, C1425FQD, Buenos Aires, Argentina
- Departamento de Física, FCEyN UBA, Ciudad Universitaria, 1428 Buenos Aires, Argentina
- Max Planck Institute for Molecular Physiology, Department of Systemic Cell Biology, Otto-Hahn-Strasse 11, D-44227 Dortmund, Germany
17
Grossman SR, Zhang X, Wang L, Engreitz J, Melnikov A, Rogov P, Tewhey R, Isakova A, Deplancke B, Bernstein BE, Mikkelsen TS, Lander ES. Systematic dissection of genomic features determining transcription factor binding and enhancer function. Proc Natl Acad Sci U S A 2017; 114:E1291-E1300. [PMID: 28137873 PMCID: PMC5321001 DOI: 10.1073/pnas.1621150114] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Indexed: 12/21/2022]
Abstract
Enhancers regulate gene expression through the binding of sequence-specific transcription factors (TFs) to cognate motifs. Various features influence TF binding and enhancer function, including the chromatin state of the genomic locus, the affinities of the binding site, the activity of the bound TFs, and interactions among TFs. However, the precise nature and relative contributions of these features remain unclear. Here, we used massively parallel reporter assays (MPRAs) involving 32,115 natural and synthetic enhancers, together with high-throughput in vivo binding assays, to systematically dissect the contribution of each of these features to the binding and activity of genomic regulatory elements that contain motifs for PPARγ, a TF that serves as a key regulator of adipogenesis. We show that distinct sets of features govern PPARγ binding vs. enhancer activity. PPARγ binding is largely governed by the affinity of the specific motif site and higher-order features of the larger genomic locus, such as chromatin accessibility. In contrast, the enhancer activity of PPARγ binding sites depends on varying contributions from dozens of TFs in the immediate vicinity, including interactions between combinations of these TFs. Different pairs of motifs follow different interaction rules, including subadditive, additive, and superadditive interactions among specific classes of TFs, with both spatially constrained and flexible grammars. Our results provide a paradigm for the systematic characterization of the genomic features underlying regulatory elements, applicable to the design of synthetic regulatory elements or the interpretation of human genetic variation.
Affiliation(s)
- Sharon R Grossman
- Broad Institute, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Health Sciences and Technology, Harvard Medical School, Boston, MA 02215
- Li Wang
- Broad Institute, Cambridge, MA 02142
- Jesse Engreitz
- Broad Institute, Cambridge, MA 02142
- Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Ryan Tewhey
- Broad Institute, Cambridge, MA 02142
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Cambridge, MA 02138
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Alina Isakova
- Institute of Bioengineering, CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Bart Deplancke
- Institute of Bioengineering, CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Bradley E Bernstein
- Broad Institute, Cambridge, MA 02142
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114
- Center for Cancer Research, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114
- Tarjei S Mikkelsen
- Broad Institute, Cambridge, MA 02142
- Harvard Stem Cell Institute, Harvard University, Cambridge, MA 02138
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138
- Eric S Lander
- Broad Institute, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Systems Biology, Harvard Medical School, Boston, MA 02215
18
Guo Y, Gifford DK. Modular combinatorial binding among human trans-acting factors reveals direct and indirect factor binding. BMC Genomics 2017; 18:45. [PMID: 28061806 PMCID: PMC5219757 DOI: 10.1186/s12864-016-3434-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 04/28/2016] [Accepted: 12/19/2016] [Indexed: 11/25/2022]
Abstract
Background The combinatorial binding of trans-acting factors (TFs) to DNA is critical to the spatial and temporal specificity of gene regulation. For certain regulatory regions, more than one regulatory module (a set of TFs that bind together) is combined to achieve context-specific gene regulation. However, previous approaches are limited either to pairwise TF co-association analysis or to assuming that only one module is used in each regulatory region. Results We present a new computational approach that models the modular organization of TF combinatorial binding. Our method learns compact and coherent regulatory modules from in vivo binding data using a topic model. We found that the binding of 115 TFs in K562 cells can be organized into 49 interpretable modules. Furthermore, we found that tens of thousands of regulatory regions use multiple modules, a structure that cannot be observed with previous methods based on hard clustering. The modules discovered recapitulate many published protein-protein physical interactions, have consistent functional annotations of chromatin states, and uncover context-specific co-binding such as gene-proximal binding of NFY + FOS + SP and distal binding of NFY + FOS + USF. For certain TFs, the co-binding partners of direct binding (motif present) differ from those of indirect binding (motif absent); the distinct set of co-binding partners can predict whether the TF binds directly or indirectly with up to 95% accuracy. Joint analysis across two cell types reveals both cell-type-specific and shared regulatory modules. Conclusions Our results provide comprehensive cell-type-specific combinatorial binding maps and suggest a modular organization of combinatorial binding.
Affiliation(s)
- Yuchun Guo
- MIT, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, 02139, USA
- David K Gifford
- MIT, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, 02139, USA
19
Preger-Ben Noon E, Davis FP, Stern DL. Evolved Repression Overcomes Enhancer Robustness. Dev Cell 2016; 39:572-584. [PMID: 27840106 DOI: 10.1016/j.devcel.2016.10.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 03/30/2016] [Revised: 07/26/2016] [Accepted: 10/14/2016] [Indexed: 12/18/2022]
Abstract
Biological systems display extraordinary robustness. Robustness of transcriptional enhancers results mainly from clusters of binding sites for the same transcription factor, and it is not clear how robust enhancers can evolve loss of expression through point mutations. Here, we report the high-resolution functional dissection of a robust enhancer of the shavenbaby gene that has contributed to morphological evolution. We found that robustness is encoded by many binding sites for the transcriptional activator Arrowhead and that, during evolution, some of these activator sites were lost, weakening enhancer activity. Complete silencing of enhancer function, however, required evolution of a binding site for the spatially restricted potent repressor Abrupt. These findings illustrate that recruitment of repressor binding sites can overcome enhancer robustness and may minimize pleiotropic consequences of enhancer evolution. Recruitment of repression may be a general mode of evolution to break robust regulatory linkages.
Affiliation(s)
- Ella Preger-Ben Noon
- Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- Fred P Davis
- Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- David L Stern
- Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
20
Dror I, Rohs R, Mandel-Gutfreund Y. How motif environment influences transcription factor search dynamics: Finding a needle in a haystack. Bioessays 2016; 38:605-12. [PMID: 27192961 PMCID: PMC5023137 DOI: 10.1002/bies.201600005] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 11/08/2022]
Abstract
Transcription factors (TFs) have to find their binding sites, which are distributed throughout the genome. Facilitated diffusion is currently the most widely accepted model for this search process. According to this model, the TF alternates between one-dimensional sliding along the DNA and three-dimensional bulk diffusion. In this view, nonspecific associations between the proteins and the DNA play a major role in the search dynamics. However, little is known about how the DNA properties around the motif contribute to the search. Accumulating evidence that TF binding sites are embedded within a unique environment, specific to each TF, leads to the hypothesis that the search process is facilitated by favorable DNA features that improve search efficiency. Here, we review the field and present the hypothesis that TF-DNA recognition is not dictated by the motif alone but is also influenced by the environment in which the motif resides.
Affiliation(s)
- Iris Dror
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa, Israel; Departments of Biological Sciences, Chemistry, Physics, and Computer Science, Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA
- Remo Rohs
- Departments of Biological Sciences, Chemistry, Physics, and Computer Science, Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA
- Yael Mandel-Gutfreund
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa, Israel
21
Lifanov AP, Makeev VJ, Esipova NG. Conserved sections of the transcription regulatory modules in Drosophila early genes, including homotypic transcription factor-binding sites, are arranged with an 84-nt period, which corresponds to the superhelical turn length of nucleosomal DNA. Biophysics 2016. [DOI: 10.1134/s0006350916010139] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
22
Gurdziel K, Lorberbaum DS, Udager AM, Song JY, Richards N, Parker DS, Johnson LA, Allen BL, Barolo S, Gumucio DL. Identification and Validation of Novel Hedgehog-Responsive Enhancers Predicted by Computational Analysis of Ci/Gli Binding Site Density. PLoS One 2015; 10:e0145225. [PMID: 26710299 PMCID: PMC4692483 DOI: 10.1371/journal.pone.0145225] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/01/2015] [Accepted: 12/01/2015] [Indexed: 01/20/2023]
Abstract
The Hedgehog (Hh) signaling pathway directs a multitude of cellular responses during embryogenesis and adult tissue homeostasis. Stimulation of the pathway results in activation of Hh target genes by the transcription factor Ci/Gli, which binds to specific motifs in genomic enhancers. In Drosophila, only a few enhancers (patched, decapentaplegic, wingless, stripe, knot, hairy, orthodenticle) have been shown by in vivo functional assays to depend on direct Ci/Gli regulation. All but one (orthodenticle) contain more than one Ci/Gli site, prompting us to test directly whether homotypic clustering of Ci/Gli binding sites is sufficient to define a Hh-regulated enhancer. We therefore developed a computational algorithm to identify Ci/Gli clusters that are enriched over random expectation within a given region of the genome. Candidate genomic regions containing Ci/Gli clusters were functionally tested in chicken neural tube electroporation assays and in transgenic flies. Of the 22 Ci/Gli clusters tested, seven novel enhancers (and the previously known patched enhancer) were identified as Hh-responsive and Ci/Gli-dependent in one or both of these assays, including Cuticular protein 100A (Cpr100A); invected (inv), which encodes an engrailed-related transcription factor expressed at the anterior/posterior wing disc boundary; roadkill (rdx), the fly homolog of vertebrate Spop; the segment polarity gene gooseberry (gsb); and two previously untested regions of the Hh receptor-encoding patched (ptc) gene. We conclude that homotypic Ci/Gli clustering is not sufficient to ensure Hh-responsiveness; however, it can provide a clue for enhancer recognition within putative Hedgehog target gene loci.
Affiliation(s)
- Katherine Gurdziel
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Department of Computational Medicine and Bioinformatics, The University of Michigan, Ann Arbor, MI 48109, United States of America
- David S. Lorberbaum
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Cellular and Molecular Biology Program, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Aaron M. Udager
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Jane Y. Song
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Cellular and Molecular Biology Program, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Neil Richards
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- David S. Parker
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Lisa A. Johnson
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Benjamin L. Allen
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Scott Barolo
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
- Deborah L. Gumucio
- Department of Cell and Developmental Biology, The University of Michigan, Ann Arbor, MI 48109, United States of America
23
Kozlov K, Gursky VV, Kulakovskiy IV, Dymova A, Samsonova M. Analysis of functional importance of binding sites in the Drosophila gap gene network model. BMC Genomics 2015; 16 Suppl 13:S7. [PMID: 26694511 PMCID: PMC4686791 DOI: 10.1186/1471-2164-16-s13-s7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Abstract
BACKGROUND The statistical-thermodynamics-based approach provides a promising framework for constructing the genotype-phenotype map in many biological systems. An important aspect of a good model connecting DNA sequence information with a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor binding sites. As the model may predict different levels of functional importance for specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions. RESULTS We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and a concentration-dependent regulatory interaction between the gap genes hunchback and Kruppel. We show that the new variants of the model are more consistent in terms of gene expression predictions for various genetic constructs than those in previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions. CONCLUSIONS The assumption of a dual interaction between hb and Kr leads to the most consistent modeling results but, on the other hand, may obscure the existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulatory concept of many weak binding sites working in concert. The model predicts a more or less uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.
Affiliation(s)
- Konstantin Kozlov
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St. Petersburg, Russia
- Vitaly V Gursky
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St. Petersburg, Russia
- Ioffe Institute, 26 Polytechnicheskaya, 194021 St. Petersburg, Russia
- Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, 32 Vavilova, 119991 Moscow, Russia
- Arina Dymova
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St. Petersburg, Russia
- Maria Samsonova
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St. Petersburg, Russia
24
Liu L, Zhao W, Zhou X. Modeling co-occupancy of transcription factors using chromatin features. Nucleic Acids Res 2015; 44:e49. [PMID: 26590261 PMCID: PMC4797273 DOI: 10.1093/nar/gkv1281] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/22/2015] [Accepted: 11/04/2015] [Indexed: 12/11/2022]
Abstract
Regulation of gene expression requires both transcription factors (TFs) and epigenetic modifications, and interplay between the two types of factors has been discovered. However, study of the relationship between chromatin features and TF–TF co-occupancy remains limited. Here, we revealed this relationship by first illustrating distinct profile patterns of chromatin features related to different binding events, including single TF binding and TF–TF co-occupancy, for 71 TFs from five human cell lines. We further implemented statistical analyses to demonstrate the relationship by accurately predicting co-occupancy genome-wide using chromatin features including DNase I hypersensitivity, 11 histone modifications (HMs) and GC content. Remarkably, our results showed that the combination of chromatin features enables accurate predictions across the five cell lines. For individual chromatin features, DNase I enables high and consistent predictions. H3K27ac, H3K4me2, H3K4me3 and H3K9ac are more reliable predictors than other HMs. Although the combination of 11 HMs achieves accurate predictions, their predictive ability varies considerably when a model obtained from one cell line is applied to others, indicating that the relationship between HMs and TF–TF co-occupancy is cell-type dependent. GC content is not a reliable predictor, but adding GC content to any other feature enhances its predictive ability. Together, our results elucidate a strong relationship between TF–TF co-occupancy and chromatin features.
Affiliation(s)
- Liang Liu
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Weiling Zhao
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Xiaobo Zhou
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
25
Payne JL, Wagner A. Mechanisms of mutational robustness in transcriptional regulation. Front Genet 2015; 6:322. [PMID: 26579194 PMCID: PMC4621482 DOI: 10.3389/fgene.2015.00322] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 09/11/2015] [Accepted: 10/10/2015] [Indexed: 12/17/2022]
Abstract
Robustness is the invariance of a phenotype in the face of environmental or genetic change. The phenotypes produced by transcriptional regulatory circuits are gene expression patterns that are to some extent robust to mutations. Here we review several causes of this robustness. They include robustness of individual transcription factor binding sites, homotypic clusters of such sites, redundant enhancers, transcription factors, redundant transcription factors, and the wiring of transcriptional regulatory circuits. Such robustness can either be an adaptation by itself, a byproduct of other adaptations, or the result of biophysical principles and non-adaptive forces of genome evolution. The potential consequences of such robustness include complex regulatory network topologies that arise through neutral evolution, as well as cryptic variation, i.e., genotypic divergence without phenotypic divergence. On the longest evolutionary timescales, the robustness of transcriptional regulation has helped shape life as we know it, by facilitating evolutionary innovations that helped organisms such as flowering plants and vertebrates diversify.
Affiliation(s)
- Joshua L Payne
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, NM, USA
26
Musayev FN, Zarate-Perez F, Bishop C, Burgner JW, Escalante CR. Structural Insights into the Assembly of the Adeno-associated Virus Type 2 Rep68 Protein on the Integration Site AAVS1. J Biol Chem 2015; 290:27487-99. [PMID: 26370092 DOI: 10.1074/jbc.m115.669960] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 07/21/2015] [Indexed: 11/06/2022]
Abstract
Adeno-associated virus (AAV) is the only eukaryotic virus with the property of establishing latency by integrating site-specifically into the human genome. The integration site known as AAVS1 is located in chromosome 19 and contains multiple GCTC repeats that are recognized by the AAV non-structural Rep proteins. These proteins are multifunctional, with an N-terminal origin-binding domain (OBD) and a helicase domain joined together by a short linker. As a first step to understand the process of site-specific integration, we proceeded to characterize the recognition and assembly of Rep68 onto the AAVS1 site. We first determined the x-ray structure of AAV-2 Rep68 OBD in complex with the AAVS1 DNA site. Specificity is achieved through the interaction of a glycine-rich loop that binds the major groove and an α-helix that interacts with a downstream minor groove on the same face of the DNA. Although the structure shows a complex with three OBD molecules bound to the AAVS1 site, we show by using analytical centrifugation and electron microscopy that the full-length Rep68 forms a heptameric complex. Moreover, we determined that a minimum of two direct repeats is required to form a stable complex and to melt DNA. Finally, we show that although the individual domains bind DNA poorly, complex assembly requires oligomerization and cooperation between its OBD, helicase, and the linker domains.
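The abstract states that AAVS1 carries multiple GCTC repeats and that a minimum of two direct repeats is required for a stable Rep68 complex. The Python sketch below shows one way to check a sequence for that property on a single strand. It assumes that "direct repeats" means back-to-back GCTC copies with no spacer; the function names and the example sequence are hypothetical and are not taken from the paper.

```python
import re


def tandem_runs(seq: str, unit: str = "GCTC") -> list[tuple[int, int]]:
    """Return (start, copies) for each maximal run of back-to-back `unit` copies.

    Only the given strand is scanned; GCTC read on the opposite strand is GAGC.
    """
    seq = seq.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")  # greedy: one match per maximal run
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq)]


def has_min_direct_repeats(seq: str, min_copies: int = 2, unit: str = "GCTC") -> bool:
    """True if any run contains at least `min_copies` adjacent copies of `unit`."""
    return any(copies >= min_copies for _, copies in tandem_runs(seq, unit))


# Hypothetical sequence, not the real AAVS1 site.
example = "ttGCTCGCTCGCTCaaGCTCtt"
print(tandem_runs(example))                  # [(2, 3), (16, 1)]
print(has_min_direct_repeats(example))       # True
print(has_min_direct_repeats("GCTCaaGCTC"))  # False
```

Spaced or degenerate repeats would need a looser pattern; the strict tandem definition here is only the simplest reading of the text.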
Affiliation(s)
- Faik N Musayev
- Department of Medicinal Chemistry, School of Pharmacy
- Francisco Zarate-Perez
- Department of Physiology and Biophysics, School of Medicine, Virginia Commonwealth University, Richmond, Virginia 23298
- Clayton Bishop
- Department of Physiology and Biophysics, School of Medicine, Virginia Commonwealth University, Richmond, Virginia 23298
- John W Burgner
- Department of Physiology and Biophysics, School of Medicine, Virginia Commonwealth University, Richmond, Virginia 23298
- Carlos R Escalante
- Department of Physiology and Biophysics, School of Medicine, Virginia Commonwealth University, Richmond, Virginia 23298
27
Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015; 11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022]
Abstract
Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. 
Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.
28
Dror I, Golan T, Levy C, Rohs R, Mandel-Gutfreund Y. A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res 2015; 25:1268-80. [PMID: 26160164 PMCID: PMC4561487 DOI: 10.1101/gr.184671.114] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 09/21/2014] [Accepted: 07/08/2015] [Indexed: 12/12/2022]
Abstract
Transcriptional regulation requires the binding of transcription factors (TFs) to short sequence-specific DNA motifs, usually located at the gene regulatory regions. Interestingly, based on a vast amount of data accumulated from genomic assays, it has been shown that only a small fraction of all potential binding sites containing the consensus motif of a given TF actually bind the protein. Recent in vitro binding assays, which exclude the effects of the cellular environment, also demonstrate selective TF binding. An intriguing conjecture is that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that are not bound by the TF. To test this hypothesis, we conducted a comprehensive analysis of the sequence and DNA shape features surrounding the core-binding sites of 239 and 56 TFs extracted from in vitro HT-SELEX binding assays and in vivo ChIP-seq data, respectively. Comparing the nucleotide content of the regions around the TF-bound sites to the counterpart unbound regions containing the same consensus motifs revealed significant differences that extend far beyond the core-binding site. Specifically, the environment of the bound motifs demonstrated unique sequence compositions, DNA shape features, and overall high similarity to the core-binding motif. Notably, the regions around the binding sites of TFs that belong to the same TF families exhibited similar features, with high agreement between the in vitro and in vivo data sets. We propose that these unique features assist in guiding TFs to their cognate binding sites.
Affiliation(s)
- Iris Dror
- Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel; Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
- Tamar Golan
- Department of Human Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Carmit Levy
- Department of Human Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
- Yael Mandel-Gutfreund
- Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
29
Henry KF, Kawashima T, Goldberg RB. A cis-regulatory module activating transcription in the suspensor contains five cis-regulatory elements. PLANT MOLECULAR BIOLOGY 2015; 88:207-17. [PMID: 25796517 PMCID: PMC4441743 DOI: 10.1007/s11103-015-0308-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/25/2015] [Accepted: 03/13/2015] [Indexed: 05/08/2023]
Abstract
Little is known about the molecular mechanisms by which the embryo proper and suspensor of plant embryos activate specific gene sets shortly after fertilization. We analyzed the upstream region of the Scarlet Runner Bean (Phaseolus coccineus) G564 gene in order to understand how genes are activated specifically in the suspensor during early embryo development. Previously, we showed that a 54-bp fragment of the G564 upstream region is sufficient for suspensor transcription and contains at least three required cis-regulatory sequences, including the 10-bp motif (5'-GAAAAGCGAA-3'), the 10-bp-like motif (5'-GAAAAACGAA-3'), and Region 2 motif (partial sequence 5'-TTGGT-3'). Here, we use site-directed mutagenesis experiments in transgenic tobacco globular-stage embryos to identify two additional cis-regulatory elements within the 54-bp cis-regulatory module that are required for G564 suspensor transcription: the Fifth motif (5'-GAGTTA-3') and a third 10-bp-related sequence (5'-GAAAACCACA-3'). Further deletion of the 54-bp fragment revealed that a 47-bp fragment containing the five motifs (the 10-bp, 10-bp-like, 10-bp-related, Region 2 and Fifth motifs) is sufficient for suspensor transcription, and represents a cis-regulatory module. A consensus sequence for each type of motif was determined by comparing motif sequences shown to activate suspensor transcription. Phylogenetic analyses suggest that the regulation of G564 is evolutionarily conserved. A homologous cis-regulatory module was found upstream of the G564 ortholog in the Common Bean (Phaseolus vulgaris), indicating that the regulation of G564 is evolutionarily conserved in closely related bean species.
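As an illustration of how the motifs listed above can be located in a candidate upstream sequence, here is a minimal Python sketch that reports exact matches on both strands. It is an assumption-laden simplification: it uses only the literal sequences quoted in the abstract (Region 2 is only a partial sequence), not the consensus sequences the authors derived, and the helper names and test sequence are hypothetical.

```python
# Motif sequences as quoted in the abstract (Region 2 is only a partial sequence).
SUSPENSOR_MOTIFS = {
    "10-bp": "GAAAAGCGAA",
    "10-bp-like": "GAAAAACGAA",
    "10-bp-related": "GAAAACCACA",
    "Fifth": "GAGTTA",
    "Region 2 (partial)": "TTGGT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq: str, motifs: dict[str, str] = SUSPENSOR_MOTIFS) -> list[tuple[int, str, str]]:
    """Return sorted (start, strand, motif_name) for every exact match on either strand.

    A '-' hit means the reverse complement of the motif occurs at `start` on the
    given strand. A palindromic motif would be reported once per strand.
    """
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = seq.find(pattern)
            while start != -1:  # advance by 1 so overlapping occurrences are kept
                hits.append((start, strand, name))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


print(find_motifs("ccGAAAAGCGAAttTAACTCgg"))
# [(2, '+', '10-bp'), (14, '-', 'Fifth')]
```

Exact matching is deliberately strict; scoring against the derived consensus sequences, or against a position weight matrix, would be the natural next step.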
Affiliation(s)
- Kelli F. Henry
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA 90095-7239 USA
- Tomokazu Kawashima
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA 90095-7239 USA
- Present Address: Gregor Mendel Institute, Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Robert B. Goldberg
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, 610 Charles E. Young Dr. East, Los Angeles, CA 90095-7239 USA
30
Homotypic clustering of OsMYB4 binding site motifs in promoters of the rice genome and cellular-level implications on sheath blight disease resistance. Gene 2015; 561:209-18. [DOI: 10.1016/j.gene.2015.02.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/16/2014] [Revised: 02/08/2015] [Accepted: 02/12/2015] [Indexed: 11/18/2022]
31
Suryamohan K, Halfon MS. Identifying transcriptional cis-regulatory modules in animal genomes. WILEY INTERDISCIPLINARY REVIEWS. DEVELOPMENTAL BIOLOGY 2015; 4:59-84. [PMID: 25704908 PMCID: PMC4339228 DOI: 10.1002/wdev.168] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 09/24/2014] [Revised: 11/04/2014] [Accepted: 11/16/2014] [Indexed: 11/08/2022]
Abstract
Gene expression is regulated through the activity of transcription factors (TFs) and chromatin-modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods have led to an explosion of both computational and empirical methods for CRM discovery in model and nonmodel organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against TFs or histone post-translational modifications, identification of nucleosome-depleted 'open' chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted TF-binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed.
Conflict of interest: The authors have declared no conflicts of interest for this article.
Affiliation(s)
- Kushal Suryamohan
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY 14203, USA
- Marc S. Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY 14203, USA
- Molecular and Cellular Biology Department and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
32
Taher L, Narlikar L, Ovcharenko I. Identification and computational analysis of gene regulatory elements. Cold Spring Harb Protoc 2015; 2015:pdb.top083642. [PMID: 25561628 PMCID: PMC5885252 DOI: 10.1101/pdb.top083642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Abstract
Over the last two decades, advances in experimental and computational technologies have greatly facilitated genomic research. Next-generation sequencing technologies have made de novo sequencing of large genomes affordable, and powerful computational approaches have enabled accurate annotations of genomic DNA sequences. Charting functional regions in genomes must account for not only the coding sequences, but also noncoding RNAs, repetitive elements, chromatin states, epigenetic modifications, and gene regulatory elements. A mix of comparative genomics, high-throughput biological experiments, and machine learning approaches has played a major role in this truly global effort. Here we describe some of these approaches and provide an account of our current understanding of the complex landscape of the human genome. We also present overviews of different publicly available, large-scale experimental data sets and computational tools, which we hope will prove beneficial for researchers working with large and complex genomes.
Affiliation(s)
- Leila Taher
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, 18051 Rostock, Germany
- Leelavati Narlikar
- Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Pune 411008, India
- Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
33
Lifanov AP, Makeev VJ, Esipova NG. Conserved double-stranded DNA regions (“cophased blocks”) of transcriptional regulatory modules are close in space due to phasing relative to the nucleosome DNA superhelix. Biophysics 2015. [DOI: 10.1134/s0006350915010182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
34
Crocker J, Abe N, Rinaldi L, McGregor AP, Frankel N, Wang S, Alsawadi A, Valenti P, Plaza S, Payre F, Mann RS, Stern DL. Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 2014; 160:191-203. [PMID: 25557079 DOI: 10.1016/j.cell.2014.11.041] [Citation(s) in RCA: 235] [Impact Index Per Article: 23.5] [Received: 06/05/2014] [Revised: 09/11/2014] [Accepted: 11/13/2014] [Indexed: 11/26/2022]
Abstract
In animals, Hox transcription factors define regional identity in distinct anatomical domains. How Hox genes encode this specificity is a paradox, because different Hox proteins bind with high affinity in vitro to similar DNA sequences. Here, we demonstrate that the Hox protein Ultrabithorax (Ubx) in complex with its cofactor Extradenticle (Exd) bound specifically to clusters of very low affinity sites in enhancers of the shavenbaby gene of Drosophila. These low affinity sites conferred specificity for Ubx binding in vivo, but multiple clustered sites were required for robust expression when embryos developed in variable environments. Although most individual Ubx binding sites are not evolutionarily conserved, the overall enhancer architecture (clusters of low affinity binding sites) is maintained and required for enhancer function. Natural selection therefore works at the level of the enhancer, requiring a particular density of low affinity Ubx sites to confer both specific and robust expression.
Affiliation(s)
- Justin Crocker
- Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
- Namiko Abe
- Columbia University Medical Center, 701 West 168th Street, HHSC 1104, New York, NY 10032, USA
- Lucrezia Rinaldi
- Columbia University Medical Center, 701 West 168th Street, HHSC 1104, New York, NY 10032, USA
- Alistair P McGregor
- Department of Biological and Medical Sciences, Oxford Brookes University, Gipsy Lane, Oxford OX3 0BP, UK
- Nicolás Frankel
- Departamento de Ecología, Genética y Evolución, IEGEBA-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 2, 1428 Buenos Aires, Argentina
- Shu Wang
- New Jersey Neuroscience Institute, 65 James Street, Edison, NJ 08820, USA
- Ahmad Alsawadi
- Centre de Biologie du Développement, Université de Toulouse, UPS, 31062 Cedex 9, France; CNRS, UMR5547, Centre de Biologie du Développement, Toulouse, 31062 Cedex 9, France
- Philippe Valenti
- Centre de Biologie du Développement, Université de Toulouse, UPS, 31062 Cedex 9, France; CNRS, UMR5547, Centre de Biologie du Développement, Toulouse, 31062 Cedex 9, France
- Serge Plaza
- Centre de Biologie du Développement, Université de Toulouse, UPS, 31062 Cedex 9, France; CNRS, UMR5547, Centre de Biologie du Développement, Toulouse, 31062 Cedex 9, France
- François Payre
- Centre de Biologie du Développement, Université de Toulouse, UPS, 31062 Cedex 9, France; CNRS, UMR5547, Centre de Biologie du Développement, Toulouse, 31062 Cedex 9, France
- Richard S Mann
- Columbia University Medical Center, 701 West 168th Street, HHSC 1104, New York, NY 10032, USA
- David L Stern
- Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA
35
Mironova VV, Omelyanchuk NA, Wiebe DS, Levitsky VG. Computational analysis of auxin responsive elements in the Arabidopsis thaliana L. genome. BMC Genomics 2014; 15 Suppl 12:S4. [PMID: 25563792 PMCID: PMC4331925 DOI: 10.1186/1471-2164-15-s12-s4] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 12/27/2022]
Abstract
Auxin responsive elements (AuxRE) were found in upstream regions of target genes for ARFs (Auxin response factors). While ChIP-seq data for most ARFs are still unavailable, prediction of potential AuxREs is restricted by consensus models that detect too many false positive sites. Using sequence analysis of experimentally proven AuxREs, we revealed both an extended nucleotide context pattern for AuxRE itself and three distinct types of its coupling motifs (Y-patch, AuxRE-like, and ABRE-like), which together with AuxRE may form composite elements. Computational analysis of the genome-wide distribution of the predicted AuxREs and their impact on auxin responsive gene expression allowed us to conclude that: (1) AuxREs are enriched around the transcription start site, with the maximum density in the 5'UTR; (2) AuxREs mediate auxin responsive up-regulation, not down-regulation; (3) directly oriented single AuxREs and reverse multiple AuxREs are mostly associated with auxin responsiveness. In the composite AuxRE elements associated with auxin response, the ABRE-like and Y-patch motifs are 5'-flanking or overlapping AuxRE, whereas the AuxRE-like motif is 3'-flanking. The specificity in location and orientation of the coupling elements suggests that they are potential binding sites for ARF partners.
36
Abstract
BACKGROUND The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts considerable interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. Knowledge of the molecular mechanisms involved in gap gene regulation is far less complete than knowledge of the genetics of the system. Mathematical modeling goes beyond insights gained by genetic and molecular approaches: it allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. RESULTS We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild-type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used fourfold cross-validation and fitting to a random dataset to validate the model and prove its sufficiency in describing the data. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. CONCLUSIONS The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from the literature.
We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3) functionally important sites are not exclusively located in cis-regulatory elements, but are rather dispersed throughout the regulatory region. Notably, some of the sites with high functional impact in the hb, Kr and kni regulatory regions coincide with strong sites annotated and verified in DNase I footprint assays.
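The site regulatory weight described above is a leave-one-site-out comparison of residual sums of squares (RSS). The Python sketch below computes it for an arbitrary predictor. Two details are assumptions, because the abstract does not specify them: the difference is normalized by the RSS of the full model, and the model is re-evaluated without refitting its parameters after a site is dropped. The toy additive model and its site effects are hypothetical and stand in for the gap gene model only for illustration.

```python
from typing import Callable, Sequence


def rss(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Residual sum of squares between data and model output."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def regulatory_weights(
    observed: Sequence[float],
    predict: Callable[[list[str]], list[float]],
    sites: list[str],
) -> dict[str, float]:
    """Weight of each site = (RSS without the site - RSS with all sites) / RSS with all sites.

    Assumptions: normalization by the full-model RSS, and no refitting of model
    parameters after a site is excluded.
    """
    rss_all = rss(observed, predict(sites))
    if rss_all == 0:
        raise ValueError("full model fits exactly; normalized weights are undefined")
    weights = {}
    for site in sites:
        reduced = [s for s in sites if s != site]
        weights[site] = (rss(observed, predict(reduced)) - rss_all) / rss_all
    return weights


# Hypothetical toy model: each site adds a fixed contribution at two positions.
SITE_EFFECTS = {"a": [1.0, 1.0], "b": [1.0, 0.0], "c": [0.5, 0.5]}


def additive_predict(active_sites: list[str]) -> list[float]:
    out = [0.0, 0.0]
    for site in active_sites:
        for i, effect in enumerate(SITE_EFFECTS[site]):
            out[i] += effect
    return out


print(regulatory_weights([3.0, 2.0], additive_predict, ["a", "b", "c"]))
# {'a': 8.0, 'b': 4.0, 'c': 3.0}
```

A larger weight means that removing the site degrades the fit more; refitting after each removal would be more expensive but closer to a full model-comparison analysis.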
Affiliation(s)
- Konstantin Kozlov
- St. Petersburg State Polytechnical University, Polytekhnicheskaya 29, 195251 St. Petersburg, Russia
- Vitaly Gursky
- Ioffe Physical-Technical Institute, RAS, Polytekhnicheskaya 26, 194021 St. Petersburg, Russia
- Ivan Kulakovskiy
- Engelhardt Institute of Molecular Biology, RAS, Vavilov 32, 119991 Moscow, Russia
- Maria Samsonova
- St. Petersburg State Polytechnical University, Polytekhnicheskaya 29, 195251 St. Petersburg, Russia
37
Ezer D, Zabet NR, Adryan B. Homotypic clusters of transcription factor binding sites: A model system for understanding the physical mechanics of gene expression. Comput Struct Biotechnol J 2014; 10:63-9. [PMID: 25349675 PMCID: PMC4204428 DOI: 10.1016/j.csbj.2014.07.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 12/24/2022]
Abstract
The organization of binding sites in cis-regulatory elements (CREs) can influence gene expression through a combination of physical mechanisms, ranging from direct interactions between TF molecules to DNA looping and transient chromatin interactions. The study of simple and common building blocks in promoters and other CREs allows us to dissect how all of these mechanisms work together. Many adjacent TF binding sites for the same TF species form homotypic clusters, and these CRE architecture building blocks serve as a prime candidate for understanding interacting transcriptional mechanisms. Homotypic clusters are prevalent in both bacterial and eukaryotic genomes, and are present in both promoters as well as more distal enhancer/silencer elements. Here, we review previous theoretical and experimental studies that show how the complexity (number of binding sites) and spatial organization (distance between sites and overall distance from transcription start sites) of homotypic clusters influence gene expression. In particular, we describe how homotypic clusters modulate the temporal dynamics of TF binding, a mechanism that can affect gene expression, but which has not yet been sufficiently characterized. We propose further experiments on homotypic clusters that would be useful in developing mechanistic models of gene expression.
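The review characterizes homotypic clusters by their complexity (number of binding sites) and spatial organization (distance between sites and distance from the transcription start site). A minimal Python sketch of extracting those quantities from a list of site positions for one TF follows. The grouping rule (consecutive sites no more than `max_gap` bp apart), the `min_sites` threshold, and measuring TSS distance from the nearest site are all illustrative assumptions, and the positions are hypothetical.

```python
def homotypic_clusters(starts: list[int], max_gap: int = 50, min_sites: int = 2) -> list[list[int]]:
    """Group same-TF site positions whose consecutive gaps are <= max_gap."""
    ordered = sorted(starts)
    clusters: list[list[int]] = []
    current = ordered[:1]
    for pos in ordered[1:]:
        if pos - current[-1] <= max_gap:
            current.append(pos)
        else:
            if len(current) >= min_sites:
                clusters.append(current)
            current = [pos]
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


def cluster_features(cluster: list[int], tss: int) -> dict:
    """Complexity, inter-site spacings and distance from the TSS (nearest site) of one cluster."""
    return {
        "complexity": len(cluster),
        "spacings": [b - a for a, b in zip(cluster, cluster[1:])],
        "distance_to_tss": min(abs(pos - tss) for pos in cluster),
    }


sites = [100, 130, 170, 400, 900, 930]  # hypothetical site start positions
for cluster in homotypic_clusters(sites):
    print(cluster, cluster_features(cluster, tss=1000))
# [100, 130, 170] {'complexity': 3, 'spacings': [30, 40], 'distance_to_tss': 830}
# [900, 930] {'complexity': 2, 'spacings': [30], 'distance_to_tss': 70}
```

The isolated site at position 400 is dropped because it has no neighbor within `max_gap`; changing that threshold changes which sites count as clustered.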
Affiliation(s)
- Daphne Ezer
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Nicolae Radu Zabet
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Boris Adryan
- Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
38
Hypoxia-inducible factor 2 alpha is essential for hepatic outgrowth and functions via the regulation of leg1 transcription in the zebrafish embryo. PLoS One 2014; 9:e101980. [PMID: 25000307 PMCID: PMC4084947 DOI: 10.1371/journal.pone.0101980] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 01/21/2014] [Accepted: 06/13/2014] [Indexed: 12/11/2022]
Abstract
The liver plays a vital role in metabolism, detoxification, digestion, and the maintenance of homeostasis. During development, the vertebrate embryonic liver undergoes a series of morphogenic processes known as hepatogenesis. Hepatogenesis can be separated into three interrelated processes: endoderm specification, hepatoblast differentiation, and hepatic outgrowth. Throughout this process, signaling molecules and transcription factors initiate and regulate the coordination of cell proliferation, apoptosis, differentiation, intercellular adhesion, and cell migration. Hifs are already recognized to be essential in embryonic development, but their role in hepatogenesis remains unknown. Using the zebrafish embryo as a model organism, we report that the lack of Hif2-alpha but not Hif1-alpha blocks hepatic outgrowth. While Hif2-alpha is not involved in hepatoblast specification, this transcription factor regulates hepatocyte cell proliferation during hepatic outgrowth. Furthermore, we demonstrated that the lack of Hif2-alpha can reduce the expression of liver-enriched gene 1 (leg1), which encodes a secretory protein essential for hepatic outgrowth. Additionally, exogenous mRNA expression of leg1 can rescue the small liver phenotype of hif2-alpha morphants. We also showed that Hif2-alpha directly binds to the promoter region of leg1 to control leg1 expression. Interestingly, we discovered overrepresented, high-density Hif-binding sites in the potential upstream regulatory sequences of leg1 in teleosts but not in terrestrial mammals. We concluded that hif2-alpha is a key factor required for hepatic outgrowth and regulates leg1 expression in zebrafish embryos. We also proposed that the hif2-alpha-leg1 axis in liver development may have resulted from the adaptation of teleosts to their environment.
39
Xu Z, Chen H, Ling J, Yu D, Struffi P, Small S. Impacts of the ubiquitous factor Zelda on Bicoid-dependent DNA binding and transcription in Drosophila. Genes Dev 2014; 28:608-21. [PMID: 24637116 PMCID: PMC3967049 DOI: 10.1101/gad.234534.113] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Indexed: 01/01/2023]
Abstract
The Drosophila transcription factor Bicoid (Bcd) binds thousands of genomic sites during early embryogenesis, but it is unclear how many of these binding events are functionally important. Here, Small and colleagues test the role of the maternal factor Zelda (Zld) in Bcd-mediated binding and transcription. Embryos lacking Zld show enhanced Bcd binding to a subset of genomic locations, causing early activation of target genes normally silent until later stages. This study demonstrates a critical role for Zld in controlling Bcd binding and target gene activation in the early embryo. In vivo cross-linking studies suggest that the Drosophila transcription factor Bicoid (Bcd) binds to several thousand sites during early embryogenesis, but it is not clear how many of these binding events are functionally important. In contrast, reporter gene studies have identified >60 Bcd-dependent enhancers, all of which contain clusters of the consensus binding sequence TAATCC. These studies also identified clusters of TAATCC motifs (inactive fragments) that failed to drive Bcd-dependent activation. In general, active fragments showed higher levels of Bcd binding in vivo and were enriched in predicted binding sites for the ubiquitous maternal protein Zelda (Zld). Here we tested the role of Zld in Bcd-mediated binding and transcription. Removal of Zld function and mutations in Zld sites caused significant reductions in Bcd binding to known enhancers and variable effects on the activation and spatial positioning of Bcd-dependent expression patterns. Also, insertion of Zld sites converted one of six inactive fragments into a Bcd-responsive enhancer. Genome-wide binding experiments in zld mutants showed variable effects on Bcd-binding peaks, ranging from strong reductions to significantly enhanced levels of binding. 
Increases in Bcd binding caused the precocious Bcd-dependent activation of genes that are normally not expressed in early embryos, suggesting that Zld controls the genome-wide binding profile of Bcd at the qualitative level and is critical for selecting target genes for activation in the early embryo. These results underscore the importance of combinatorial binding in enhancer function and provide data that will help predict regulatory activities based on DNA sequence.
Affiliation(s)
- Zhe Xu
- Department of Biology, New York University, New York, New York 10003, USA
40
Samee MAH, Sinha S. Quantitative modeling of a gene's expression from its intergenic sequence. PLoS Comput Biol 2014; 10:e1003467. [PMID: 24604095 PMCID: PMC3945089 DOI: 10.1371/journal.pcbi.1003467] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 12/26/2012] [Accepted: 12/18/2013] [Indexed: 11/18/2022]
Abstract
Modeling a gene's expression from its intergenic locus and trans-regulatory context is a fundamental goal in computational biology. Owing to the distributed nature of cis-regulatory information and the poorly understood mechanisms that integrate such information, gene locus modeling is a more challenging task than modeling individual enhancers. Here we report the first quantitative model of a gene's expression pattern as a function of its locus. We model the expression readout of a locus in two tiers: 1) combinatorial regulation by transcription factors bound to each enhancer is predicted by a thermodynamics-based model and 2) independent contributions from multiple enhancers are linearly combined to fit the gene expression pattern. The model does not require any prior knowledge about enhancers contributing toward a gene's expression. We demonstrate that the model captures the complex multi-domain expression patterns of anterior-posterior patterning genes in the early Drosophila embryo. Altogether, we model the expression patterns of 27 genes; these include several gap genes, pair-rule genes, and anterior, posterior, trunk, and terminal genes. We find that the model-selected enhancers for each gene overlap strongly with its experimentally characterized enhancers. Our findings also suggest the presence of sequence-segments in the locus that would contribute ectopic expression patterns and hence were "shut down" by the model. We applied our model to identify the transcription factors responsible for forming the stripe boundaries of the studied genes. The resulting network of regulatory interactions exhibits a high level of agreement with known regulatory influences on the target genes. Finally, we analyzed whether and why our assumption of enhancer independence was necessary for the genes we studied. We found a deterioration of expression when binding sites in one enhancer were allowed to influence the readout of another enhancer. 
Thus, interference between enhancer activities was a possible factor necessitating enhancer independence in our model.
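The two-tier readout described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model. It assumes independent (non-cooperative) binding sites and a single basal-machinery state per enhancer. The names `p_enhancer_on`, `locus_expression`, `alpha` and `q_basal` are hypothetical, and so are all parameter values. Tier 1 gives the probability that one enhancer is "on" from partition functions computed with and without the basal machinery bound. Tier 2 sums the enhancer readouts with linear weights.

```python
from typing import Dict, List, Tuple

# Hypothetical site representation: (TF name, relative affinity of this site)
Site = Tuple[str, float]


def p_enhancer_on(sites: List[Site], conc: Dict[str, float],
                  alpha: Dict[str, float], q_basal: float = 0.01) -> float:
    """Tier 1 (sketch): probability one enhancer is active, assuming independent sites."""
    z_off = 1.0      # partition function with the basal machinery unbound
    z_on = q_basal   # partition function with the basal machinery bound
    for tf, affinity in sites:
        w = conc[tf] * affinity        # statistical weight of this site being occupied
        z_off *= 1.0 + w
        z_on *= 1.0 + w * alpha[tf]    # alpha > 1 activates, alpha < 1 represses
    return z_on / (z_on + z_off)


def locus_expression(enhancers: List[List[Site]], betas: List[float],
                     conc: Dict[str, float], alpha: Dict[str, float]) -> float:
    """Tier 2 (sketch): independent enhancer readouts combined linearly."""
    return sum(b * p_enhancer_on(e, conc, alpha) for b, e in zip(betas, enhancers))
```

Because the sites are treated as independent, the sum over all binding configurations factorizes into a product over sites. That is why each partition function here is a simple running product. Cooperative or enhancer-to-enhancer interactions would break this factorization.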
Affiliation(s)
- Md. Abul Hassan Samee
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
41
Burgess D, Freeling M. The most deeply conserved noncoding sequences in plants serve similar functions to those in vertebrates despite large differences in evolutionary rates. THE PLANT CELL 2014; 26:946-61. [PMID: 24681619 PMCID: PMC4001403 DOI: 10.1105/tpc.113.121905] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 05/12/2023]
Abstract
In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing-associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates.
42
Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genomics 2014; 15:80. [PMID: 24472686 PMCID: PMC4234207 DOI: 10.1186/1471-2164-15-80] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 06/07/2013] [Accepted: 01/25/2014] [Indexed: 02/07/2023]
Abstract
Background ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TF), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold to avoid recognition of too many false-positive BSs, and to compare the actual performance of different models. Results Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated affinity of 64 predicted FoxA BSs using EMSA that allows safely distinguishing sequences able to bind TF. As a result we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. It was found that the performance of conventional position weight matrix (PWM) models was inferior with the highest false positive rate. On the contrary, the best recognition efficiency was achieved by the combination of SiteGA & diChIPMunk/ChIPMunk models, properly identifying FoxA BSs in up to 90% of loci for both mouse and human ChIP-Seq datasets. Conclusions The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS-recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. 
A combination of models from different principles allowed identification of proper TFBSs. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-80) contains supplementary material, which is available to authorized users.
43
Erceg J, Saunders TE, Girardot C, Devos DP, Hufnagel L, Furlong EEM. Subtle changes in motif positioning cause tissue-specific effects on robustness of an enhancer's activity. PLoS Genet 2014; 10:e1004060. [PMID: 24391522 PMCID: PMC3879207 DOI: 10.1371/journal.pgen.1004060] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 03/19/2013] [Accepted: 11/11/2013] [Indexed: 12/14/2022]
Abstract
Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs to CRM activity during development using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood.
Affiliation(s)
- Jelena Erceg
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Timothy E. Saunders
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Charles Girardot
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Damien P. Devos
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Lars Hufnagel
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Eileen E. M. Furlong
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
44
Khan MR, Ali GM. Functional evolution of cis-regulatory modules of STMADS11 superclade MADS-box genes. PLANT MOLECULAR BIOLOGY 2013; 83:489-506. [PMID: 23860795 DOI: 10.1007/s11103-013-0105-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/26/2012] [Accepted: 07/03/2013] [Indexed: 06/02/2023]
Abstract
Evolution of phenotypic morphologies is closely associated with modular organization of cis-regulatory elements underlying expression divergence. The MADS-box gene family is the subject of extensive studies that try to unscramble the structural complexity of flowering plants. This study is envisaged to explore the potential of CRMs in highly constrained non-coding elements of STMADS11 superclade MADS-box genes in expression divergence. Phylogenetic reconstruction differentiated the STMADS11 genes into SVP-like, ZMM19-like, MPF1-like and MPF2-like clades. Differential gene expression in vegetative and floral organs was evident within the clades as well as at inter-clade level. The genomic DNA search for clusters of short motifs and sequence conservation of the -2 kb promoter region of particularly, MPF2-like clade permitted to establish three well defined CRMs where transcription factors bind, being CRM1 the activator, CRM2 the repressor, and CRM3 the enhancer element. Similar clusters were also mapped in the large 1st introns in the coding region. Within these CRMs many transcription factor-binding sites, particularly the hotspots for MADS-domain TF binding elements--CArG-boxes, directing sepal specific expression in Arabidopsis--were accrued in the CRM1 of MPF2-like promoters. Site-directed mutagenesis and motif swapping through reporter assays allude towards their implication as functionally active elements. In terms of directional evolution of MPF2-like promoters, CRMs are significantly more conserved than flanking regions, hence, bearing the signatures for purifying selection. Thus, CRMs are the pervasive feature of STMADS11 genes and mutations and/or appearance of new transcription factor binding sites and position of the CRMs are responsible for the divergence in expression patterns in this clade. These results have implications in understanding functional evolution of cis-regulatory modules in plants.
Affiliation(s)
- Muhammad Ramzan Khan
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Centre, Park Road, Islamabad, Pakistan
45
Bardet AF, Steinmann J, Bafna S, Knoblich JA, Zeitlinger J, Stark A. Identification of transcription factor binding sites from ChIP-seq data at high resolution. Bioinformatics 2013; 29:2705-13. [PMID: 23980024 PMCID: PMC3799470 DOI: 10.1093/bioinformatics/btt470] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 03/21/2013] [Revised: 07/28/2013] [Accepted: 08/07/2013] [Indexed: 01/04/2023]
Abstract
MOTIVATION Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-seq) is widely used to study the in vivo binding sites of transcription factors (TFs) and their regulatory targets. Recent improvements to ChIP-seq, such as increased resolution, promise deeper insights into transcriptional regulation, yet require novel computational tools to fully leverage their advantages. RESULTS To this aim, we have developed peakzilla, which can identify closely spaced TF binding sites at high resolution (i.e. resolves individual binding sites even if spaced closely), as we demonstrate using semisynthetic datasets, performing ChIP-seq for the TF Twist in Drosophila embryos with different experimental fragment sizes, and analyzing ChIP-exo datasets. We show that the increased resolution reached by peakzilla is highly relevant, as closely spaced Twist binding sites are strongly enriched in transcriptional enhancers, suggesting a signature to discriminate functional from abundant non-functional or neutral TF binding. Peakzilla is easy to use, as it estimates all the necessary parameters from the data and is freely available. AVAILABILITY AND IMPLEMENTATION The peakzilla program is available from https://github.com/steinmann/peakzilla or http://www.starklab.org/data/peakzilla/. CONTACT stark@starklab.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anaïs F Bardet
- Research Institute of Molecular Pathology (IMP), Institute of Molecular Biotechnology (IMBA), Vienna, Austria and Stowers Institute for Medical Research, Kansas City, MO, USA
46
Xie D, Boyle AP, Wu L, Zhai J, Kawli T, Snyder M. Dynamic trans-acting factor colocalization in human cells. Cell 2013; 155:713-24. [PMID: 24243024 DOI: 10.1016/j.cell.2013.09.043] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Received: 05/07/2013] [Revised: 07/13/2013] [Accepted: 08/27/2013] [Indexed: 01/02/2023]
Abstract
Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.
Affiliation(s)
- Dan Xie
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
47
Kazemian M, Pham H, Wolfe SA, Brodsky MH, Sinha S. Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res 2013; 41:8237-52. [PMID: 23847101 PMCID: PMC3783179 DOI: 10.1093/nar/gkt598] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 01/04/2023]
Abstract
Regulation of eukaryotic gene transcription is often combinatorial in nature, with multiple transcription factors (TFs) regulating common target genes, often through direct or indirect mutual interactions. Many individual examples of cooperative binding by directly interacting TFs have been identified, but it remains unclear how pervasive this mechanism is during animal development. Cooperative TF binding should be manifest in genomic sequences as biased arrangements of TF-binding sites. Here, we explore the extent and diversity of such arrangements related to gene regulation during Drosophila embryogenesis. We used the DNA-binding specificities of 322 TFs along with chromatin accessibility information to identify enriched spacing and orientation patterns of TF-binding site pairs. We developed a new statistical approach for this task, specifically designed to accurately assess inter-site spacing biases while accounting for the phenomenon of homotypic site clustering commonly observed in developmental regulatory regions. We observed a large number of short-range distance preferences between TF-binding site pairs, including examples where the preference depends on the relative orientation of the binding sites. To test whether these binding site patterns reflect physical interactions between the corresponding TFs, we analyzed 27 TF pairs whose binding sites exhibited short distance preferences. In vitro protein–protein binding experiments revealed that >65% of these TF pairs can directly interact with each other. For five pairs, we further demonstrate that they bind cooperatively to DNA if both sites are present with the preferred spacing. This study demonstrates how DNA-binding motifs can be used to produce a comprehensive map of sequence signatures for different mechanisms of combinatorial TF action.
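The abstract above describes a spacing and orientation analysis for TF-binding site pairs, and the Python sketch below illustrates the general idea. It is not the statistical approach the authors developed; it is a simpler stand-in. It assumes site calls are already available as (start position, strand) pairs for each region. It compares observed pair counts with a circular-shift null. The names `pair_counts` and `spacing_enrichment` are hypothetical. Every B site in a region is shifted by the same random offset, so the mutual spacing of B sites (their homotypic clustering) is preserved in the null while their placement relative to A sites is randomized.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[int, str]                        # (start position, strand "+" or "-")
Region = Tuple[int, List[Site], List[Site]]   # (region length, TF A sites, TF B sites)


def pair_counts(a_sites: List[Site], b_sites: List[Site], max_gap: int) -> Counter:
    """Count A-B site pairs by signed gap (B start minus A start) and relative orientation."""
    counts: Counter = Counter()
    for pos_a, strand_a in a_sites:
        for pos_b, strand_b in b_sites:
            gap = pos_b - pos_a
            if 0 < abs(gap) <= max_gap:
                orient = "same" if strand_a == strand_b else "opposite"
                counts[(gap, orient)] += 1
    return counts


def spacing_enrichment(regions: List[Region], max_gap: int = 20, n_perm: int = 200,
                       seed: int = 0) -> Dict[Tuple[int, str], Tuple[int, float]]:
    """Return {(gap, orientation): (observed count, mean count under the shift null)}."""
    rng = random.Random(seed)
    observed: Counter = Counter()
    for _, a_sites, b_sites in regions:
        observed.update(pair_counts(a_sites, b_sites, max_gap))
    null_total: Counter = Counter()
    for _ in range(n_perm):
        for length, a_sites, b_sites in regions:
            offset = rng.randrange(length)
            # Shift all B sites by one offset, wrapping at the region end
            shifted = [((pos + offset) % length, strand) for pos, strand in b_sites]
            null_total.update(pair_counts(a_sites, shifted, max_gap))
    return {key: (observed[key], null_total[key] / n_perm) for key in observed}
```

This sketch ignores motif width and the wrap-around at region boundaries. A real analysis would need to handle overlapping sites and edge effects, and would attach a significance test to each observed/null pair.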
Affiliation(s)
- Majid Kazemian
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA, Laboratory of Molecular Immunology and Immunology Center, National Heart Lung and Blood Institute, National Institutes of Health, MD, USA, Program in Gene Function and Expression, University of Massachusetts Medical School, MA, USA, Department of Biochemistry and Molecular Pharmacology University of Massachusetts Medical School, MA, USA, Department of Molecular Medicine, University of Massachusetts Medical School, MA, USA and Institute of Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
48
Gata3 directly regulates early inner ear expression of Fgf10. Dev Biol 2013; 374:210-22. [DOI: 10.1016/j.ydbio.2012.11.028] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/30/2012] [Revised: 11/23/2012] [Accepted: 11/26/2012] [Indexed: 01/19/2023]
49
Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res 2012; 41:D195-202. [PMID: 23175603 PMCID: PMC3531053 DOI: 10.1093/nar/gks1089] [Citation(s) in RCA: 156] [Impact Index Per Article: 13.0] [Indexed: 12/26/2022]
Abstract
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in the improved model quality probably due to partial correction of source specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was a clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.
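Several entries in this listing rely on position weight matrices, and the Python sketch below illustrates how such a model is built and applied. It is not how HOCOMOCO models are constructed, since those come from ChIPMunk. It is a generic count-to-log-odds construction with a naive two-strand scan. The pseudocount of 0.5, the uniform 0.25 background and the helper names are assumptions. The example sites used in testing are toy sequences, not a real TF motif.

```python
import math
from typing import Dict, List, Tuple

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pwm_from_sites(sites: List[str], pseudocount: float = 0.5,
                   background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds PWM from aligned, equal-length sites (uppercase ACGT only, assumed)."""
    n = len(sites)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount)
                            / (n + 4 * pseudocount) / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm: List[Dict[str, float]], word: str) -> float:
    """Sum of per-position log-odds for a word of the same length as the PWM."""
    return sum(col[base] for col, base in zip(pwm, word))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan(pwm: List[Dict[str, float]], seq: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, strand, score) for every window scoring >= threshold on either strand."""
    k = len(pwm)
    hits = []
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        for strand, candidate in (("+", word), ("-", reverse_complement(word))):
            s = score(pwm, candidate)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits
```

The choice of threshold is the difficulty raised in entry 42: with a plain PWM, a low cutoff admits many false-positive sites. That is the motivation given there for calibrating thresholds experimentally.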
Affiliation(s)
- Ivan V Kulakovskiy
- Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.
50
Hansen L, Mariño-Ramírez L, Landsman D. Differences in local genomic context of bound and unbound motifs. Gene 2012; 506:125-34. [PMID: 22692006 PMCID: PMC3412921 DOI: 10.1016/j.gene.2012.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/27/2012] [Accepted: 06/04/2012] [Indexed: 11/25/2022]
Abstract
Understanding gene regulation is a major objective in molecular biology research. Frequently, transcription is driven by transcription factors (TFs) that bind to specific DNA sequences. These motifs are usually short and degenerate, rendering the likelihood of multiple copies occurring throughout the genome due to random chance as high. Despite this, TFs only bind to a small subset of sites, thus prompting our investigation into the differences between motifs that are bound by TFs and those that remain unbound. Here we constructed vectors representing various chromatin- and sequence-based features for a published set of bound and unbound motifs representing nine TFs in the budding yeast Saccharomyces cerevisiae. Using a machine learning approach, we identified a set of features that can be used to discriminate between bound and unbound motifs. We also discovered that some TFs bind most or all of their strong motifs in intergenic regions. Our data demonstrate that local sequence context can be strikingly different around motifs that are bound compared to motifs that are unbound. We concluded that there are multiple combinations of genomic features that characterize bound or unbound motifs.
Affiliation(s)
- Loren Hansen
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8900 Rockville Pike, Bethesda, MD 20894
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Leonardo Mariño-Ramírez
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8900 Rockville Pike, Bethesda, MD 20894
- PanAmerican Bioinformatics Institute, Santa Marta, Magdalena, Colombia
- David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8900 Rockville Pike, Bethesda, MD 20894