301
O'Connor TR, Bailey TL. Creating and validating cis-regulatory maps of tissue-specific gene expression regulation. Nucleic Acids Res 2014; 42:11000-10. [PMID: 25200088 PMCID: PMC4176179 DOI: 10.1093/nar/gku801] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022]
Abstract
Predicting which genomic regions control the transcription of a given gene is a challenge. We present a novel computational approach for creating and validating maps that associate genomic regions (cis-regulatory modules–CRMs) with genes. The method infers regulatory relationships that explain gene expression observed in a test tissue using widely available genomic data for ‘other’ tissues. To predict the regulatory targets of a CRM, we use cross-tissue correlation between histone modifications present at the CRM and expression at genes within 1 Mbp of it. To validate cis-regulatory maps, we show that they yield more accurate models of gene expression than carefully constructed control maps. These gene expression models predict observed gene expression from transcription factor binding in the CRMs linked to that gene. We show that our maps are able to identify long-range regulatory interactions and improve substantially over maps linking genes and CRMs based on either the control maps or a ‘nearest neighbor’ heuristic. Our results also show that it is essential to include CRMs predicted in multiple tissues during map-building, that H3K27ac is the most informative histone modification, and that CAGE is the most informative measure of gene expression for creating cis-regulatory maps.
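The linking step described in this abstract (cross-tissue correlation between a histone mark at a CRM and expression of genes within 1 Mbp) can be illustrated with a minimal sketch. The record layout, the function names, and the `min_r` cutoff are assumptions made for illustration only; the abstract does not state how correlations are thresholded, and this is not the authors' implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def link_crms_to_genes(crms, genes, max_dist=1_000_000, min_r=0.5):
    """Link each CRM to genes on the same chromosome within max_dist bp
    whose expression correlates with the CRM's histone signal across tissues.

    crms:  list of dicts with keys name, chrom, pos, signal (one value per tissue)
    genes: list of dicts with keys name, chrom, tss, expr (one value per tissue)
    min_r is a hypothetical cutoff, not a value taken from the paper.
    """
    links = []
    for crm in crms:
        for gene in genes:
            if crm["chrom"] != gene["chrom"]:
                continue
            if abs(crm["pos"] - gene["tss"]) > max_dist:
                continue
            r = pearson(crm["signal"], gene["expr"])
            if r >= min_r:
                links.append((crm["name"], gene["name"], r))
    return links
```

A nearest-neighbor map would instead assign each CRM to the single closest TSS; the sketch above can link a CRM to any gene in the 1 Mbp window, which is what allows long-range links.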
Affiliation(s)
- Timothy R O'Connor
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia
- Timothy L Bailey
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia
302
Lesluyes T, Johnson J, Machanick P, Bailey TL. Differential motif enrichment analysis of paired ChIP-seq experiments. BMC Genomics 2014; 15:752. [PMID: 25179504 PMCID: PMC4167127 DOI: 10.1186/1471-2164-15-752] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 02/17/2014] [Accepted: 08/22/2014] [Indexed: 02/06/2023]
Abstract
Background Motif enrichment analysis of transcription factor ChIP-seq data can help identify transcription factors that cooperate or compete. Previously, little attention has been given to comparative motif enrichment analysis of pairs of ChIP-seq experiments, where the binding of the same transcription factor is assayed under different conditions. Such comparative analysis could potentially identify the distinct regulatory partners/competitors of the assayed transcription factor under different conditions or at different stages of development. Results We describe a new methodology for identifying sequence motifs that are differentially enriched in one set of DNA or RNA sequences relative to another set, and apply it to paired ChIP-seq experiments. We show that, using paired ChIP-seq data for a single transcription factor, differential motif enrichment analysis identifies all the known key transcription factors involved in the transformation of non-cancerous immortalized breast cells (MCF10A-ER-Src cells) into cancer stem cells whereas non-differential motif enrichment analysis does not. We also show that differential motif enrichment analysis identifies regulatory motifs that are significantly enriched at constrained locations within the bound promoters, and that these motifs are not identified by non-differential motif enrichment analysis. Our methodology differs from other approaches in that it leverages both comparative enrichment and positional enrichment of motifs in ChIP-seq peak regions or in the promoters of genes bound by the transcription factor. Conclusions We show that differential motif enrichment analysis of paired ChIP-seq experiments offers biological insights not available from non-differential analysis. In contrast to previous approaches, our method detects motifs that are enriched in a constrained region in one set of sequences, but not enriched in the same region in the comparative set. 
We have enhanced the web-based CentriMo algorithm to allow it to perform the constrained differential motif enrichment analysis described in this paper, and CentriMo’s on-line interface (http://meme.ebi.edu.au) provides dozens of databases of DNA- and RNA-binding motifs from a full range of organisms. All data and output files presented here are available at http://research.imb.uq.edu.au/t.bailey/supplementary_data/Lesluyes2014. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-752) contains supplementary material, which is available to authorized users.
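As a rough illustration of differential enrichment within a constrained region, the sketch below counts sequences that contain a motif inside a fixed window in a primary set and in a comparative set, then applies a one-sided Fisher's exact test to the resulting 2x2 table. Exact word matching stands in for motif scanning, and the function names and the choice of test are assumptions made here; this is not CentriMo's code.

```python
from math import comb

def count_with_motif(seqs, word, lo, hi):
    """Count sequences that contain `word` entirely inside the window [lo, hi)."""
    return sum(1 for s in seqs if word in s[lo:hi])

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where a is the number
    of primary sequences with the motif and c the number of control
    sequences with the motif.
    """
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, min(n1, k) + 1))
    return tail / total

def differential_enrichment(primary, control, word, lo, hi):
    """p-value that `word` is enriched in the window in primary relative to control."""
    a = count_with_motif(primary, word, lo, hi)
    c = count_with_motif(control, word, lo, hi)
    return fisher_greater(a, len(primary) - a, c, len(control) - c)
```

Restricting the count to a window is what makes the comparison positional: a motif that is equally frequent in both sets overall can still differ in how often it falls in that window.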
Affiliation(s)
- Timothy L Bailey
  - Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, 4072 Brisbane, Australia.
303
Kaul A, Schuster E, Jennings BH. The Groucho co-repressor is primarily recruited to local target sites in active chromatin to attenuate transcription. PLoS Genet 2014; 10:e1004595. [PMID: 25165826 PMCID: PMC4148212 DOI: 10.1371/journal.pgen.1004595] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/11/2014] [Accepted: 07/03/2014] [Indexed: 12/25/2022]
Abstract
Gene expression is regulated by the complex interaction between transcriptional activators and repressors, which function in part by recruiting histone-modifying enzymes to control accessibility of DNA to RNA polymerase. The evolutionarily conserved family of Groucho/Transducin-Like Enhancer of split (Gro/TLE) proteins act as co-repressors for numerous transcription factors. Gro/TLE proteins act in several key pathways during development (including Notch and Wnt signaling), and are implicated in the pathogenesis of several human cancers. Gro/TLE proteins form oligomers and it has been proposed that their ability to exert long-range repression on target genes involves oligomerization over broad regions of chromatin. However, analysis of an endogenous gro mutation in Drosophila revealed that oligomerization of Gro is not always obligatory for repression in vivo. We have used chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) to profile Gro recruitment in two Drosophila cell lines. We find that Gro predominantly binds at discrete peaks (<1 kilobase). We also demonstrate that blocking Gro oligomerization does not reduce peak width as would be expected if Gro oligomerization induced spreading along the chromatin from the site of recruitment. Gro recruitment is enriched in “active” chromatin containing developmentally regulated genes. However, Gro binding is associated with local regions containing hypoacetylated histones H3 and H4, which is indicative of chromatin that is not fully open for efficient transcription. We also find that peaks of Gro binding frequently overlap the transcription start sites of expressed genes that exhibit strong RNA polymerase pausing and that depletion of Gro leads to release of polymerase pausing and increased transcription at a bona fide target gene. 
Our results demonstrate that Gro is recruited to local sites by transcription factors to attenuate rather than silence gene expression by promoting histone deacetylation and polymerase pausing.

Repression by transcription factors plays a central role in gene regulation. The Groucho/Transducin-Like Enhancer of split (Gro/TLE) family of co-repressors interacts with many different transcription factors and has many essential roles during animal development. Groucho/TLE proteins form oligomers that are necessary for target gene repression in some contexts. We have profiled the genome-wide recruitment of the founding member of this family, Groucho (from Drosophila) to gain insight into how and where it binds with respect to target genes and to identify factors associated with its binding. We find that Groucho binds in discrete peaks, frequently at transcription start sites, and that blocking Groucho from forming oligomers does not significantly change the pattern of Groucho recruitment. Although Groucho acts as a repressor, Groucho binding is enriched in chromatin that is permissive for transcription, and we find that it acts to attenuate rather than completely silence target gene expression. Thus, Groucho does not act as an “on/off” switch on target gene expression, but rather as a “mute” button.
Affiliation(s)
- Aamna Kaul
  - UCL Cancer Institute, University College London, London, United Kingdom
- Eugene Schuster
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Barbara H. Jennings
  - UCL Cancer Institute, University College London, London, United Kingdom
304
Schiller BJ, Chodankar R, Watson LC, Stallcup MR, Yamamoto KR. Glucocorticoid receptor binds half sites as a monomer and regulates specific target genes. Genome Biol 2014; 15:418. [PMID: 25085117 PMCID: PMC4149261 DOI: 10.1186/s13059-014-0418-y] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.9] [Received: 05/19/2014] [Accepted: 07/17/2014] [Indexed: 11/10/2022]
Abstract
Background Glucocorticoid receptor (GR) is a hormone-activated, DNA-binding transcriptional regulatory factor that controls inflammation, metabolism, stress responses, and other physiological processes. In vitro, GR binds as an inverted dimer to a motif consisting of two imperfectly palindromic 6 bp half sites separated by 3 bp spacers. In vivo, GR employs different patterns of functional surfaces of GR to regulate different target genes. The relationships between GR genomic binding and functional surface utilization have not been defined. Results We find that A477T, a GR mutant that disrupts the dimerization interface, differs from wild-type GRα in binding and regulation of target genes. Genomic regions strongly occupied by A477T are enriched for a novel half site motif. In vitro, GRα binds half sites as a monomer. Through the overlap between GRα- and A477T-bound regions, we identify GRα-bound regions containing only half sites. We further identify GR target genes linked with half sites and not with the full motif. Conclusions Genomic regions bound by GR differ in underlying DNA sequence motifs and in the GR functional surfaces employed for regulation. Identification of GR binding regions that selectively utilize particular GR surfaces may discriminate sub-motifs, including the half site motif, that favor those surfaces. This approach may contribute to predictive models for GR activity and therapy. Electronic supplementary material The online version of this article (doi:10.1186/s13059-014-0418-y) contains supplementary material, which is available to authorized users.
305
Worsley Hunt R, Wasserman WW. Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets. Genome Biol 2014; 15:412. [PMID: 25070602 PMCID: PMC4165360 DOI: 10.1186/s13059-014-0412-4] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 04/25/2014] [Accepted: 07/29/2014] [Indexed: 12/15/2022]
Abstract
Background The global effort to annotate the non-coding portion of the human genome relies heavily on chromatin immunoprecipitation data generated with high-throughput DNA sequencing (ChIP-seq). ChIP-seq is generally successful in detailing the segments of the genome bound by the immunoprecipitated transcription factor (TF); however, almost all datasets contain genomic regions devoid of the canonical motif for the TF. It remains to be determined if these regions are related to the immunoprecipitated TF or whether, despite the use of controls, there is a portion of peaks that can be attributed to other causes. Results Analyses across hundreds of ChIP-seq datasets generated for sequence-specific DNA binding TFs reveal a small set of TF binding profiles for which predicted TF binding site motifs are repeatedly observed to be significantly enriched. Grouping related binding profiles, the set includes: CTCF-like, ETS-like, JUN-like, and THAP11 profiles. These frequently enriched profiles are termed ‘zingers’ to highlight their unanticipated enrichment in datasets for which they were not the targeted TF, and their potential impact on the interpretation and analysis of TF ChIP-seq data. Peaks with zinger motifs and lacking the ChIPped TF’s motif are observed to compose up to 45% of a ChIP-seq dataset. There is substantial overlap of zinger motif-containing regions between diverse TF datasets, suggesting a mechanism that is not TF-specific for the recovery of these regions. Conclusions Based on the zinger regions’ proximity to cohesin-bound segments, a loading station model is proposed. Further study of zingers will advance understanding of gene regulation. Electronic supplementary material The online version of this article (doi:10.1186/s13059-014-0412-4) contains supplementary material, which is available to authorized users.
306
Van Bortle K, Nichols MH, Li L, Ong CT, Takenaka N, Qin ZS, Corces VG. Insulator function and topological domain border strength scale with architectural protein occupancy. Genome Biol 2014; 15:R82. [PMID: 24981874 PMCID: PMC4226948 DOI: 10.1186/gb-2014-15-5-r82] [Citation(s) in RCA: 219] [Impact Index Per Article: 21.9] [Received: 02/05/2014] [Accepted: 06/30/2014] [Indexed: 01/08/2023]
Abstract
BACKGROUND Chromosome conformation capture studies suggest that eukaryotic genomes are organized into structures called topologically associating domains. The borders of these domains are highly enriched for architectural proteins with characterized roles in insulator function. However, a majority of architectural protein binding sites localize within topological domains, suggesting sites associated with domain borders represent a functionally different subclass of these regulatory elements. How topologically associating domains are established and what differentiates border-associated from non-border architectural protein binding sites remain unanswered questions. RESULTS By mapping the genome-wide target sites for several Drosophila architectural proteins, including previously uncharacterized profiles for TFIIIC and SMC-containing condensin complexes, we uncover an extensive pattern of colocalization in which architectural proteins establish dense clusters at the borders of topological domains. Reporter-based enhancer-blocking insulator activity as well as endogenous domain border strength scale with the occupancy level of architectural protein binding sites, suggesting co-binding by architectural proteins underlies the functional potential of these loci. Analyses in mouse and human stem cells suggest that clustering of architectural proteins is a general feature of genome organization, and conserved architectural protein binding sites may underlie the tissue-invariant nature of topologically associating domains observed in mammals. CONCLUSIONS We identify a spectrum of architectural protein occupancy that scales with the topological structure of chromosomes and the regulatory potential of these elements. Whereas high occupancy architectural protein binding sites associate with robust partitioning of topologically associating domains and robust insulator function, low occupancy sites appear reserved for gene-specific regulation within topological domains.
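The notion of an occupancy level can be made concrete with a small sketch that counts how many distinct proteins have a peak overlapping a given site. The data layout, the half-open interval convention, and the placeholder protein labels are assumptions for illustration; the abstract does not describe how occupancy was computed.

```python
def occupancy(site, peaks_by_protein):
    """Number of distinct proteins with at least one peak overlapping `site`.

    site:             (chrom, start, end), half-open coordinates
    peaks_by_protein: dict mapping a protein label to a list of
                      (chrom, start, end) peaks, also half-open
    """
    chrom, start, end = site
    count = 0
    for peaks in peaks_by_protein.values():
        # Two half-open intervals overlap when each starts before the other ends.
        if any(c == chrom and s < end and start < e for c, s, e in peaks):
            count += 1
    return count

# Hypothetical example: two of three proteins overlap the site.
example = {
    "proteinA": [("chr2L", 150, 250)],
    "proteinB": [("chr2L", 190, 300)],
    "proteinC": [("chr3R", 100, 200)],
}
site_occupancy = occupancy(("chr2L", 100, 200), example)
```

Sites could then be ranked or binned by this count before comparing, for example, border and non-border locations.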
Affiliation(s)
- Kevin Van Bortle
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
- Michael H Nichols
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
- Li Li
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Chin-Tong Ong
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
- Naomi Takenaka
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
- Zhaohui S Qin
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Victor G Corces
  - Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
307
Chen TW, Li HP, Lee CC, Gan RC, Huang PJ, Wu TH, Lee CY, Chang YF, Tang P. ChIPseek, a web-based analysis tool for ChIP data. BMC Genomics 2014; 15:539. [PMID: 24974934 PMCID: PMC4092222 DOI: 10.1186/1471-2164-15-539] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/19/2014] [Accepted: 06/20/2014] [Indexed: 02/08/2023]
Abstract
Background Chromatin is a dynamic but highly regulated structure. DNA-binding proteins such as transcription factors, epigenetic and chromatin modifiers are responsible for regulating specific gene expression patterns and may result in different phenotypes. To reveal the identity of the proteins associated with a specific region of DNA, chromatin immunoprecipitation (ChIP) is the most widely used technique. ChIP assay followed by next generation sequencing (ChIP-seq) or microarray (ChIP-chip) is often used to study patterns of protein-binding profiles in different cell types and in cancer samples on a genome-wide scale. However, only a limited number of bioinformatics tools are available for ChIP dataset analysis. Results We present ChIPseek, a web-based tool for ChIP data analysis providing summary statistics in graphs and offering several commonly demanded analyses. ChIPseek can provide a statistical summary of the dataset, including a histogram of the peak length distribution, a histogram of distances to the nearest transcription start site (TSS), and a pie chart (or bar chart) of genomic locations, giving users a comprehensive view of the dataset for further analysis. For examining the potential functions of peaks, ChIPseek provides peak annotation, visualization of peak genomic location, motif identification, sequence extraction, and comparison between datasets. Beyond that, ChIPseek also offers users the flexibility to filter peaks and re-analyze the filtered subset of peaks. ChIPseek supports 20 different genome assemblies for 12 model organisms including human, mouse, rat, worm, fly, frog, zebrafish, chicken, yeast, fission yeast, Arabidopsis, and rice. We use demo datasets to demonstrate the usage and intuitive user interface of ChIPseek. Conclusions ChIPseek provides a user-friendly interface for biologists to analyze large-scale ChIP data without requiring any programming skills. All the results and figures produced by ChIPseek can be downloaded for further analysis. The analysis tools built into ChIPseek, especially the ones for selecting and examining a subset of peaks from ChIP data, provide invaluable help for exploring high-throughput data from either ChIP-seq or ChIP-chip. ChIPseek is freely available at http://chipseek.cgu.edu.tw.
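One of the summaries mentioned in this abstract, the histogram of distances to the nearest TSS, can be sketched as follows. This is an independent illustration rather than ChIPseek's code: it assumes peak midpoints and TSS positions on a single chromosome, a non-empty TSS list, and a hypothetical `bin_size` parameter.

```python
from bisect import bisect_left
from collections import Counter

def nearest_tss_distance(pos, sorted_tss):
    """Absolute distance from `pos` to the closest value in a sorted, non-empty list."""
    i = bisect_left(sorted_tss, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(pos - t) for t in candidates)

def distance_histogram(peak_mids, tss_positions, bin_size=1000):
    """Map each bin start (in bp) to the number of peaks whose nearest-TSS distance falls in it."""
    tss = sorted(tss_positions)
    hist = Counter()
    for p in peak_mids:
        d = nearest_tss_distance(p, tss)
        hist[(d // bin_size) * bin_size] += 1
    return dict(sorted(hist.items()))
```

Sorting the TSS positions once and using binary search keeps each lookup logarithmic in the number of TSSs, which matters when a dataset has tens of thousands of peaks.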
Affiliation(s)
- Petrus Tang
  - Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
308
Meng J, Lu Z, Liu H, Zhang L, Zhang S, Chen Y, Rao MK, Huang Y. A protocol for RNA methylation differential analysis with MeRIP-Seq data and exomePeak R/Bioconductor package. Methods 2014; 69:274-81. [PMID: 24979058 DOI: 10.1016/j.ymeth.2014.06.008] [Citation(s) in RCA: 224] [Impact Index Per Article: 22.4] [Received: 03/28/2014] [Revised: 06/14/2014] [Accepted: 06/19/2014] [Indexed: 01/08/2023]
Abstract
Despite the prevalence of studies of DNA/chromatin-related epigenetics, such as histone modifications and DNA methylation, RNA epigenetics did not draw the attention it deserves until a new affinity-based sequencing approach, MeRIP-Seq, was developed and applied to survey global mRNA N6-methyladenosine (m6A) in mammalian cells. As a marriage of ChIP-Seq and RNA-Seq, MeRIP-Seq has the potential to study the transcriptome-wide distribution of various post-transcriptional RNA modifications. We have previously developed an R/Bioconductor package, 'exomePeak', for detecting RNA methylation sites under a specific experimental condition or identifying differential RNA methylation sites in a case-control study from MeRIP-Seq data. Compared with other relatively well studied data types such as ChIP-Seq and RNA-Seq, the study of MeRIP-Seq data is still at a very early stage, and existing protocols are not optimized for dealing with the intrinsic characteristics of MeRIP-Seq data. We provide here a detailed and easy-to-use protocol for using the exomePeak R/Bioconductor package along with other software programs for the analysis of MeRIP-Seq data, which covers raw read alignment, RNA methylation site detection, motif discovery, differential RNA methylation analysis, and functional analysis. In particular, the rationale behind each processing step, as well as the specific method used, best practice, and possible alternative strategies, are briefly discussed. The exomePeak R/Bioconductor package is freely available from Bioconductor: http://www.bioconductor.org/packages/release/bioc/html/exomePeak.html.
Affiliation(s)
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China.
- Zhiliang Lu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Hui Liu
  - School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Lin Zhang
  - School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Shaowu Zhang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yidong Chen
  - Department of Cellular Structural Biology, University of Texas Health Science Center at San Antonio, TX 78229, USA; Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, TX 78229, USA
- Manjeet K Rao
  - Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, TX 78229, USA; Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, TX 78229, USA
- Yufei Huang
  - Department of Electrical and Computer Engineering, University of Texas at San Antonio, TX 78249, USA; Department of Cellular Structural Biology, University of Texas Health Science Center at San Antonio, TX 78229, USA.
309
Worsley Hunt R, Mathelier A, Del Peso L, Wasserman WW. Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment. BMC Genomics 2014; 15:472. [PMID: 24927817 PMCID: PMC4082612 DOI: 10.1186/1471-2164-15-472] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 12/20/2013] [Accepted: 05/20/2014] [Indexed: 11/10/2022]
Abstract
Background Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) techniques can reveal DNA regions bound by transcription factors (TF). Analysis of the ChIP-Seq regions is now a central component in gene regulation studies. The need remains strong for methods to improve the interpretation of ChIP-Seq data and the study of specific TF binding sites (TFBS). Results We introduce a set of methods to improve the interpretation of ChIP-Seq data, including the inference of mediating TFs based on TFBS motif over-representation analysis and the subsequent study of spatial distribution of TFBSs. TFBS over-representation analysis applied to ChIP-Seq data is used to detect which TFBSs arise more frequently than expected by chance. Visualization of over-representation analysis results with new composition-bias plots reveals systematic bias in over-representation scores. We introduce the BiasAway background generating software to resolve the problem. A heuristic procedure based on topological motif enrichment relative to the ChIP-Seq peaks’ local maximums highlights peaks likely to be directly bound by a TF of interest. The results suggest that on average two-thirds of a ChIP-Seq dataset’s peaks are bound by the ChIP’d TF; the origin of the remaining peaks remains undetermined. Additional visualization methods allow for the study of both inter-TFBS spatial relationships and motif-flanking sequence properties, as demonstrated in case studies for TBP and ZNF143/THAP11. Conclusions Topological properties of TFBS within ChIP-Seq datasets can be harnessed to better interpret regulatory sequences. Using GC content corrected TFBS over-representation analysis, combined with visualization techniques and analysis of the topological distribution of TFBS, we can distinguish peaks likely to be directly bound by a TF. The new methods will empower researchers for exploration of gene regulation and TF binding. 
Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-472) contains supplementary material, which is available to authorized users.
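The idea of a GC-content-corrected background can be illustrated with a simple sketch that draws, for each foreground sequence, a background sequence from the same GC-content bin of a candidate pool. This is a generic illustration of GC matching and should not be read as how BiasAway works; the function names, the binning scheme, and the `n_bins` and `seed` parameters are assumptions.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_bin(seq, n_bins=10):
    """Index of the equal-width GC bin that contains the sequence."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)

def gc_matched_background(foreground, pool, n_bins=10, seed=0):
    """For each foreground sequence, pick one pool sequence from the same GC bin.

    Foreground sequences whose bin is empty in the pool are skipped, so the
    result may be shorter than the foreground set. Sampling is with replacement.
    """
    rng = random.Random(seed)
    by_bin = {}
    for s in pool:
        by_bin.setdefault(gc_bin(s, n_bins), []).append(s)
    background = []
    for s in foreground:
        candidates = by_bin.get(gc_bin(s, n_bins))
        if candidates:
            background.append(rng.choice(candidates))
    return background
```

Matching composition in this way is meant to keep GC-rich motifs from appearing over-represented simply because the peak regions are more GC-rich than the background.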
Affiliation(s)
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.
310
Abstract
MEME-ChIP is a web-based tool for analyzing motifs in large DNA or RNA data sets. It can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP performs two complementary types of de novo motif discovery: weight matrix-based discovery for high accuracy; and word-based discovery for high sensitivity. Motif enrichment analysis using DNA or RNA motifs from human, mouse, worm, fly and other model organisms provides even greater sensitivity. MEME-ChIP's interactive HTML output groups and aligns significant motifs to ease interpretation. This protocol takes less than 3 h, and it provides motif discovery approaches that are distinct and complementary to other online methods.
Affiliation(s)
- Wenxiu Ma
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- William S Noble
  - [1] Department of Genome Sciences, University of Washington, Seattle, Washington, USA. [2] Department of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
- Timothy L Bailey
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
311
Site-specific association with host and viral chromatin by Kaposi's sarcoma-associated herpesvirus LANA and its reversal during lytic reactivation. J Virol 2014; 88:6762-77. [PMID: 24696474 DOI: 10.1128/jvi.00268-14] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 01/12/2023]
Abstract
Latency-associated nuclear antigen (LANA), a multifunctional protein expressed by the Kaposi sarcoma-associated herpesvirus (KSHV) in latently infected cells, is required for stable maintenance of the viral episome. This is mediated by two interactions: LANA binds to specific sequences (LBS1 and LBS2) on viral DNA and also engages host histones, tethering the viral genome to host chromosomes in mitosis. LANA has also been suggested to affect host gene expression, but both the mechanism(s) and role of this dysregulation in KSHV biology remain unclear. Here, we have examined LANA interactions with host chromatin on a genome-wide scale using chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) and show that LANA predominantly targets human genes near their transcriptional start sites (TSSs). These host LANA-binding sites are generally found within transcriptionally active promoters and display striking overrepresentation of a consensus DNA sequence virtually identical to the LANA-binding site 1 (LBS1) motif in KSHV DNA. Comparison of the ChIP-seq profile with whole-transcriptome (high-throughput sequencing of RNA transcripts [RNA-seq]) data reveals that few of the genes that are differentially regulated in latent infection are occupied by LANA at their promoters. This suggests that direct LANA binding to promoters is not the prime determinant of altered host transcription in KSHV-infected cells. Most surprisingly, the association of LANA to both host and viral DNA is strongly disrupted during the lytic cycle of KSHV. This disruption can be prevented by the inhibition of viral DNA synthesis, suggesting the existence of novel and potent regulatory mechanisms linked to either viral DNA replication or late gene expression. 
IMPORTANCE Here, we employ complementary genome-wide analyses to evaluate the distribution of the highly abundant latency-associated nuclear antigen, LANA, on the host genome and its impact on host gene expression during KSHV latent infection. Combined, ChIP-seq and RNA-seq reveal that LANA accumulates at active gene promoters that harbor specific short DNA sequences that are highly reminiscent of its cognate binding sites in the virus genome. Unexpectedly, we found that such association does not lead to remodeling of global host transcription during latency. We also report for the first time that LANA's ability to bind host and viral chromatin is highly dynamic and is disrupted in cells undergoing an extensive lytic reactivation. This therefore suggests that the association of LANA to chromatin during a productive infection cycle is controlled by a new regulatory mechanism.
312
Madhamshettiwar PB, Maetschke SR, Davis MJ, Reverter A, Ragan MA. INsPeCT: INtegrative Platform for Cancer Transcriptomics. Cancer Inform 2014; 13:59-66. [PMID: 24653643 PMCID: PMC3956744 DOI: 10.4137/cin.s13630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/01/2013] [Revised: 01/08/2014] [Accepted: 01/08/2014] [Indexed: 01/21/2023]
Abstract
The emergence of transcriptomics, fuelled by high-throughput sequencing technologies, has changed the nature of cancer research and resulted in a massive accumulation of data. Computational analysis, integration, and data visualization are now major bottlenecks in cancer biology and translational research. Although many tools have been brought to bear on these problems, their use remains unnecessarily restricted to computational biologists, as many tools require scripting skills, data infrastructure, and powerful computational facilities. New user-friendly, integrative, and automated analytical approaches are required to make computational methods more generally useful to the research community. Here we present INsPeCT (INtegrative Platform for Cancer Transcriptomics), which allows users with basic computer skills to perform comprehensive in-silico analyses of microarray, ChIP-seq, and RNA-seq data. INsPeCT supports the selection of interesting genes for advanced functional analysis. Included in its automated workflows are (i) a novel analytical framework, RMaNI (regulatory module network inference), which supports the inference of cancer subtype-specific transcriptional module networks and the analysis of modules; and (ii) WGCNA (weighted gene co-expression network analysis), which infers modules of highly correlated genes across microarray samples, associated with sample traits, eg survival time. INsPeCT is available free of cost from Bioinformatics Resource Australia-EMBL and can be accessed at http://inspect.braembl.org.au.
Affiliation(s)
- Piyush B Madhamshettiwar
- The University of Queensland, Institute for Molecular Bioscience, St. Lucia, Brisbane, Queensland, Australia. ; Australian Research Council Centre of Excellence in Bioinformatics, St. Lucia, Brisbane, Queensland, Australia
| | - Stefan R Maetschke
- The University of Queensland, Institute for Molecular Bioscience, St. Lucia, Brisbane, Queensland, Australia. ; Australian Research Council Centre of Excellence in Bioinformatics, St. Lucia, Brisbane, Queensland, Australia
| | - Melissa J Davis
- The University of Queensland, Institute for Molecular Bioscience, St. Lucia, Brisbane, Queensland, Australia. ; Australian Research Council Centre of Excellence in Bioinformatics, St. Lucia, Brisbane, Queensland, Australia
| | - Antonio Reverter
- CSIRO Animal, Food and Health Sciences, St. Lucia, Brisbane, Queensland, Australia
| | - Mark A Ragan
- The University of Queensland, Institute for Molecular Bioscience, St. Lucia, Brisbane, Queensland, Australia. ; Australian Research Council Centre of Excellence in Bioinformatics, St. Lucia, Brisbane, Queensland, Australia
|
313
|
Tran NTL, Huang CH. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data. Biol Direct 2014; 9:4. [PMID: 24555784 PMCID: PMC4022013 DOI: 10.1186/1745-6150-9-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 11/07/2013] [Revised: 01/08/2014] [Accepted: 02/11/2014] [Indexed: 12/24/2022] Open
Abstract
ChIP-Seq (chromatin immunoprecipitation sequencing) aids motif finding because ChIP-Seq experiments narrow the search to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong).
Affiliation(s)
- Ngoc Tam L Tran
- Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, USA.
|
314
|
Orenstein Y, Shamir R. A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data. Nucleic Acids Res 2014; 42:e63. [PMID: 24500199 PMCID: PMC4005680 DOI: 10.1093/nar/gku117] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 12/29/2022] Open
Abstract
Understanding gene regulation is a key challenge in today's biology. The new technologies of protein-binding microarrays (PBMs) and high-throughput SELEX (HT-SELEX) allow measurement of the binding intensities of one transcription factor (TF) to numerous synthetic double-stranded DNA sequences in a single experiment. Recently, Jolma et al. reported the results of 547 HT-SELEX experiments covering human and mouse TFs. Because 162 of these TFs were also covered by PBM technology, for the first time, a large-scale comparison between implementations of these two in vitro technologies is possible. Here we assessed the similarities and differences between binding models, represented as position weight matrices, inferred from PBM and HT-SELEX, and also measured how well these models predict in vivo binding. Our results show that HT-SELEX- and PBM-derived models agree for most TFs. For some TFs, the HT-SELEX-derived models are longer versions of the PBM-derived models, whereas for other TFs, the HT-SELEX models match the secondary PBM-derived models. Remarkably, PBM-based 8-mer ranking is more accurate than that of HT-SELEX, but models derived from HT-SELEX predict in vivo binding better. In addition, we reveal several biases in HT-SELEX data including nucleotide frequency bias, enrichment of C-rich k-mers and oligos and underrepresentation of palindromes.
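The position weight matrices mentioned in this abstract can be illustrated with a minimal sketch. It assumes a log-odds matrix built from per-position base counts with a pseudocount and a uniform background; the function names, the pseudocount of 1 and the forward-strand-only scan are illustrative choices, not the procedure used in the paper.

```python
import math

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds position weight matrix."""
    pwm = []
    for column in counts:
        # Every base receives the pseudocount, so no probability is zero.
        total = sum(column.get(base, 0) for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def best_window_score(pwm, sequence):
    """Return the highest log-odds score over all forward-strand windows.

    Assumes an uppercase sequence over A, C, G, T; returns -inf when the
    sequence is shorter than the matrix.
    """
    width = len(pwm)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        best = max(best, score)
    return best

# Hypothetical three-column matrix whose consensus is ACG.
example_pwm = pwm_from_counts([{"A": 10}, {"C": 10}, {"G": 10}])
print(best_window_score(example_pwm, "TTACGTT"))
```

A sequence containing the consensus scores higher than one without it, which is the property that ranking k-mers or predicting bound regions relies on.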
Affiliation(s)
- Yaron Orenstein
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
|
315
|
Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, Lim J, Shyr C, Tan G, Zhou M, Lenhard B, Sandelin A, Wasserman WW. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 2014; 42:D142-7. [PMID: 24194598 PMCID: PMC3965086 DOI: 10.1093/nar/gkt997] [Citation(s) in RCA: 786] [Impact Index Per Article: 78.6] [Received: 09/15/2013] [Accepted: 10/03/2013] [Indexed: 11/14/2022] Open
Abstract
JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR, the JASPAR CORE subcollection of curated, non-redundant profiles, with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.
Affiliation(s)
- Anthony Mathelier, Xiaobei Zhao, Allen W. Zhang, François Parcy, Rebecca Worsley-Hunt, David J. Arenillas, Sorana Buchman, Chih-yu Chen, Alice Chou, Hans Ienasescu, Jonathan Lim, Casper Shyr, Ge Tan, Michelle Zhou, Boris Lenhard, Albin Sandelin, Wyeth W. Wasserman
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, University of British Columbia, Vancouver, BC, Canada, Department of Biology and Biotech Research and Innovation Centre, The Bioinformatics Centre, Copenhagen University, Ole Maaloes Vej 5, DK-2200, Denmark, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA, Laboratoire Physiologie Cellulaire & Végétale, Université Grenoble Alpes, CNRS, CEA, iRTSV, INRA, 38054 Grenoble, France, Computational Regulatory Genomics, MRC Clinical Sciences Centre, Imperial College London, Du Cane Road, London W12 0NN, UK, and Department of Informatics, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway
|
316
|
Stielow C, Stielow B, Finkernagel F, Scharfe M, Jarek M, Suske G. SUMOylation of the polycomb group protein L3MBTL2 facilitates repression of its target genes. Nucleic Acids Res 2013; 42:3044-58. [PMID: 24369422 PMCID: PMC3950706 DOI: 10.1093/nar/gkt1317] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022] Open
Abstract
Lethal(3) malignant brain tumour like 2 (L3MBTL2) is an integral component of the polycomb repressive complex 1.6 (PRC1.6) and has been implicated in transcriptional repression and chromatin compaction. Here, we show that L3MBTL2 is modified by SUMO2/3 at lysine residues 675 and 700 close to the C-terminus. SUMOylation of L3MBTL2 affected neither its repressive activity in reporter gene assays nor its binding to histone tails in vitro. In order to analyse whether SUMOylation affects binding of L3MBTL2 to chromatin, we performed ChIP-Seq analysis with chromatin of wild-type HEK293 cells and with chromatin of HEK293 cells stably expressing either FLAG-tagged SUMOylation-competent or SUMOylation-defective L3MBTL2. Wild-type FLAG-L3MBTL2 and the SUMOylation-defective FLAG-L3MBTL2 K675/700R mutant essentially occupied the same sites as endogenous L3MBTL2, suggesting that SUMOylation of L3MBTL2 does not affect chromatin binding. However, a subset of L3MBTL2-target genes, particularly those with low L3MBTL2 occupancy including pro-inflammatory genes, was de-repressed in cells expressing the FLAG-L3MBTL2 K675/700R mutant. Finally, we provide evidence that SUMOylation of L3MBTL2 facilitates repression of these PRC1.6-target genes by balancing the local H2Aub1 levels established by the ubiquitinating enzyme RING2 and the de-ubiquitinating PR–DUB complex.
Affiliation(s)
- Christina Stielow
- Institute of Molecular Biology and Tumor Research, Philipps-University, Emil-Mannkopff-Str. 2, D-35032 Marburg and Helmholtz Centre for Infection Research (HZI), Inhoffenstraße 7, D-38124 Braunschweig, Germany
| | - Bastian Stielow
- Institute of Molecular Biology and Tumor Research, Philipps-University, Emil-Mannkopff-Str. 2, D-35032 Marburg and Helmholtz Centre for Infection Research (HZI), Inhoffenstraße 7, D-38124 Braunschweig, Germany
| | - Florian Finkernagel
- Institute of Molecular Biology and Tumor Research, Philipps-University, Emil-Mannkopff-Str. 2, D-35032 Marburg and Helmholtz Centre for Infection Research (HZI), Inhoffenstraße 7, D-38124 Braunschweig, Germany
| | - Maren Scharfe
- Institute of Molecular Biology and Tumor Research, Philipps-University, Emil-Mannkopff-Str. 2, D-35032 Marburg and Helmholtz Centre for Infection Research (HZI), Inhoffenstraße 7, D-38124 Braunschweig, Germany
| | - Michael Jarek
- Institute of Molecular Biology and Tumor Research, Philipps-University, Emil-Mannkopff-Str. 2, D-35032 Marburg and Helmholtz Centre for Infection Research (HZI), Inhoffenstraße 7, D-38124 Braunschweig, Germany
| | - Guntram Suske
- Institute of Molecular Biology and Tumor Research, Philipps-University, Emil-Mannkopff-Str. 2, D-35032 Marburg and Helmholtz Centre for Infection Research (HZI), Inhoffenstraße 7, D-38124 Braunschweig, Germany
- *To whom correspondence should be addressed. Tel: +49 6421 2866697; Fax +49 6421 2865959;
|
317
|
Abstract
MOTIVATION: Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amounts of data. To overcome this limitation, tools have been developed that trade accuracy for speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in the input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. RESULTS: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use the area under the receiver-operating characteristic curve (AUC) as a measure of the discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. AVAILABILITY AND IMPLEMENTATION: DiMO is available at http://stormo.wustl.edu/DiMO
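The AUC used here as a measure of discriminating power can be sketched as follows. This is a minimal illustration that assumes each sequence in the positive and negative sets has already been reduced to a single motif score; the function name is hypothetical, and the perceptron-based optimization that DiMO performs is not reproduced.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of sequences, which is fine for a small illustration.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical motif scores for bound (positive) and control (negative) sequences.
print(auc([0.9, 0.4], [0.5, 0.1]))
```

An AUC of 1.0 means every positive sequence outscores every negative one, and 0.5 corresponds to a motif with no discriminating power.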
Affiliation(s)
- Ronak Y Patel
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
|
318
|
Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, Madrigal P, Taslim C, Zhang J. Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Comput Biol 2013; 9:e1003326. [PMID: 24244136 PMCID: PMC3828144 DOI: 10.1371/journal.pcbi.1003326] [Citation(s) in RCA: 164] [Impact Index Per Article: 14.9] [Indexed: 12/25/2022] Open
Abstract
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
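One of the listed steps, controlling the false discovery rate, can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure, a widely used FDR method; the abstract does not say which method the guidelines recommend, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-peak p-values; peaks with an adjusted value below a
# chosen threshold (for example 0.05) would be retained.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```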
Affiliation(s)
- Timothy Bailey
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
- * E-mail: (TB); (PM)
| | - Pawel Krajewski
- Department of Biometry and Bioinformatics, Institute of Plant Genetics, Polish Academy of Sciences, Poznań, Poland
| | - Istvan Ladunga
- Department of Statistics, Beadle Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Celine Lefebvre
- Inserm U981, Cancer Institute Gustave Roussy, Villejuif, France
| | - Qunhua Li
- Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
| | - Tao Liu
- Department of Biochemistry, University at Buffalo, Buffalo, New York, United States of America
| | - Pedro Madrigal
- Department of Biometry and Bioinformatics, Institute of Plant Genetics, Polish Academy of Sciences, Poznań, Poland
- * E-mail: (TB); (PM)
| | - Cenny Taslim
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Jie Zhang
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
|
319
|
Aldridge S, Watt S, Quail MA, Rayner T, Lukk M, Bimson MF, Gaffney D, Odom DT. AHT-ChIP-seq: a completely automated robotic protocol for high-throughput chromatin immunoprecipitation. Genome Biol 2013; 14:R124. [PMID: 24200198 PMCID: PMC4053851 DOI: 10.1186/gb-2013-14-11-r124] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 05/22/2013] [Accepted: 11/07/2013] [Indexed: 12/22/2022] Open
Abstract
ChIP-seq is an established manually-performed method for identifying DNA-protein interactions genome-wide. Here, we describe a protocol for automated high-throughput (AHT) ChIP-seq. To demonstrate the quality of data obtained using AHT-ChIP-seq, we applied it to five proteins in mouse livers using a single 96-well plate, demonstrating an extremely high degree of qualitative and quantitative reproducibility among biological and technical replicates. We estimated the optimum and minimum recommended cell numbers required to perform AHT-ChIP-seq by running an additional plate using HepG2 and MCF7 cells. With this protocol, commercially available robotics can perform four hundred experiments in five days.
|
320
|
Yao Z, Macquarrie KL, Fong AP, Tapscott SJ, Ruzzo WL, Gentleman RC. Discriminative motif analysis of high-throughput dataset. Bioinformatics 2013; 30:775-83. [PMID: 24162561 DOI: 10.1093/bioinformatics/btt615] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION: High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. RESULTS: We describe a simple yet effective algorithm for discovering differential motifs between two sequence datasets that is effective in eliminating systematic biases and scalable to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvement in both accuracy and efficiency compared with DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single base pair differences in DNA specificity of two similar TFs. Lastly, we demonstrate discovery of key TF motifs involved in tissue specification by examination of high-throughput DNase accessibility data. AVAILABILITY: The motifRG package is publicly available via the Bioconductor repository. CONTACT: yzizhen@fhcrc.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
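The idea of finding motifs that differ between two sequence datasets can be illustrated with a deliberately simple sketch. It ranks k-mers by how much more often they occur in a foreground set than in a background set; this is an assumed stand-in for illustration only and is not the motifRG algorithm, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def kmer_presence(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def rank_differential_kmers(foreground, background, k):
    """Rank k-mers by the difference in the fraction of sequences containing them."""
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    scored = [
        (kmer, fg[kmer] / len(foreground) - bg[kmer] / len(background))
        for kmer in fg
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data: ACG occurs in every foreground sequence and in no background one.
foreground = ["AACGTT", "GGACGA", "TACGTC"]
background = ["AAAAAA", "GGGGGA", "TTTTTC"]
print(rank_differential_kmers(foreground, background, 3)[0])
```

Comparing against a matched background rather than a naive base-composition model is what lets a discriminative approach discount k-mers that are common in both sets.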
Affiliation(s)
- Zizhen Yao
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, 98105, USA, Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Department of Pediatrics, School of Medicine, Department of Neurology, School of Medicine, University of Washington, Seattle, Washington, 98105, USA, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Department of Computer Science and Engineering, Department of Genome Sciences, University of Washington, Seattle, Washington, 98105, USA and Bioinformatics and Computational Biology, Genentech, South San Francisco, CA 94080, USA
|
321
|
Slattery M, Voutev R, Ma L, Nègre N, White KP, Mann RS. Divergent transcriptional regulatory logic at the intersection of tissue growth and developmental patterning. PLoS Genet 2013; 9:e1003753. [PMID: 24039600 PMCID: PMC3764184 DOI: 10.1371/journal.pgen.1003753] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 05/06/2013] [Accepted: 07/10/2013] [Indexed: 12/19/2022] Open
Abstract
The Yorkie/Yap transcriptional coactivator is a well-known regulator of cellular proliferation in both invertebrates and mammals. As a coactivator, Yorkie (Yki) lacks a DNA binding domain and must partner with sequence-specific DNA binding proteins in the nucleus to regulate gene expression; in Drosophila, the developmental regulators Scalloped (Sd) and Homothorax (Hth) are two such partners. To determine the range of target genes regulated by these three transcription factors, we performed genome-wide chromatin immunoprecipitation experiments for each factor in both the wing and eye-antenna imaginal discs. Strong, tissue-specific binding patterns are observed for Sd and Hth, while Yki binding is remarkably similar across both tissues. Binding events common to the eye and wing are also present for Sd and Hth; these are associated with genes regulating cell proliferation and “housekeeping” functions, and account for the majority of Yki binding. In contrast, tissue-specific binding events for Sd and Hth significantly overlap enhancers that are active in the given tissue, are enriched in Sd and Hth DNA binding sites, respectively, and are associated with genes that are consistent with each factor's previously established tissue-specific functions. Tissue-specific binding events are also significantly associated with Polycomb targeted chromatin domains. To provide mechanistic insights into tissue-specific regulation, we identify and characterize eye and wing enhancers of the Yki-targeted bantam microRNA gene and demonstrate that they are dependent on direct binding by Hth and Sd, respectively. Overall these results suggest that both Sd and Hth use distinct strategies – one shared between tissues and associated with Yki, the other tissue-specific, generally Yki-independent and associated with developmental patterning – to regulate distinct gene sets during development. 
The Hippo tumor suppressor pathway controls proliferation in a tissue-nonspecific fashion in Drosophila epithelial progenitor tissues via the transcriptional coactivator Yorkie (Yki). However, despite the tissue-nonspecific role that Yki plays in tissue growth, the transcription factors that recruit Yki to DNA, most notably Scalloped (Sd) and Homothorax (Hth), are important regulators of developmental patterning with many tissue-specific functions. Thus, these three transcriptional regulators – Yki, Sd, and Hth – provide a model for exploring the properties of protein-DNA interactions that regulate both tissue-shared and tissue-specific functions. With this goal in mind, we identified the positions in the fly genome that are bound by Yki, Sd, and Hth in the progenitors of the wing and eye-antenna structures of the fly. These data not only provide a global view of the Yki gene regulatory network, they reveal an unusual amount of tissue specificity in the genomic regions targeted by Sd and Hth, but not Yki. The data also reveal that tissue-specific binding is very likely to overlap tissue-specific enhancer regions, provide important clues for how tissue-specific Sd and Hth binding occurs, and support the idea that gene regulatory networks are plastic, with spatial differences in binding significantly impacting network structures.
Affiliation(s)
- Matthew Slattery
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Institute for Genomics and Systems Biology and Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Roumen Voutev
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
| | - Lijia Ma
- Institute for Genomics and Systems Biology and Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Nicolas Nègre
- Institute for Genomics and Systems Biology and Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Université de Montpellier 2 and INRA, UMR1333 DGIMI, Montpellier, France
| | - Kevin P. White
- Institute for Genomics and Systems Biology and Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
| | - Richard S. Mann
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
|
322
|
Handel AE, Sandve GK, Disanto G, Berlanga-Taylor AJ, Gallone G, Hanwell H, Drabløs F, Giovannoni G, Ebers GC, Ramagopalan SV. Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship to serum 25-hydroxyvitamin D levels and autoimmune disease. BMC Med 2013; 11:163. [PMID: 23849224 PMCID: PMC3710212 DOI: 10.1186/1741-7015-11-163] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 01/08/2013] [Accepted: 06/20/2013] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Vitamin D insufficiency has been implicated in autoimmunity. ChIP-seq experiments using immune cell lines have shown that vitamin D receptor (VDR) binding sites are enriched near regions of the genome associated with autoimmune diseases. We aimed to investigate VDR binding in primary CD4+ cells from healthy volunteers. METHODS We extracted CD4+ cells from nine healthy volunteers. Each sample underwent VDR ChIP-seq. Our results were analyzed in relation to published ChIP-seq and RNA-seq data in the Genomic HyperBrowser. We used MEME-ChIP for de novo motif discovery. 25-Hydroxyvitamin D levels were measured using liquid chromatography-tandem mass spectrometry and samples were divided into vitamin D sufficient (25(OH)D ≥75 nmol/L) and insufficient/deficient (25(OH)D <75 nmol/L) groups. RESULTS We found that the amount of VDR binding is correlated with the serum level of 25-hydroxyvitamin D (r = 0.92, P = 0.0005). In vivo VDR binding sites are enriched for autoimmune disease associated loci, especially when 25-hydroxyvitamin D levels (25(OH)D) were sufficient (25(OH)D ≥75: 3.13-fold, P < 0.0001; 25(OH)D <75: 2.76-fold, P < 0.0001; 25(OH)D ≥75 enrichment versus 25(OH)D <75 enrichment: P = 0.0002). VDR binding was also enriched near genes associated specifically with T-regulatory and T-helper cells in the 25(OH)D ≥75 group. MEME-ChIP did not identify any VDR-like motifs underlying our VDR ChIP-seq peaks. CONCLUSION Our results show a direct correlation between in vivo 25-hydroxyvitamin D levels and the number of VDR binding sites, although our sample size is relatively small. Our study further implicates VDR binding as important in gene-environment interactions underlying the development of autoimmunity and provides a biological rationale for 25-hydroxyvitamin D sufficiency being based at 75 nmol/L. Our results also suggest that VDR binding in response to physiological levels of vitamin D occurs predominantly in a VDR motif-independent manner.
Affiliation(s)
- Adam E Handel
- Medical Research Council Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Parks Road, Oxford OX1 3PT, UK
|
323
|
Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks. Proc Natl Acad Sci U S A 2013; 110:11952-7. [PMID: 23818646 DOI: 10.1073/pnas.1307449110] [Citation(s) in RCA: 154] [Impact Index Per Article: 14.0] [Indexed: 11/18/2022] Open
Abstract
Transcription factors (TFs) recognize short sequence motifs that are present in millions of copies in large eukaryotic genomes. TFs must distinguish their target binding sites from a vast genomic excess of spurious motif occurrences; however, it is unclear whether functional sites are distinguished from nonfunctional motifs by local primary sequence features or by the larger genomic context in which motifs reside. We used a massively parallel enhancer assay in living mouse retinas to compare 1,300 sequences bound in the genome by the photoreceptor transcription factor Cone-rod homeobox (Crx), to 3,000 control sequences. We found that very short sequences bound in the genome by Crx activated transcription at high levels, whereas unbound genomic regions with equal numbers of Crx motifs did not activate above background levels, even when liberated from their larger genomic context. High local GC content strongly distinguishes bound motifs from unbound motifs across the entire genome. Our results show that the cis-regulatory potential of TF-bound DNA is determined largely by highly local sequence features and not by genomic context.
|
324
|
Oda M, Kumaki Y, Shigeta M, Jakt LM, Matsuoka C, Yamagiwa A, Niwa H, Okano M. DNA methylation restricts lineage-specific functions of transcription factor Gata4 during embryonic stem cell differentiation. PLoS Genet 2013; 9:e1003574. [PMID: 23825962 PMCID: PMC3694845 DOI: 10.1371/journal.pgen.1003574] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 11/04/2012] [Accepted: 05/02/2013] [Indexed: 12/19/2022] Open
Abstract
DNA methylation changes dynamically during development and is essential for embryogenesis in mammals. However, how DNA methylation affects developmental gene expression and cell differentiation remains elusive. During embryogenesis, many key transcription factors are used repeatedly, triggering different outcomes depending on the cell type and developmental stage. Here, we report that DNA methylation modulates transcription-factor output in the context of cell differentiation. Using a drug-inducible Gata4 system and a mouse embryonic stem (ES) cell model of mesoderm differentiation, we examined the cellular response to Gata4 in ES and mesoderm cells. The activation of Gata4 in ES cells is known to drive their differentiation to endoderm. We show that the differentiation of wild-type ES cells into mesoderm blocks their Gata4-induced endoderm differentiation, while mesoderm cells derived from ES cells that are deficient in the DNA methyltransferases Dnmt3a and Dnmt3b can retain their response to Gata4, allowing lineage conversion from mesoderm cells to endoderm. Transcriptome analysis of the cells' response to Gata4 over time revealed groups of endoderm and mesoderm developmental genes whose expression was induced by Gata4 only when DNA methylation was lost, suggesting that DNA methylation restricts the ability of these genes to respond to Gata4, rather than controlling their transcription per se. Gata4-binding-site profiles and DNA methylation analyses suggested that DNA methylation modulates the Gata4 response through diverse mechanisms. Our data indicate that epigenetic regulation by DNA methylation functions as a heritable safeguard to prevent transcription factors from activating inappropriate downstream genes, thereby contributing to the restriction of the differentiation potential of somatic cells. 
Animal bodies are constructed from many different specialized cell types that are generated during embryogenesis from a single fertilized egg, and acquire their specific characteristics through a series of differentiation steps. After being committed to a specific cell type, it is generally difficult for differentiated cells to convert to other cell types, at least partly because the cells maintain some memory or mark of their developmental history. Such cellular memory is mediated by “epigenetic” mechanisms, which function to stabilize the cell state. DNA methylation, a chemical modification of genomic cytosine residues, is one such mechanism. Genomic DNA methylation patterns in early embryonic cells are established in a cell-type-dependent manner, and these specific patterns are propagated through cell divisions in a clonal manner. However, our understanding of how DNA methylation controls cell differentiation and developmental gene regulation is limited. In this study, using an in vitro model of differentiation, we obtained evidence that DNA methylation modulates the cell's response to DNA-binding transcription factors in a cell-type-dependent manner. These findings extend our understanding of how cellular traits are stabilized within specific lineages during development, and may contribute to advances in cellular engineering.
Affiliation(s)
- Masaaki Oda
- Laboratory for Mammalian Epigenetic Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Yuichi Kumaki
- Laboratory for Mammalian Epigenetic Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Masaki Shigeta
- Laboratory for Pluripotent Cell Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Lars Martin Jakt
- Laboratory for Stem Cell Biology, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Chisa Matsuoka
- Laboratory for Mammalian Epigenetic Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Akiko Yamagiwa
- Laboratory for Mammalian Epigenetic Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Hitoshi Niwa
- Laboratory for Pluripotent Cell Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
| | - Masaki Okano
- Laboratory for Mammalian Epigenetic Studies, Center for Developmental Biology, RIKEN, Kobe, Japan
|
325
|
Zambelli F, Pesole G, Pavesi G. PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res 2013; 41:W535-43. [PMID: 23748563 PMCID: PMC3692095 DOI: 10.1093/nar/gkt448] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 11/30/2022] Open
Abstract
Chromatin immunoprecipitation followed by sequencing with next-generation technologies (ChIP-Seq) has become the de facto standard for building genome-wide maps of regions bound by a given transcription factor (TF). The regions identified, however, have to be further analyzed to determine the actual DNA-binding sites for the TF, as well as sites for other TFs belonging to the same TF complex or in general co-operating or interacting with it in transcription regulation. PscanChIP is a web server that, starting from a collection of genomic regions derived from a ChIP-Seq experiment, scans them using motif descriptors like JASPAR or TRANSFAC position-specific frequency matrices, or descriptors uploaded by users, and it evaluates both motif enrichment and positional bias within the regions according to different measures and criteria. PscanChIP can successfully identify not only the actual binding sites for the TF investigated by a ChIP-Seq experiment but also secondary motifs corresponding to other TFs that tend to bind the same regions, and, if present, precise positional correlations among their respective sites. The web interface is free for use, and there is no login requirement. It is available at http://www.beaconlab.it/pscan_chip_dev.
Affiliation(s)
- Federico Zambelli
- Dipartimento di Bioscienze, Università di Milano, Via Celoria 26, 20133 Milano, Italy
|
326
|
Holo-TFIID controls the magnitude of a transcription burst and fine-tuning of transcription. Proc Natl Acad Sci U S A 2013; 110:7678-83. [PMID: 23610421 DOI: 10.1073/pnas.1221712110] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
Transcription factor (TF)IID is a central player in activated transcription initiation. Recent evidence suggests that the role and composition of TFIID are more diverse than previously understood. To investigate the effects of changing the composition of TFIID in a simple system, we depleted TATA box-binding protein-associated factor (TAF)1 from Drosophila cells and determined the consequences on metal-induced transcription at an inducible gene, metallothionein B. We observe a marked increase in the levels of both the mature message and pre-mRNA in TAF1-depleted cells. Under conditions of continued metal exposure, we show that TAF1 depletion increases the magnitude of the initial transcription burst but has no effect on the timing of that burst. We also show that TAF1 depletion causes delay in the shutoff of transcription upon removal of the stimulus. Thus, TAFs are involved in both establishing an upper limit of transcription during induction and efficiently turning the gene off once the inducer is removed. Using genome-wide nascent sequencing, we identify hundreds of genes that are controlled in a similar manner, indicating that the findings at this inducible gene are likely generalizable to a large set of promoters. There is a long-standing appreciation for the importance of the spatial and temporal control of transcription. Here we uncover an important third dimension of control: the magnitude of the response. Our results show that the magnitude of the transcriptional response to the same signaling event, even at the same promoter, can vary greatly depending on the composition of the TFIID complex in the cell.
|
327
|
Abstract
The specificity of protein-DNA interactions is most commonly modeled using position weight matrices (PWMs). First introduced in 1982, they have been adapted to many new types of data and many different approaches have been developed to determine the parameters of the PWM. New high-throughput technologies provide a large amount of data rapidly and offer an unprecedented opportunity to determine accurately the specificities of many transcription factors (TFs). But taking full advantage of the new data requires advanced algorithms that take into account the biophysical processes involved in generating the data. The new large datasets can also aid in determining when the PWM model is inadequate and must be extended to provide accurate predictions of binding sites. This article provides a general mathematical description of a PWM and how it is used to score potential binding sites, a brief history of the approaches that have been developed and the types of data that are used with an emphasis on algorithms that we have developed for analyzing high-throughput datasets from several new technologies. It also describes extensions that can be added when the simple PWM model is inadequate and further enhancements that may be necessary. It briefly describes some applications of PWMs in the discovery and modeling of in vivo regulatory networks.
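As a rough illustration of the PWM scoring this abstract refers to, the sketch below builds a log-odds matrix from a handful of aligned sites and scores a candidate site by summing one matrix entry per position. The example sites, the pseudocount of 1 and the uniform background are assumptions chosen for illustration; they are not taken from the article, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds PWM (one dict per position) from aligned sites."""
    background = background or {b: 0.25 for b in BASES}
    width = len(sites[0])
    pwm = []
    for i in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds entries for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in a sequence."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max((score_site(pwm, sequence[i:i + width]), i) for i in windows)


# Hypothetical aligned binding sites (not from the article).
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_pwm = build_pwm(example_sites)
print(best_hit(example_pwm, "GGGTATAATCCC"))
```

The additive score assumes that positions contribute independently to binding, which is the simplification the abstract says may need to be extended when the plain PWM model is inadequate.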
|
328
|
Dominissini D, Moshitch-Moshkovitz S, Salmon-Divon M, Amariglio N, Rechavi G. Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat Protoc 2013; 8:176-89. [PMID: 23288318 DOI: 10.1038/nprot.2012.148] [Citation(s) in RCA: 478] [Impact Index Per Article: 43.5] [Indexed: 11/09/2022]
Abstract
N(6)-methyladenosine-sequencing (m(6)A-seq) is an immunocapturing approach for the unbiased transcriptome-wide localization of m(6)A in high resolution. To our knowledge, this is the first protocol to allow a global view of this ubiquitous RNA modification, and it is based on antibody-mediated enrichment of methylated RNA fragments followed by massively parallel sequencing. Building on principles of chromatin immunoprecipitation-sequencing (ChIP-seq) and methylated DNA immunoprecipitation (MeDIP), read densities of immunoprecipitated RNA relative to untreated input control are used to identify methylated sites. A consensus motif is deduced, and its distance to the point of maximal enrichment is assessed; these measures further corroborate the success of the protocol. Identified locations are intersected in turn with gene architecture to draw conclusions regarding the distribution of m(6)A between and within gene transcripts. When applied to human and mouse transcriptomes, m(6)A-seq generated comprehensive methylation profiles revealing, for the first time, tenets governing the nonrandom distribution of m(6)A. The protocol can be completed within ~9 d for four different sample pairs (each consists of an immunoprecipitation and corresponding input).
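The comparison of immunoprecipitated read density against the input control can be sketched as a per-window fold enrichment. This is only an illustration under stated assumptions: the window counts, the pseudocount, the library-size scaling and the two-fold cutoff are hypothetical choices, and the protocol's actual peak-calling procedure is not described in this abstract.

```python
def fold_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-window IP/input ratio after scaling IP to the input library size."""
    if len(ip_counts) != len(input_counts):
        raise ValueError("IP and input must cover the same windows")
    ip_total = sum(ip_counts)
    input_total = sum(input_counts)
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")
    scale = input_total / ip_total  # put both libraries on the same depth
    return [
        (ip * scale + pseudocount) / (inp + pseudocount)
        for ip, inp in zip(ip_counts, input_counts)
    ]


def enriched_windows(ip_counts, input_counts, min_fold=2.0):
    """Indices of windows whose fold enrichment reaches min_fold."""
    ratios = fold_enrichment(ip_counts, input_counts)
    return [i for i, ratio in enumerate(ratios) if ratio >= min_fold]


# Hypothetical read counts in four consecutive windows of one transcript.
ip_reads = [10, 12, 90, 11]
input_reads = [10, 11, 12, 10]
print(enriched_windows(ip_reads, input_reads))
```

The pseudocount keeps windows with no input reads from producing a division by zero, at the cost of damping the ratio in low-coverage windows.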
Affiliation(s)
- Dan Dominissini
- Cancer Research Center, Chaim Sheba Medical Center, Tel Hashomer, Israel
|
329
|
The relationship between long-range chromatin occupancy and polymerization of the Drosophila ETS family transcriptional repressor Yan. Genetics 2012; 193:633-49. [PMID: 23172856 DOI: 10.1534/genetics.112.146647] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 01/19/2023] Open
Abstract
ETS family transcription factors are evolutionarily conserved downstream effectors of Ras/MAPK signaling with critical roles in development and cancer. In Drosophila, the ETS repressor Yan regulates cell proliferation and differentiation in a variety of tissues; however, the mechanisms of Yan-mediated repression are not well understood and only a few direct target genes have been identified. Yan, like its human ortholog TEL1, self-associates through an N-terminal sterile α-motif (SAM), leading to speculation that Yan/TEL1 polymers may spread along chromatin to form large repressive domains. To test this hypothesis, we created a monomeric form of Yan by recombineering a point mutation that blocks SAM-mediated self-association into the yan genomic locus and compared its genome-wide chromatin occupancy profile to that of endogenous wild-type Yan. Consistent with the spreading model predictions, wild-type Yan-bound regions span multiple kilobases. Extended occupancy patterns appear most prominent at genes encoding crucial developmental regulators and signaling molecules and are highly conserved between Drosophila melanogaster and D. virilis, suggesting functional relevance. Surprisingly, although occupancy is reduced, the Yan monomer still makes extensive multikilobase contacts with chromatin, with an overall pattern similar to that of wild-type Yan. Despite its near-normal chromatin recruitment, the repressive function of the Yan monomer is significantly impaired, as evidenced by elevated target gene expression and failure to rescue a yan null mutation. Together our data argue that SAM-mediated polymerization contributes to the functional output of the active Yan repressive complexes that assemble across extended stretches of chromatin, but does not directly mediate recruitment to DNA or chromatin spreading.
|
330
|
Narlikar L. MuMoD: a Bayesian approach to detect multiple modes of protein-DNA binding from genome-wide ChIP data. Nucleic Acids Res 2012; 41:21-32. [PMID: 23093591 PMCID: PMC3592440 DOI: 10.1093/nar/gks950] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 01/15/2023] Open
Abstract
High-throughput chromatin immunoprecipitation has become the method of choice for identifying genomic regions bound by a protein. Such regions are then investigated for overrepresented sequence motifs, the assumption being that they must correspond to the binding specificity of the profiled protein. However, this approach often fails: many bound regions do not contain the 'expected' motif. This is because binding DNA directly at its recognition site is not the only way the protein can cause the region to immunoprecipitate. Its binding specificity can change through association with different co-factors, it can bind DNA indirectly, through intermediaries, or even enforce its function through long-range chromosomal interactions. Conventional motif discovery methods, though largely capable of identifying overrepresented motifs from bound regions, lack the ability to characterize such diverse modes of protein-DNA binding and binding specificities. We present a novel Bayesian method that identifies distinct protein-DNA binding mechanisms without relying on any motif database. The method successfully identifies co-factors of proteins that do not bind DNA directly, such as mediator and p300. It also predicts literature-supported enhancer-promoter interactions. Even for well-studied direct-binding proteins, this method provides compelling evidence for previously uncharacterized dependencies within positions of binding sites, long-range chromosomal interactions and dimerization.
Affiliation(s)
- Leelavati Narlikar
- Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune 411008, India.
|
331
|
A survey of 6,300 genomic fragments for cis-regulatory activity in the imaginal discs of Drosophila melanogaster. Cell Rep 2012; 2:1014-24. [PMID: 23063361 DOI: 10.1016/j.celrep.2012.09.010] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Received: 08/15/2012] [Revised: 09/14/2012] [Accepted: 09/17/2012] [Indexed: 11/21/2022] Open
Abstract
Over 6,000 fragments from the genome of Drosophila melanogaster were analyzed for their ability to drive expression of GAL4 reporter genes in the third-instar larval imaginal discs. About 1,200 reporter genes drove expression in the eye, antenna, leg, wing, haltere, or genital imaginal discs. The patterns ranged from large regions to individual cells. About 75% of the active fragments drove expression in multiple discs; 20% were expressed in ventral, but not dorsal, discs (legs, genital, and antenna), whereas ∼23% were expressed in dorsal but not ventral discs (wing, haltere, and eye). Several patterns, for example, within the leg chordotonal organ, appeared a surprisingly large number of times. Unbiased searches for DNA sequence motifs suggest candidate transcription factors that may regulate enhancers with shared activities. Together, these expression patterns provide a valuable resource to the community and offer a broad overview of how transcriptional regulatory information is distributed in the Drosophila genome.
|
332
|
oPOSSUM-3: advanced analysis of regulatory motif over-representation across genes or ChIP-Seq datasets. G3-GENES GENOMES GENETICS 2012; 2:987-1002. [PMID: 22973536 PMCID: PMC3429929 DOI: 10.1534/g3.112.003202] [Citation(s) in RCA: 230] [Impact Index Per Article: 19.2] [Received: 05/27/2012] [Accepted: 06/11/2012] [Indexed: 01/12/2023]
Abstract
oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.
|
333
|
Lin Y, Li Z, Ozsolak F, Kim SW, Arango-Argoty G, Liu TT, Tenenbaum SA, Bailey T, Monaghan AP, Milos PM, John B. An in-depth map of polyadenylation sites in cancer. Nucleic Acids Res 2012; 40:8460-71. [PMID: 22753024 PMCID: PMC3458571 DOI: 10.1093/nar/gks637] [Citation(s) in RCA: 115] [Impact Index Per Article: 9.6] [Received: 11/08/2011] [Revised: 05/16/2012] [Accepted: 06/06/2012] [Indexed: 12/22/2022] Open
Abstract
We present a comprehensive map of over 1 million polyadenylation sites and quantify their usage in major cancers and tumor cell lines using direct RNA sequencing. We built the Expression and Polyadenylation Database to enable the visualization of the polyadenylation maps in various cancers and to facilitate the discovery of novel genes and gene isoforms that are potentially important to tumorigenesis. Analyses of polyadenylation sites indicate that a large fraction (∼30%) of mRNAs contain alternative polyadenylation sites in their 3' untranslated regions, independent of the cell type. The shortest 3' untranslated region isoforms are preferentially upregulated in cancer tissues, genome-wide. Candidate targets of alternative polyadenylation-mediated upregulation of short isoforms include POLR2K, and signaling cascades of cell-cell and cell-extracellular matrix contact, particularly involving regulators of Rho GTPases. Polyadenylation maps also helped to improve 3' untranslated region annotations and identify candidate regulatory marks such as sequence motifs, H3K36Me3 and Pabpc1 that are isoform dependent and occur in a position-specific manner. In summary, these results highlight the need to go beyond monitoring only the cumulative transcript levels for a gene, to separately analysing the expression of its RNA isoforms.
Affiliation(s)
- Yuefeng Lin
| | - Zhihua Li
| | - Fatih Ozsolak
| | - Sang Woo Kim
| | - Gustavo Arango-Argoty
| | - Teresa T. Liu
| | - Scott A. Tenenbaum
| | - Timothy Bailey
| | - A. Paula Monaghan
| | - Patrice M. Milos
| | - Bino John
- Combined affiliation as listed for every author: Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, Helicos BioSciences Corporation, One Kendall Square, Cambridge, MA 02139, College of Nanoscale Science and Engineering, University at Albany-SUNY, Albany, NY, USA, Institute for Molecular Bioscience, the University of Queensland, Queensland, Australia and Department of Neurobiology, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15260, USA
|
334
|
Tallack MR, Magor GW, Dartigues B, Sun L, Huang S, Fittock JM, Fry SV, Glazov EA, Bailey TL, Perkins AC. Novel roles for KLF1 in erythropoiesis revealed by mRNA-seq. Genome Res 2012; 22:2385-98. [PMID: 22835905 PMCID: PMC3514668 DOI: 10.1101/gr.135707.111] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Indexed: 12/12/2022]
Abstract
KLF1 (formerly known as EKLF) regulates the development of erythroid cells from bi-potent progenitor cells via the transcriptional activation of a diverse set of genes. Mice lacking Klf1 die in utero prior to E15 from severe anemia due to the inadequate expression of genes controlling hemoglobin production, cell membrane and cytoskeletal integrity, and the cell cycle. We have recently described the full repertoire of KLF1 binding sites in vivo by performing KLF1 ChIP-seq in primary erythroid tissue (E14.5 fetal liver). Here we describe the KLF1-dependent erythroid transcriptome by comparing mRNA-seq from Klf1+/+ and Klf1−/− erythroid tissue. This has revealed novel target genes not previously obtainable by traditional microarray technology, and provided novel insights into the function of KLF1 as a transcriptional activator. We define a cis-regulatory module bound by KLF1, GATA1, TAL1, and EP300 that coordinates a core set of erythroid genes. We also describe a novel set of erythroid-specific promoters that drive high-level expression of otherwise ubiquitously expressed genes in erythroid cells. Our study has identified two novel lncRNAs that are dynamically expressed during erythroid differentiation, and discovered a role for KLF1 in directing apoptotic gene expression to drive the terminal stages of erythroid maturation.
Affiliation(s)
- Michael R Tallack
- Mater Medical Research Institute, Mater Hospital, Brisbane, Queensland 4101, Australia
|